Solution: I started out with JDK 1.8.0_181, which is the version recommended on the official site, but installing Cloudera Manager Server kept failing with the error above. Following the pointer in the error message, I visited http://openjdk.java.net/install/ and switched to a different JDK build installed via yum:
-> su -c "yum -y install java-1.8.0-openjdk"
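To confirm that the newly installed JDK is the one actually in use (a quick check I am adding here; exact paths vary by system):
# show the active java version
java -version
# on CentOS/RHEL, list and switch between installed JDKs if several coexist
alternatives --config java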
Error when installing the agent:
Transaction check error: file /usr/lib/systemd/system/supervisord.service from install of cloudera-manager-agent-6.3.1-1466458.el7.x86_64 conflicts with file from package supervisor-3.4.0-1.el7.noarch
Solution: Remove the conflicting package with the command below, then reinstall the agent:
-> yum -y remove supervisor-3.4.0-1.el7.noarch
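If it is unclear which package owns the conflicting file, rpm can resolve it before anything is removed (a standard check, not part of the original fix):
# report the package that owns the conflicting systemd unit file
rpm -qf /usr/lib/systemd/system/supervisord.service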
Error when installing Oozie: Oozie Check DB schema does not exist
Solution: Drop all of the initialized tables in the oozie database, then reinstall.
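A minimal sketch of that cleanup, assuming the Oozie metastore is a MySQL database named oozie (the database name and credentials are assumptions; adjust to your environment):
# drop and recreate the oozie database so the installer can re-run the schema initialization
mysql -u root -p -e "DROP DATABASE oozie; CREATE DATABASE oozie DEFAULT CHARACTER SET utf8;"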
Error when installing and starting Hue: First failure: Failed to execute command Start on service Hue
Solution: When installing HBase, assign a server to the HBase Thrift Server role, then check HBase Thrift Server in the Hue configuration.
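For reference, the same wiring outside of Cloudera Manager lives in hue.ini; a minimal sketch, assuming the Thrift server runs on host master01 on the default port 9090 (host and port are assumptions):
[hbase]
  # list of (name|host:port) pairs pointing at HBase Thrift Server instances
  hbase_clusters=(Cluster|master01:9090)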
Error when creating the HDFS /tmp directory: Command aborted because of exception: Command timed-out after 90 seconds
Solution: HDFS is in safe mode; run the following commands to leave it:
-> su - hdfs
-> hdfs dfsadmin -safemode leave
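Before forcing safe mode off, it is worth confirming the state and checking for missing blocks, since HDFS enters safe mode on its own when blocks are under-replicated (these checks are standard commands, not part of the original fix):
# report the current safe mode state
hdfs dfsadmin -safemode get
# print the filesystem health summary, including corrupt or missing blocks
hdfs fsck / | tail -n 20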
Error when starting the Hue server: Couldn't import snappy. Support for snappy compression disabled.
Solution: Some packages are missing; run the following command:
-> sudo yum install krb5-devel cyrus-sasl-gssapi cyrus-sasl-devel libxml2-devel libxslt-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel
After the command finishes, refresh the page and it loads normally.
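The warning itself refers to the python-snappy binding; if it persists after the packages above are in place, the module can be added to the Python environment Hue runs in (this step is my assumption, not part of the original fix, and the environment path may differ):
# install the snappy headers and the python binding
sudo yum install snappy-devel
sudo pip install python-snappy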
Error when running the hbase pe performance test as the server's root account after installing HBase:
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
Solution: Running hbase commands as the server's root account writes data into HDFS as user root by default, but HDFS has no root user and no /user/root directory, so there is no write permission. Create the /user/root directory on HDFS and change its owner to the root account:
# switch to the hdfs account
su - hdfs
# create the /user/root directory on HDFS
hdfs dfs -mkdir /user/root
# change the owner of the /user/root directory on HDFS
hdfs dfs -chown root:root /user/root
Error when starting a RegionServer:
java.io.IOException: Problem binding to /0.0.0.0:60020 : Address already in use.
Solution:
# log in to the RegionServer host that failed to start and check what occupies port 60020:
netstat -tap | grep 60020
tcp 0 0 slave02.insigma.c:60020 master01.insi:oa-system ESTABLISHED 13294/java
# check the owning process with ps; it turned out PID 13294 was the DataNode occupying the port
ps -ef | grep 13294
Stop that host's DataNode from the Cloudera Manager console, start the RegionServer first, then start the DataNode.
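Returning to the hbase pe permission issue above: once /user/root exists, the benchmark runs as root again; a sample invocation (the row and client counts here are arbitrary illustrations, not from the original):
# single-client sequential write test without MapReduce
hbase pe --nomapred --rows=10000 sequentialWrite 1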
HDFS alert: The health test result for HDFS_CANARY_HEALTH has become bad: Canary test failed to create file in directory /tmp/.cloudera_health_monitoring_canary_files.
Solution:
# check whether hdfs is in safe mode
hdfs dfsadmin -safemode get
# if safe mode is off, check the permissions of /tmp/.cloudera_health_monitoring_canary_files on hdfs
[root@slave02 data1]# hdfs dfs -ls /tmp
Found 4 items
d---------   - hdfs   supergroup          0 2020-09-16 16:55 /tmp/.cloudera_health_monitoring_canary_files
drwx--x--x   - hbase  supergroup          0 2020-09-16 09:44 /tmp/hbase-staging
drwx-wx-wx   - hive   supergroup          0 2020-09-16 11:01 /tmp/hive
drwxrwxrwt   - mapred hadoop              0 2020-09-16 13:46 /tmp/logs
# the permissions on /tmp/.cloudera_health_monitoring_canary_files are clearly wrong; fix them
hdfs dfs -chmod 777 /tmp/.cloudera_health_monitoring_canary_files

Warning on every service: The following network interfaces appear not to be running at full speed: virbr0-nic. 2 host network interfaces appear to be running at full speed. For 1 host network interface, the Cloudera Manager Agent could not determine the duplex mode or interface speed.
Solution:
# check the NIC speed; the flagged interface virbr0-nic runs at only 10Mb/s
[root@master02 home]# ethtool virbr0-nic
Settings for virbr0-nic:
    Supported ports: [ ]
    Supported link modes:   Not reported
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 10Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: off
    MDI-X: Unknown
    Current message level: 0xffffffa1 (-95)
                           drv ifup tx_err tx_queued intr tx_done rx_status pktdata hw wol 0xffff8000
    Link detected: no
# list the interfaces; virbr0-nic belongs to a virtual bridge, and there are other NICs
[root@master02 home]# ifconfig
ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.40.24  netmask 255.255.255.0  broadcast 10.0.40.255
        inet6 fe80::8b87:6e63:4dfe:e57a  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::c340:168f:b840:3fbe  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3a:64:5e  txqueuelen 1000  (Ethernet)
        RX packets 4015665  bytes 4081376526 (3.8 GiB)
        RX errors 0  dropped 156042  overruns 0  frame 0
        TX packets 1933526  bytes 1378935171 (1.2 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 324308  bytes 1529249358 (1.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 324308  bytes 1529249358 (1.4 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:a4:a7:eb  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
# the real NIC ens192 reports a speed of 10000Mb/s
[root@master02 home]# ethtool ens192
Settings for ens192:
    Supported ports: [ TP ]
    Supported link modes:   1000baseT/Full
                            10000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: No
    Supported FEC modes: Not reported
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 10000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 0
    Transceiver: internal
    Auto-negotiation: off
    MDI-X: Unknown
    Supports Wake-on: uag
    Wake-on: d
    Link detected: yes
The analysis shows that CDH was collecting metrics from the virtual NIC virbr0-nic, whose speed fell short of the expected value, hence the warning. In the CDH network interface collection settings, add the exclusion regular expression ^virbr and restart; after the change the network warnings on the servers disappeared, and the green check marks finally showed up.
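Alternatively, if the virtual bridge is not needed on these hosts at all, it can be removed at the OS level so the agent never sees it; a sketch assuming virbr0/virbr0-nic comes from libvirt's default NAT network (an assumption, so verify before running):
# tear down libvirt's default network, which creates virbr0 and virbr0-nic
virsh net-destroy default
# keep it from coming back on reboot
virsh net-autostart default --disable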
Host in bad health: This host has been out of contact with the Cloudera Manager Server for too long. This host is not in contact with the Host Monitor.
Solution: The agent service has probably died. Cloudera Manager Server monitors and manages the cluster through the agents, so a dead agent cannot be restarted from the console; log in to the affected server and restart it from the command line:
# check the agent status
service cloudera-scm-agent status
# restart the agent
service cloudera-scm-agent start
The HBase Thrift Server process terminates itself after running out of memory: The health test result for HBASE_THRIFT_SERVER_UNEXPECTED_EXITS has become bad: This role encountered 1 unexpected exit(s) in the previous 5 minute(s). This included 1 exit(s) due to OutOfMemory errors. Critical threshold: any.
Solution: An HBase cluster deployed through CDH has a "Kill When Out of Memory" setting, which by default stops the affected process automatically when it runs out of memory. Hue relies on the HBase Thrift Server to query data, and heavy scan operations from Hue or other clients can occupy a sizable share of the Thrift Server's heap, especially under concurrent access from multiple clients, which can drive the HBase Thrift Server out of memory. Increase the "Java Heap Size of HBase Thrift Server in Bytes" setting, or enable automatic process restart.
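Outside of Cloudera Manager, the same heap increase can be applied through the standard HBase environment hook; a minimal sketch, assuming a 4 GB heap suits the workload (the size is an assumption):
# in hbase-env.sh: pass a larger maximum heap to the Thrift server JVM
export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xmx4g"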