Setting Up a YARN HA Environment

tech · 2025-07-22

1. Host and Service Planning

1.1 Host Planning

Host       IP               HostName   CPU      Memory  User    Password
hadoop181  192.168.207.181  hadoop181  4 cores  8 GB    hadoop  hadoop
hadoop182  192.168.207.182  hadoop182  4 cores  8 GB    hadoop  hadoop
hadoop183  192.168.207.183  hadoop183  4 cores  8 GB    hadoop  hadoop

1.2 Service Planning

Service          hadoop181  hadoop182  hadoop183
DataNode         √          √          √
JournalNode      √          √          √
Zookeeper        √          √          √
ZKFC             √          √          √
ResourceManager  √          √          √
NodeManager      √          √          √
NameNode         √          √          √
HistoryServer    √

2. Installation

Before setting up YARN high availability, you need a working HDFS HA deployment; see the earlier HDFS HA setup walkthrough.

2.1 Edit the yarn-site.xml Configuration File

(1) Open yarn-site.xml in vim

[hadoop@hadoop181 ~]$ vim $HADOOP_HOME/etc/hadoop/yarn-site.xml

(2) Enable ResourceManager HA

<!-- Enable ResourceManager HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<!-- Declare the HA ResourceManager cluster ID -->
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarncluster</value>
</property>
<!-- List the logical IDs of the ResourceManagers -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2,rm3</value>
</property>

(3) Configure rm1

<!-- Hostname of rm1 -->
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>hadoop181</value>
</property>
<!-- Web UI address of rm1 -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>hadoop181:8088</value>
</property>
<!-- Internal RPC address of rm1 -->
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>hadoop181:8032</value>
</property>
<!-- Address AMs use to request resources from rm1 -->
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>hadoop181:8030</value>
</property>
<!-- Address NodeManagers connect to -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>hadoop181:8031</value>
</property>

(4) Configure rm2

<!-- Hostname of rm2 -->
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>hadoop182</value>
</property>
<!-- Web UI address of rm2 -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>hadoop182:8088</value>
</property>
<!-- Internal RPC address of rm2 -->
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>hadoop182:8032</value>
</property>
<!-- Address AMs use to request resources from rm2 -->
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>hadoop182:8030</value>
</property>
<!-- Address NodeManagers connect to -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>hadoop182:8031</value>
</property>

(5) Configure rm3

<!-- Hostname of rm3 -->
<property>
  <name>yarn.resourcemanager.hostname.rm3</name>
  <value>hadoop183</value>
</property>
<!-- Web UI address of rm3 -->
<property>
  <name>yarn.resourcemanager.webapp.address.rm3</name>
  <value>hadoop183:8088</value>
</property>
<!-- Internal RPC address of rm3 -->
<property>
  <name>yarn.resourcemanager.address.rm3</name>
  <value>hadoop183:8032</value>
</property>
<!-- Address AMs use to request resources from rm3 -->
<property>
  <name>yarn.resourcemanager.scheduler.address.rm3</name>
  <value>hadoop183:8030</value>
</property>
<!-- Address NodeManagers connect to -->
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
  <value>hadoop183:8031</value>
</property>
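The rm1, rm2, and rm3 blocks are identical except for the logical ID and hostname, which makes hand-editing error-prone. As a rough sketch (the RM IDs, hostnames, and ports simply mirror the values used above), a few lines of Python can generate all three blocks:

```python
# Generate the per-ResourceManager <property> blocks for yarn-site.xml.
# The rm-id -> hostname mapping mirrors the cluster used in this post.
RMS = {"rm1": "hadoop181", "rm2": "hadoop182", "rm3": "hadoop183"}

# Property-name suffix -> port, matching the blocks above.
PORTS = {
    "webapp.address": 8088,            # web UI
    "address": 8032,                   # internal RPC
    "scheduler.address": 8030,         # AM resource requests
    "resource-tracker.address": 8031,  # NodeManager heartbeats
}

def rm_properties(rm_id, host):
    """Return the XML <property> elements for one ResourceManager."""
    props = [(f"yarn.resourcemanager.hostname.{rm_id}", host)]
    for suffix, port in PORTS.items():
        props.append((f"yarn.resourcemanager.{suffix}.{rm_id}", f"{host}:{port}"))
    return "\n".join(
        f"<property>\n  <name>{n}</name>\n  <value>{v}</value>\n</property>"
        for n, v in props
    )

if __name__ == "__main__":
    for rm_id, host in RMS.items():
        print(rm_properties(rm_id, host))
```

Paste the output into yarn-site.xml (or redirect it to a file) instead of copy-editing the same block three times.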

(6) Specify the ZooKeeper Quorum Address

<!-- Address of the ZooKeeper quorum -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>hadoop181:2181,hadoop182:2181,hadoop183:2181</value>
</property>
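The value of yarn.resourcemanager.zk-address is a comma-separated list of host:port pairs, and a typo here silently breaks failover. If you want to sanity-check the string before distributing the file, a small helper like the one below can parse it (the hostnames are just the ones from this cluster):

```python
def parse_zk_address(value):
    """Split a ZooKeeper quorum string like 'h1:2181,h2:2181' into (host, port) pairs."""
    pairs = []
    for entry in value.split(","):
        host, _, port = entry.strip().partition(":")
        if not host or not port.isdigit():
            raise ValueError(f"malformed quorum entry: {entry!r}")
        pairs.append((host, int(port)))
    return pairs

# The quorum configured above:
print(parse_zk_address("hadoop181:2181,hadoop182:2181,hadoop183:2181"))
```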

(7) Configure the ZKRMStateStore

<!-- Enable automatic recovery; the ZKRMStateStore below only takes effect when this is true -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<!-- Store ResourceManager state in the ZooKeeper cluster -->
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

(8) Configure Environment Variable Inheritance

<!-- Environment variables inherited by containers -->
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

2.2 Notes

(1) All of the configuration above goes in yarn-site.xml.
(2) Be sure to delete any previously configured single ResourceManager address:

<!-- The old single ResourceManager address; delete this whole block -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop182</value>
</property>
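A quick way to confirm the old property is really gone is to scan yarn-site.xml for a yarn.resourcemanager.hostname entry that lacks an rm-id suffix. This is only a sketch using the Python standard library; the inline sample stands in for the real file:

```python
import xml.etree.ElementTree as ET

def leftover_single_rm(xml_text):
    """Return property names equal to 'yarn.resourcemanager.hostname'
    (no .rmN suffix), which should have been deleted for HA."""
    root = ET.fromstring(xml_text)
    return [
        prop.findtext("name", default="")
        for prop in root.iter("property")
        if prop.findtext("name", default="") == "yarn.resourcemanager.hostname"
    ]

# Hypothetical sample: one stale property plus a valid per-RM one.
sample = """<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>hadoop182</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>hadoop181</value></property>
</configuration>"""
print(leftover_single_rm(sample))  # the un-suffixed property is flagged
```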

2.3 Distribute the File

[hadoop@hadoop181 ~]$ xsync $HADOOP_HOME/etc/hadoop/yarn-site.xml

3. Starting the Services

3.1 Start HDFS First

[hadoop@hadoop181 ~]$ xssh jps -l
[DEBUG] 1 command is :jps -l
[DEBUG] ssh to hadoop181 to execute commands [ jps -l]
7057 org.apache.hadoop.hdfs.server.namenode.NameNode
11969 sun.tools.jps.Jps
7186 org.apache.hadoop.hdfs.server.datanode.DataNode
10409 org.apache.zookeeper.server.quorum.QuorumPeerMain
7437 org.apache.hadoop.hdfs.qjournal.server.JournalNode
[DEBUG] ssh to hadoop182 to execute commands [ jps -l]
7044 org.apache.zookeeper.server.quorum.QuorumPeerMain
8581 sun.tools.jps.Jps
5112 org.apache.hadoop.hdfs.server.datanode.DataNode
5225 org.apache.hadoop.hdfs.qjournal.server.JournalNode
5020 org.apache.hadoop.hdfs.server.namenode.NameNode
[DEBUG] ssh to hadoop183 to execute commands [ jps -l]
5168 org.apache.hadoop.hdfs.qjournal.server.JournalNode
4963 org.apache.hadoop.hdfs.server.namenode.NameNode
8515 sun.tools.jps.Jps
6987 org.apache.zookeeper.server.quorum.QuorumPeerMain
5055 org.apache.hadoop.hdfs.server.datanode.DataNode

3.2 Start YARN

# Start YARN; this can be run on any node
[hadoop@hadoop181 ~]$ start-yarn.sh

3.3 Verify the Startup

(1) Check with jps

# Check the started processes
[hadoop@hadoop181 ~]$ xssh jps -l
[DEBUG] 1 command is :jps -l
[DEBUG] ssh to hadoop181 to execute commands [ jps -l]
7057 org.apache.hadoop.hdfs.server.namenode.NameNode
7186 org.apache.hadoop.hdfs.server.datanode.DataNode
10757 org.apache.spark.deploy.worker.Worker
14870 sun.tools.jps.Jps
14472 org.apache.hadoop.yarn.server.nodemanager.NodeManager
10409 org.apache.zookeeper.server.quorum.QuorumPeerMain
14347 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
7437 org.apache.hadoop.hdfs.qjournal.server.JournalNode
9246 org.apache.spark.deploy.history.HistoryServer
[DEBUG] ssh to hadoop182 to execute commands [ jps -l]
9952 org.apache.hadoop.yarn.server.nodemanager.NodeManager
10291 sun.tools.jps.Jps
7044 org.apache.zookeeper.server.quorum.QuorumPeerMain
7252 org.apache.spark.deploy.worker.Worker
7364 org.apache.spark.deploy.master.Master
5112 org.apache.hadoop.hdfs.server.datanode.DataNode
5225 org.apache.hadoop.hdfs.qjournal.server.JournalNode
9867 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
5020 org.apache.hadoop.hdfs.server.namenode.NameNode
[DEBUG] ssh to hadoop183 to execute commands [ jps -l]
5168 org.apache.hadoop.hdfs.qjournal.server.JournalNode
10224 sun.tools.jps.Jps
4963 org.apache.hadoop.hdfs.server.namenode.NameNode
7188 org.apache.spark.deploy.worker.Worker
7301 org.apache.spark.deploy.master.Master
9801 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
6987 org.apache.zookeeper.server.quorum.QuorumPeerMain
9886 org.apache.hadoop.yarn.server.nodemanager.NodeManager
5055 org.apache.hadoop.hdfs.server.datanode.DataNode
[hadoop@hadoop181 hadoop]$

(2) Check in a Web Browser

http://hadoop181:8088/cluster
http://hadoop182:8088/cluster
http://hadoop183:8088/cluster

(3) Check Which ResourceManager Is Active

[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm1
standby
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm2
active
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm3
standby
[hadoop@hadoop181 hadoop]$
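The three rmadmin calls above can be wrapped in a small health check: a healthy HA cluster should report exactly one active RM. The sketch below works on the strings the command prints ("active"/"standby"); actually shelling out to yarn rmadmin is left out so the example stays self-contained:

```python
def check_ha_states(states):
    """Given {rm_id: state} as reported by 'yarn rmadmin -getServiceState <rm_id>',
    return the id of the single active RM, or raise if the cluster is unhealthy."""
    active = [rm for rm, state in states.items() if state == "active"]
    if len(active) != 1:
        raise RuntimeError(f"expected exactly one active RM, got: {active}")
    return active[0]

# The states observed in this walkthrough: rm2 is active.
print(check_ha_states({"rm1": "standby", "rm2": "active", "rm3": "standby"}))
```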

(4) Kill the Active ResourceManager and See What Happens

# Kill the active ResourceManager (rm2) on hadoop182 by its PID
[hadoop@hadoop181 hadoop]$ ssh hadoop@hadoop182 "kill -9 10564"

(5) Check Again After Killing rm2

[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm1
active
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm3
standby
[hadoop@hadoop181 hadoop]$ yarn rmadmin -getServiceState rm2
2020-09-04 11:55:25,237 INFO ipc.Client: Retrying connect to server: hadoop182/192.168.207.182:8033. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From hadoop181/192.168.207.181 to hadoop182:8033 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[hadoop@hadoop181 hadoop]$

rm1 has taken over as active, so failover works. That's it: our YARN HA environment is up and running.
