Hadoop Cluster Setup
I. Preliminary Preparation
Hostname    IP
hadoop11    192.168.1.11
hadoop12    192.168.1.12
hadoop13    192.168.1.13
1. Create hadoop11 in VMware
1) Create a new virtual machine and point it at the CentOS installer ISO
2) Select Linux as the OS type, version CentOS 7
3) Name the virtual machine hadoop11 and pick a storage location; note: name the last directory of the path hadoop11 to keep the VMs easy to manage
4) Set the disk size to 40 GB
5) Edit the VM settings to assign memory and CPUs as needed, then power on the virtual machine and run the installer
Ctrl+Alt: releases the mouse from the VM
6) Click the installation destination (partitioning) step and confirm with Done
7) Set the root password to hadoop
8) When installation completes, click Reboot
2. Configure the hadoop11 network
1) Log in to hadoop11 as root
2) Edit the network configuration file
vi /etc/sysconfig/network-scripts/ifcfg-ens33
IPADDR=192.168.1.11
GATEWAY=192.168.1.2
NETMASK=255.255.255.0
DNS1=114.114.114.114
BOOTPROTO=static
ONBOOT=yes
3) Restart the network service
systemctl restart network
ip addr
4) Make sure hadoop11's adapter is in NAT mode; hadoop11 and VMware's NAT configuration must be on the same subnet, 192.168.1.XXX
3. Configure the VMware network
This setup uses NAT networking: VMs on the same host machine can reach one another, but machines elsewhere on the LAN cannot reach them directly
1) Edit -> Virtual Network Editor, select VMnet8, NAT mode
Set the subnet address and subnet mask
2) DHCP settings
Set the range of usable IPs
3) NAT settings
Set the gateway
4. Configure the PC's virtual adapter VMnet8 (IPv4)
Pay attention to the gateway and DNS server settings
5. Verify the network
1) Check that hadoop11 can reach the Internet
ping www.baidu.com
2) Check that hadoop11 and the PC can reach each other
① hadoop11 -> PC
ping 192.168.XXX.XXX
② PC -> hadoop11
ping 192.168.1.11
6. Connect to hadoop11 with an SSH client
7.安装常用的插件
yum
install -y epel-release
yum
install -y psmisc nc net-tools
rsync vim lrzsz ntp libzstd openssl-static tree iotop
git
8. Set the hostname and hosts mappings
sudo hostnamectl --static set-hostname hadoop11
sudo vim /etc/hosts
192.168.1.11 hadoop11
192.168.1.12 hadoop12
192.168.1.13 hadoop13
On the PC, edit the hosts file under C:\Windows\System32\drivers\etc
192.168.1.11 hadoop11
192.168.1.12 hadoop12
192.168.1.13 hadoop13
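Once the PC's hosts file is saved, the mapping can be confirmed from a Windows command prompt:
ping hadoop11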
9. Disable the firewall, create a user, and grant root privileges
sudo systemctl status firewalld
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo useradd hduser
sudo passwd hduser    (set the password to hadoop)
visudo
root    ALL=(ALL)       ALL
hduser  ALL=(ALL)       NOPASSWD:ALL
su - hduser
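A quick check that the new account has the expected passwordless sudo rights:
sudo whoami    # should print root without prompting for a password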
10. Create commonly used directories
su - hduser
sudo mkdir /opt/software
sudo mkdir /opt/package
sudo chown -R hduser:hduser /opt/*
11. Install the JDK
Remove any preinstalled JDKs first:
rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps
Upload the JDK 8 archive to /opt/package with MobaXterm, then unpack it:
cd /opt/package
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/software/
sudo vim /etc/profile.d/my_env.sh
export JAVA_HOME=/opt/software/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile.d/my_env.sh
java -version
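If the variables are set correctly, the first line of output should read something like: java version "1.8.0_212".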
12. Install Hadoop
Upload the Hadoop 3.1.3 archive to /opt/package with MobaXterm, then unpack it:
cd /opt/package
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/software/
sudo vim /etc/profile.d/my_env.sh
export HADOOP_HOME=/opt/software/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
source /etc/profile.d/my_env.sh
hadoop version
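The first line of output should read Hadoop 3.1.3.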
II. Distributed Cluster Setup on the VMware NAT Network
        hadoop11              hadoop12                        hadoop13
HDFS    NameNode, DataNode    DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager    NodeManager
1. Clone hadoop11
1) In VMware's left panel, right-click hadoop11 -> Manage -> Clone
Create a full clone
Name the clone hadoop12
2) Clone hadoop11 the same way to get hadoop13
2. Configure the clones hadoop12 and hadoop13 (update IPADDR in ifcfg-ens33 to 192.168.1.12 / 192.168.1.13 and set the hostnames to hadoop12 / hadoop13, as in steps 2 and 8 of Part I)
3. Configure passwordless SSH login (as hduser)
ssh-keygen -t rsa
ssh-copy-id hadoop11
ssh-copy-id hadoop12
ssh-copy-id hadoop13
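The ssh-keygen/ssh-copy-id sequence must be repeated as hduser on hadoop12 and hadoop13 as well (start-yarn.sh on hadoop12 also needs passwordless access to the other nodes). On each node, a small loop saves some typing, and a round trip confirms it works:
for host in hadoop11 hadoop12 hadoop13; do ssh-copy-id $host; done
ssh hadoop12 hostname    # should print hadoop12 with no password prompt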
4. Configure the Hadoop files
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop11:9820</value>
    </property>
    <!-- Directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/software/hadoop-3.1.3/data</value>
    </property>
    <!-- Static user for operating on HDFS through the web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hduser</value>
    </property>
    <!-- Compatibility settings for Hive later on -->
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>
vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop13:9868</value>
    </property>
</configuration>
vim yarn-site.xml
<configuration>
    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop12</value>
    </property>
    <!-- Environment variables that containers may inherit from the NodeManager;
         for MapReduce applications, HADOOP_MAPRED_HOME must be added on top of the defaults -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Keep containers from being killed when a job exceeds the physical/virtual memory limit -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- Compatibility settings for Hive later on -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
</configuration>
vim mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
vim workers
hadoop11
hadoop12
hadoop13
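Note: the workers file must not contain any blank lines or trailing spaces; every line is read verbatim as a hostname.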
5. Create a directory for script files
sudo mkdir /opt/sh
sudo chown hduser:hduser /opt/sh
sudo vim /etc/profile.d/my_env.sh
export PATH=$PATH:/opt/sh
source /etc/profile.d/my_env.sh
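A convenient first script to put in /opt/sh is a cluster-wide jps wrapper. The sketch below is an assumption rather than part of the original setup; it calls jps by its absolute path from the JDK installed earlier, because a non-interactive ssh session may not load /etc/profile.d/my_env.sh:
#!/bin/bash
# /opt/sh/jpsall.sh -- hypothetical helper: list the Java processes on every node
for host in hadoop11 hadoop12 hadoop13; do
    echo "=============== $host ==============="
    ssh "$host" /opt/software/jdk1.8.0_212/bin/jps
done
Make it executable with chmod +x /opt/sh/jpsall.sh; since /opt/sh is on PATH, it can then be run from anywhere.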
6. Sync the configuration files to hadoop12 and hadoop13
rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/* hduser@hadoop12:/opt/software/hadoop-3.1.3/etc/hadoop
rsync -av /opt/software/hadoop-3.1.3/etc/hadoop/* hduser@hadoop13:/opt/software/hadoop-3.1.3/etc/hadoop
7. Format the NameNode (run once, on hadoop11)
hdfs namenode -format
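If the NameNode ever needs to be re-formatted, stop the cluster and delete the data and log directories on every node first; otherwise the DataNodes keep the old cluster ID and will not rejoin:
rm -rf /opt/software/hadoop-3.1.3/data /opt/software/hadoop-3.1.3/logs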
III. Start the Cluster
1. Start HDFS with one command on hadoop11
start-dfs.sh
2. Start YARN with one command on hadoop12
start-yarn.sh
3. Check that startup succeeded (on each node)
jps
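Matched against the role table at the top of Part II, jps should report roughly the following on each node (process IDs omitted):
hadoop11: NameNode, DataNode, NodeManager
hadoop12: ResourceManager, NodeManager, DataNode
hadoop13: SecondaryNameNode, DataNode, NodeManager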
4. Daemons can also be started and stopped individually
hdfs --daemon start/stop namenode/datanode/secondarynamenode
yarn --daemon start/stop resourcemanager/nodemanager
5. Configure the history server (edit mapred-site.xml, then sync it to the other nodes as in step 6 of Part II)
vim mapred-site.xml
<!-- History server RPC address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop11:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop11:19888</value>
</property>
mapred --daemon start historyserver
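After the daemon starts, jps on hadoop11 should additionally show a JobHistoryServer process.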
6. Configure log aggregation (edit yarn-site.xml, then sync it as before)
vim yarn-site.xml
<!-- Enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- URL for viewing aggregated logs -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop11:19888/jobhistory/logs</value>
</property>
<!-- Keep logs for 7 days -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
Restart YARN and the history server so the new settings take effect:
stop-yarn.sh
mapred --daemon stop historyserver
start-yarn.sh
yarn --daemon start timelineserver
mapred --daemon start historyserver
7. Check the web UIs
SecondaryNameNode web UI: http://hadoop13:9868
If that page fails to render its data (commonly reported for this Hadoop release), patch the date formatting in dfs-dust.js:
cd $HADOOP_HOME/share/hadoop/hdfs/webapps/static
vim dfs-dust.js
In vim, press Esc, type :set nu, and press Enter to show line numbers; then find date_tostring and replace it as follows:
// 'date_tostring' : function (v) {
//   return moment(Number(v)).format('ddd MMM DD HH:mm:ss ZZ YYYY');
// },
'date_tostring' : function (v) {
  return new Date(Number(v)).toLocaleString();
},
rsync -av $HADOOP_HOME/share/hadoop/hdfs/webapps/static/* hduser@hadoop12:$HADOOP_HOME/share/hadoop/hdfs/webapps/static
rsync -av $HADOOP_HOME/share/hadoop/hdfs/webapps/static/* hduser@hadoop13:$HADOOP_HOME/share/hadoop/hdfs/webapps/static
NameNode web UI: http://hadoop11:9870
ResourceManager web UI: http://hadoop12:8088
JobHistory web UI: http://hadoop11:19888/jobhistory
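A quick end-to-end smoke test exercises HDFS, YARN, the history server, and log aggregation together; the /input and /output paths are just illustrative names. After the job finishes, it should appear in the JobHistory UI with its logs viewable:
hadoop fs -mkdir /input
hadoop fs -put $HADOOP_HOME/README.txt /input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
hadoop fs -cat /output/part-r-00000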
8. Synchronize time with hadoop11
1) On hadoop11
su - root
systemctl stop ntpd
systemctl disable ntpd
vim /etc/ntp.conf
Uncomment the restrict line and change it to allow the cluster subnet:
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
Comment out the Internet time sources:
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
Add the following so hadoop11 serves its local clock even when offline:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
vim /etc/sysconfig/ntpd
Add the following line to keep the hardware clock in sync:
SYNC_HWCLOCK=yes
systemctl start ntpd
systemctl enable ntpd
2) On hadoop12 and hadoop13 (each)
crontab -e    (as root, since ntpdate needs root privileges)
Add the following entry to pull time from hadoop11 every 10 minutes:
*/10 * * * * /usr/sbin/ntpdate hadoop11
To test, deliberately set a wrong date, wait for the next scheduled sync or run ntpdate by hand, then check:
sudo date -s "2000-01-01 00:00:00"
date