Hadoop Installation and Basic Configuration


Hadoop Installation

1. Install the JDK.
2. Extract the Hadoop archive: tar -zxvf hadoop.tar.gz
3. Configure the environment variables:

export JAVA_HOME=/opt/jdk1.8.0_221
export JRE_HOME=/opt/jdk1.8.0_221/jre
export CLASSPATH=.:$JAVA_HOME/lib/rt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export HADOOP_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
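
To make these variables survive new logins, one common approach is to append them to /etc/profile (a sketch; your setup may use ~/.bashrc instead) and reload the shell:

echo 'export JAVA_HOME=/opt/jdk1.8.0_221' >> /etc/profile
echo 'export HADOOP_HOME=/opt/hadoop' >> /etc/profile
echo 'export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH' >> /etc/profile
source /etc/profile     # reload so the current shell picks up the new values
java -version           # quick check that JAVA_HOME/bin is on the PATH
hadoop version          # quick check that HADOOP_HOME/bin is on the PATH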

4. Configure Hadoop (${HADOOP_HOME}/etc/hadoop/*-site.xml)

a) Standalone | local mode
i. No daemons; everything runs in a single JVM, which is convenient for testing and debugging.

b) Pseudo-distributed mode
i. core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.56.137:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>false</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>
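
hadoop.tmp.dir points at /opt/hadoop/tmp; Hadoop will create it if missing, but creating it up front avoids permission surprises. A small sketch, assuming a root install as the proxyuser entries above suggest:

mkdir -p /opt/hadoop/tmp          # directory referenced by hadoop.tmp.dir above
hdfs getconf -confKey fs.defaultFS   # once configured, prints the effective NameNode address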

ii. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>lijia1:50090</value>
  </property>
</configuration>
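
lijia1 is this tutorial's hostname; replace it with your own, and make sure the name resolves. A sketch, assuming lijia1 maps to the 192.168.56.137 address used in core-site.xml above:

echo "192.168.56.137 lijia1" >> /etc/hosts   # hypothetical entry; substitute your hostname and IP
ping -c 1 lijia1                             # verify the name resolves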

iii. mapred-site.xml (cp: this file is first copied from its template, see the note after the block below)

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>HostName:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>HostName:19888</value>
  </property>
</configuration>
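
The "cp" note above refers to creating the file from the shipped template; in Hadoop 2.x the configuration directory only contains mapred-site.xml.template by default. A sketch of that step (replace HostName in the block above with your actual hostname):

cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml   # Hadoop 2.x ships only the template; copy before editing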

iv. yarn-site.xml

<configuration>
  <!-- How reducers fetch intermediate data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Address of the YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>HostName</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Retain aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
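
With log aggregation enabled as above, the logs of a finished job can later be pulled back through YARN instead of hunting through per-node directories. A sketch; the application id is a placeholder taken from the job output or the ResourceManager UI:

yarn application -list                      # lists running/finished applications and their ids
yarn logs -applicationId <application_id>   # <application_id> is a placeholder; substitute the real id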

Edit ./slaves (vi ./slaves) and put your own hostname in it (the output of hostname). In hadoop-env.sh, set JAVA_HOME explicitly; see the sketch below.
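
A small sketch of both edits, assuming the JAVA_HOME path from the environment variables above and a Hadoop 2.x layout where the worker list lives in etc/hadoop/slaves:

cd $HADOOP_HOME/etc/hadoop
hostname > slaves     # one worker hostname per line; in pseudo-distributed mode that is this machine
# The stock hadoop-env.sh already has an "export JAVA_HOME=" line; spell out the path instead of
# relying on the inherited environment (otherwise just edit the file by hand):
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.8.0_221|' hadoop-env.sh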

v. Configure SSH (Secure Shell)

1. Purpose: the startup scripts log in to the remote servers over SSH to start their daemons, and typing a password for every login is tedious, so passwordless login has to be set up: generate a key pair on the NameNode and distribute the public key to the DataNodes.
2. Steps

a) Generate a key pair
i. ssh-keygen

b) Add the public key to the authorized-keys file
i. Pseudo-distributed: copy it to the local machine itself
1. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ii. Fully distributed: copy it to each DataNode

1. scp root@<NameNode hostname>:~/.ssh/id_rsa.pub ~/.ssh/id_rsa.pub   (run on each DataNode to pull the NameNode's public key)
2. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
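
An equivalent and somewhat less error-prone way to get the key onto a node is ssh-copy-id, which appends to authorized_keys and fixes permissions in one step. A sketch, followed by a quick login test; the hostnames are placeholders:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@<datanode-hostname>   # run on the NameNode for each worker
ssh root@<datanode-hostname> hostname   # should run without a password prompt
ssh localhost hostname                  # pseudo-distributed case: the node must also reach itself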

vi. Format the NameNode

hdfs namenode -format
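
Formatting only needs to be done once. With hadoop.tmp.dir set to /opt/hadoop/tmp above and Hadoop's default dfs.namenode.name.dir underneath it (an assumption, since this guide does not override it), a successful format leaves metadata here:

ls /opt/hadoop/tmp/dfs/name/current/           # should now exist
cat /opt/hadoop/tmp/dfs/name/current/VERSION   # contains the newly generated cluster id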

vii. Start Hadoop

1. Command (from the sbin directory)

./start-all.sh
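
start-all.sh has long been deprecated; it still works, but the usual recommendation is to start HDFS and YARN separately, which also makes it easier to see which layer fails:

./start-dfs.sh     # NameNode, DataNode, SecondaryNameNode
./start-yarn.sh    # ResourceManager, NodeManager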

2. Start the history server

mr-jobhistory-daemon.sh start historyserver
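
Once it is running, the history server's web UI should answer on the port set in mapred-site.xml above (19888; HostName is still the placeholder for your machine name). A quick check:

jps | grep JobHistoryServer                   # the process should be listed
curl -sI http://HostName:19888 | head -n 1    # any HTTP status line means the web UI is listening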

3. Run jps to check that the processes are up; after a normal start the output should look like this:

61361 Jps
10388 NameNode
10837 ResourceManager
10526 DataNode
10943 NodeManager
10198 SecondaryNameNode
62588 JobHistoryServer
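
Beyond jps, an HDFS report and the web UIs give a quick health check. The ports below are the Hadoop 2.x defaults, an assumption since the configuration above does not override them:

hdfs dfsadmin -report    # live DataNodes, capacity, remaining space
# Default web UIs (Hadoop 2.x): NameNode on 50070, ResourceManager on 8088, e.g.
# http://192.168.56.137:50070
# http://192.168.56.137:8088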

If any of these processes is missing, check the configuration files above for mistakes. If the configuration looks correct but processes still fail to come up, stop everything with stop-all.sh, delete the tmp directory (and preferably the logs directory as well) with rm -rf, reformat with hadoop namenode -format, and start Hadoop again with start-all.sh; a sketch of that reset is below. If processes are still missing after that, look in the logs directory for the concrete error message and fix the problem it points to.
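
A sketch of that reset sequence, assuming the directory layout used throughout this post (hadoop.tmp.dir at /opt/hadoop/tmp and logs under $HADOOP_HOME/logs). Note that reformatting wipes all HDFS data:

$HADOOP_HOME/sbin/stop-all.sh
rm -rf /opt/hadoop/tmp /opt/hadoop/logs   # destroys HDFS data and old logs
hdfs namenode -format
$HADOOP_HOME/sbin/start-all.sh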
