Installing Big Data Components

tech 2024-07-31

This is my first post, so it is a bit thin.

I have recently been using Flink for some data analysis. The installation steps are below.

There are three servers in total:

Server 1: JDK 11, Kafka, ZooKeeper, Flink, ES

Server 2: JDK 11, Kafka, ZooKeeper, Flink, ES, Redis

Server 3: JDK 11, Flume, Kafka, ZooKeeper, Flink (master), ES

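Upload all the archives to the corresponding servers according to the layout above, then extract them.

Disable the firewall

Check whether the firewall is running: firewall-cmd --state

Stop it: systemctl stop firewalld

Check its status: systemctl status firewalld

Disable it at boot: systemctl disable firewalld

Configure the server hostnames (use your actual IPs; wherever a later component's configuration file refers to a hostname, fill in your actual hostname)

vi /etc/hosts  ## add one line per server, in the form <IP> <hostname>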
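The entries themselves are not shown in this step, so here is a minimal sketch of adding them. The 192.0.2.x addresses are placeholders and not from this setup, the hostnames soc-3, soc-4, and soc-5 are the ones that appear in zoo.cfg later in this guide, and add_host_entry and HOSTS_FILE are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: append "<ip> <hostname>" to a hosts file unless the name is already mapped.
# HOSTS_FILE is a hypothetical override so the function can be tried on a copy first.
add_host_entry() {
  local ip="$1" name="$2" file="${HOSTS_FILE:-/etc/hosts}"
  # Look for the hostname among the names of any non-comment line.
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file" 2>/dev/null; then
    echo "skip: $name is already in $file"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}

# Example (placeholder IPs; run as root on every server):
#   add_host_entry 192.0.2.13 soc-3
#   add_host_entry 192.0.2.14 soc-4
#   add_host_entry 192.0.2.15 soc-5
```

Running it twice is harmless, because an existing hostname is skipped instead of being added again.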

Configure the JDK environment variables

Edit the /etc/profile file

vi /etc/profile  ## add the following 3 lines

export SOC_JAVA=/soc/jdk1.8.0_211/bin/java

export JAVA_HOME=/soc/jdk1.8.0_211

export PATH=$JAVA_HOME/bin:$PATH

## After adding them, save and exit, then run the following command

source /etc/profile

## Afterwards, the java -version command can be used to verify the installation
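As a small sketch of that verification, the function below checks that a JDK directory really contains an executable java and prints its first version line. The name check_java_home is hypothetical; the path in the example is the JAVA_HOME value set above.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a JDK directory is usable and print its version line.
check_java_home() {  # check_java_home [jdk directory], defaults to $JAVA_HOME
  local home="${1:-${JAVA_HOME:-}}"
  if [ -x "$home/bin/java" ]; then
    # java -version writes to stderr, so merge it into stdout before filtering.
    "$home/bin/java" -version 2>&1 | head -n 1
  else
    echo "no executable java under $home/bin" >&2
    return 1
  fi
}

# Example: check_java_home /soc/jdk1.8.0_211
```

If the printed version does not match the JDK you unpacked, another java earlier in PATH is probably being picked up.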

 

Grant 775 permissions to all files: chmod -R 775 /soc/

Install ES: grant permissions on the elasticsearch package:

chmod 777 -R elasticsearch

Adjust the elasticsearch-related settings:

(1) Startup warning: max file descriptors [4096] for elasticsearch process likely too low, consider increasing to at least [65536]

vi /etc/security/limits.conf and add the following 4 lines

* soft nofile 65536

* hard nofile 131072

* soft nproc 2048

* hard nproc 4096

(2) vi /etc/sysctl.conf

vm.max_map_count=655360

## After adding it, run sysctl -p
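To confirm that the new limits are actually in effect, something like the following sketch can be run as the user that will start Elasticsearch, after logging in again. The function names are hypothetical. 65536 comes from the warning quoted above, and 262144 is, as far as I know, the minimum Elasticsearch documents for vm.max_map_count (the value set above is higher).

```shell
#!/usr/bin/env bash
# Sketch: compare a current limit with the minimum it should reach.
check_min() {  # check_min <name> <actual> <required>
  local name="$1" actual="$2" required="$3"
  # ulimit reports "unlimited" when no cap is set, which always satisfies the minimum.
  if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$required" ]; then
    echo "ok  $name=$actual"
  else
    echo "low $name=$actual (need >= $required)"
    return 1
  fi
}

# Checks the two settings changed in steps (1) and (2).
check_es_limits() {
  local rc=0
  check_min nofile "$(ulimit -n)" 65536 || rc=1
  check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144 || rc=1
  return $rc
}

# Example: check_es_limits
```

The limits.conf change only applies to new login sessions, which is why the check should run in a fresh shell.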

(3) vi /soc/elasticsearch/config/jvm.options, find the heap size settings, and change them as follows:

-Xms32g

-Xmx32g

(4) Create a new user, elsearch:elasticsearch

useradd -d /soc/elasticsearch -m elsearch

(5) vi /soc/elasticsearch/config/elasticsearch.yml

a). Find cluster.name and set it as follows:

cluster.name: LAB

b). Find node.name and set it as follows:

node.name: MD-1

c). Find path.logs and path.data and set them as follows:

path.logs: /soc/elasticsearch/logs

path.data: /soc/elasticsearch/data

d). Find network.host and set it as follows (the local server's IP or hostname):

network.host: soc-4

network.publish_host: soc-4

e).discovery.zen.ping.unicast.hosts: ["127.0.0.1","127.0.0.2","127.0.0.3"]

f).discovery.zen.minimum_master_nodes: 2

(6) Start ES (jps then shows the process as Elasticsearch):

su - elsearch

cd /soc/elasticsearch/bin

./elasticsearch &
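Once the nodes are up, the cluster state can be checked over HTTP. The sketch below extracts the status field from a _cluster/health response. It assumes the HTTP port is still the default 9200, since this guide does not change http.port; es_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pull the "status" field out of a _cluster/health response read from stdin.
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Example (soc-4 is the network.host configured above):
#   curl -s http://soc-4:9200/_cluster/health | es_status
```

The status is one of green, yellow, or red. An empty result usually means the node is not reachable yet.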

Install ZooKeeper: edit /soc/zookeeper/conf/zoo.cfg

(1) Find dataDir and dataLogDir and change them to:

dataDir=/soc/zookeeper/data

dataLogDir=/soc/zookeeper/logs

(2) Find the following entries, or add them at the end of the file:

server.1=soc-3:2888:3888

server.2=soc-4:2888:3888

server.3=soc-5:2888:3888

Edit /soc/zookeeper/data/myid, creating it if it does not exist

File content: 1 (this must match the server.N entries from step (2): the server listed as server.1 gets 1, and likewise 2 and 3 for the others)

Start ZooKeeper (jps then shows the process as QuorumPeerMain):

nohup /soc/zookeeper/bin/zkServer.sh start >/dev/null 2>&1 &
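Because the myid value has to agree with zoo.cfg on every server, it can be derived from the file instead of typed by hand. This is only a sketch: zk_myid_for is a hypothetical helper, and it assumes each server's short hostname is exactly the name used in its server.N line.

```shell
#!/usr/bin/env bash
# Sketch: print the N of the "server.N=<host>:..." line that names the given host.
# Assumes short hostnames without dots, as in the zoo.cfg entries of this guide.
zk_myid_for() {  # zk_myid_for <hostname> <path to zoo.cfg>
  awk -F'[.=:]' -v h="$1" '$1 == "server" && $3 == h { print $2; exit }' "$2"
}

# Example, run on each ZooKeeper server before starting it:
#   zk_myid_for "$(hostname -s)" /soc/zookeeper/conf/zoo.cfg > /soc/zookeeper/data/myid
```

If the hostname is not listed in zoo.cfg the function prints nothing, so check that myid is not empty before starting ZooKeeper.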

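Install Kafka: edit /soc/kafka/config/server.properties

(1) Find broker.id and change it (use a different value on each server):

broker.id=1

(2) Find zookeeper.connect and set it to the addresses of the servers running ZooKeeper, which here are the same servers as Kafka (comma-separated if there are several):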

zookeeper.connect=soc-3:2181,soc-4:2181,soc-5:2181

Edit /soc/kafka/config/zookeeper.properties

(1) Find dataDir and change it to:

dataDir=/soc/kafka/data/zookeeper

Start Kafka (jps then shows the process as Kafka):

nohup /soc/kafka/bin/kafka-server-start.sh /soc/kafka/config/server.properties >/dev/null 2>&1 &

Create the Kafka topic (comma-separate multiple ZooKeeper addresses):

/soc/kafka/bin/kafka-topics.sh --create --replication-factor 3 --partitions 6 --topic md01 --zookeeper soc-3:2181,soc-4:2181,soc-5:2181
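The comma-separated ZooKeeper list is easy to mistype, so it can be built from the hostnames. The sketch below uses a hypothetical zk_connect helper and assumes the ensemble is the soc-3, soc-4, soc-5 trio from zoo.cfg with the default client port 2181. The commented example then uses the --describe option of kafka-topics.sh to confirm what was created.

```shell
#!/usr/bin/env bash
# Sketch: join hostnames into the comma-separated host:port list ZooKeeper clients expect.
zk_connect() {  # zk_connect host1 host2 ... -> host1:2181,host2:2181,...
  local port="${ZK_PORT:-2181}" out="" h
  for h in "$@"; do
    # Prefix a comma only when the list already has an entry.
    out="${out:+$out,}$h:$port"
  done
  printf '%s\n' "$out"
}

# Example: confirm the topic's partitions and replicas after creating it.
#   /soc/kafka/bin/kafka-topics.sh --describe --topic md01 \
#     --zookeeper "$(zk_connect soc-3 soc-4 soc-5)"
```

With a replication factor of 3, the describe output should list three replicas for each of the six partitions once all three brokers are running.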

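Install Flume: copy /soc/flume/conf/flume-conf.properties and name the copy flume-conf-log.properties

Replace its contents with the following. The addresses in a1.sinks.k1.brokerList are those of the Kafka servers (comma-separated if there are several), and a1.sinks.k1.topic is the name of the syslog topic created in Kafka.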

a1.sources=r1

a1.sinks=k1

a1.channels=c1

 

#Describe/configure the source

a1.sources.r1.type=syslogudp

a1.sources.r1.channels=c1

a1.sources.r1.host=soc-3

a1.sources.r1.port=514

 

a1.sinks.k1.type=org.apache.flume.sink.kafka.KafkaSink

a1.sinks.k1.topic=md01

a1.sinks.k1.brokerList=127.0.0.1:9092, 127.0.0.2:9092, 127.0.0.3:9092

a1.sinks.k1.batchSize=2

a1.sinks.k1.requiredAcks=1

a1.sinks.k1.channel=c1

 

a1.channels.c1.type=memory

a1.channels.c1.capacity= 1000000

a1.channels.c1.transactionCapacity= 100000
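One thing worth checking in a file like this is the memory channel sizing: Flume's memory channel does not accept a transactionCapacity larger than its capacity. The sketch below reads both values from a properties file and compares them. prop_get and check_channel are hypothetical helper names, and the parsing only handles simple key=value lines like the ones above.

```shell
#!/usr/bin/env bash
# Sketch: read one key from a simple key=value properties file, trimming spaces.
prop_get() {  # prop_get <key> <file>
  awk -F= -v k="$1" '
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == k {
      val = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      print val
      exit
    }' "$2"
}

# Succeeds when both values exist and transactionCapacity <= capacity.
check_channel() {  # check_channel <agent> <channel> <file>
  local cap tx
  cap="$(prop_get "$1.channels.$2.capacity" "$3")"
  tx="$(prop_get "$1.channels.$2.transactionCapacity" "$3")"
  [ -n "$cap" ] && [ -n "$tx" ] && [ "$tx" -le "$cap" ]
}

# Example:
#   check_channel a1 c1 /soc/flume/conf/flume-conf-log.properties && echo "channel sizing ok"
```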

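Copy /soc/flume/conf/flume-conf.properties and name the copy flume-conf-flow.properties

Replace its contents with the following. The addresses in a2.sinks.k2.brokerList are those of the Kafka servers (comma-separated if there are several), and a2.sinks.k2.topic is the name of the flow topic created in Kafka.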

a2.sources=r2 r3 r4 r5

a2.sinks=k2

a2.channels=c2

 

a2.sources.r2.type=org.apache.flume.source.FlowSource

a2.sources.r2.channels=c2

a2.sources.r2.host=0.0.0.0

a2.sources.r2.port=9996

a2.sources.r2.ip=127.0.0.1

a2.sources.r2.rate=192.168.140.1-1-1,10.2.1.169-1-1

a2.sources.r2.bufferSize=102400

 

 

a2.sources.r3.type=org.apache.flume.source.FlowSource

a2.sources.r3.channels=c2

a2.sources.r3.host=0.0.0.0

a2.sources.r3.port=9995

a2.sources.r3.ip=127.0.0.1

a2.sources.r3.rate=192.168.140.1-1-1,10.2.1.169-1-1

a2.sources.r3.bufferSize=102400

 

a2.sources.r4.type=org.apache.flume.source.FlowSource

a2.sources.r4.channels=c2

a2.sources.r4.host=0.0.0.0

a2.sources.r4.port=9991

a2.sources.r4.ip=127.0.0.1

a2.sources.r4.rate=192.168.140.1-1-1,10.2.1.169-1-1

a2.sources.r4.bufferSize=102400

 

a2.sources.r5.type=org.apache.flume.source.FlowSource

a2.sources.r5.channels=c2

a2.sources.r5.host=0.0.0.0

a2.sources.r5.port=6343

a2.sources.r5.ip=127.0.0.1

a2.sources.r5.rate=192.168.140.1-1-1,10.2.1.169-1-1

a2.sources.r5.bufferSize=102400

 

a2.sinks.k2.type=org.apache.flume.sink.kafka.KafkaSink

a2.sinks.k2.topic=flow01

a2.sinks.k2.brokerList=soc-3:9092,soc-4:9092,soc-2:9092

a2.sinks.k2.batchSize=10

a2.sinks.k2.requiredAcks=1

a2.sinks.k2.channel=c2

 

a2.channels.c2.type=memory

a2.channels.c2.capacity=10000000

a2.channels.c2.transactionCapacity=10000

Edit /soc/flume/conf/log4j.properties

Find flume.root.logger, flume.log.dir, and flume.log.file and change them as follows:

flume.root.logger=INFO,LOGFILE

flume.log.dir=/soc/flume/logs

flume.log.file=flume.log

Edit /soc/flume/bin/flume-ng

Find JAVA_OPTS and change it as follows:

  JAVA_OPTS="-Xmx2048m"

Start Flume (jps then shows the process as Application):

nohup /soc/flume/bin/flume-ng agent -n a1 -c /soc/flume/conf -f /soc/flume/conf/flume-conf-log.properties >/dev/null 2>&1 &

nohup /soc/flume/bin/flume-ng agent -n a2 -c /soc/flume/conf -f /soc/flume/conf/flume-conf-flow.properties >/dev/null 2>&1 &
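Since every start command here discards its output, jps is the quickest way to see whether the processes stayed up. The sketch below takes jps output on stdin and prints whichever of the expected process names are absent. missing_procs is a hypothetical helper; the names in the example are the ones this guide gives for ES, ZooKeeper, Kafka, and Flume.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and print each expected name that is absent.
missing_procs() {  # missing_procs <name> [<name> ...]
  local seen name rc=0
  # jps prints "<pid> <main class>", so keep only the second column.
  seen="$(awk '{ print $2 }')"
  for name in "$@"; do
    if ! printf '%s\n' "$seen" | grep -qx "$name"; then
      echo "$name"
      rc=1
    fi
  done
  return $rc
}

# Example, on the server that runs all four:
#   jps | missing_procs Elasticsearch QuorumPeerMain Kafka Application
```

Note that jps only lists JVMs owned by the current user, so Elasticsearch started as elsearch has to be checked from that account or as root.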

Install Flink: edit /soc/flink/conf/masters and add the Flink master server's hostname and port:

SOC-2:8081

Edit /soc/flink/conf/slaves and add the hostnames of the Flink worker servers:

SOC-2

SOC-3

SOC-4

Edit /soc/flink/conf/flink-conf.yaml and set the Flink master server's hostname:

jobmanager.rpc.address: SOC-2

 

taskmanager.numberOfTaskSlots: 4     # changed from 32 to 4

parallelism.default: 4               # changed from 16 to 4

Start and stop Flink (run these on the master server)

   cd /soc/flink/bin

   ./start-cluster.sh

   ./stop-cluster.sh

Open the web UI: http://10.176.62.42:8081/
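The same port also serves Flink's REST API, which gives a scriptable check that all workers registered. With three hosts in slaves and 4 slots per TaskManager, the cluster should report 12 slots. The sketch below computes that expectation and reads slots-total from the /overview response. Both function names are hypothetical, and the JSON is parsed with sed only to avoid extra tools.

```shell
#!/usr/bin/env bash
# Sketch: expected slot count = non-empty lines in the slaves file x slots per TaskManager.
expected_slots() {  # expected_slots <slaves file> <slots per TaskManager>
  local hosts
  hosts="$(grep -c '[^[:space:]]' "$1" || true)"
  echo $(( hosts * $2 ))
}

# Extracts "slots-total" from an /overview response read from stdin.
slots_total() {
  sed -n 's/.*"slots-total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example, after start-cluster.sh:
#   want="$(expected_slots /soc/flink/conf/slaves 4)"
#   have="$(curl -s http://10.176.62.42:8081/overview | slots_total)"
#   [ "$want" = "$have" ] && echo "all slots registered" || echo "expected $want, got $have"
```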

 
