Just in case Kettle runs into problems syncing data from Kafka to HDFS, I also tested using Flume to collect the data from Kafka and sync it to HDFS, as a backup plan.
hadoop-3.1.3.tar.gz
apache-flume-1.9.0-bin.tar.gz
Link: https://pan.baidu.com/s/1iXX87v4coH_49Ik84isuRw?pwd=8zuc
Extraction code: 8zuc
[root@hurys22 ~]# cd /opt/install/
[root@hurys22 install]# ls
apache-flume-1.9.0-bin.tar.gz hadoop-3.1.3.tar.gz kafka_2.13-3.0.0.tgz
[root@hurys22 install]# tar -zxf apache-flume-1.9.0-bin.tar.gz -C /opt/soft
[root@hurys22 install]# cd /opt/soft/
[root@hurys22 soft]# ls
apache-flume-1.9.0-bin hadoop313 hbase205 hive312 kafka213
[root@hurys22 soft]# mv apache-flume-1.9.0-bin flume190
[root@hurys22 soft]# ls
flume190 hadoop313 hbase205 hive312 kafka213
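Remove the guava-11.0.2.jar bundled with Flume. It is much older than the Guava that ships with Hadoop 3.1.3, and leaving both on the classpath commonly causes a NoSuchMethodError when Flume writes to HDFS.
[root@hurys22 soft]# cd ./flume190/lib/
[root@hurys22 lib]# rm -rf guava-11.0.2.jar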
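To see which Guava jars are actually present before and after the removal, something like the following sketch can help. The function name list_guava_jars is my own, and the Hadoop lib path is an assumption based on the directory layout in this post (hadoop313 with the usual share/hadoop/common/lib inside); adjust both paths to your environment.

```shell
#!/usr/bin/env bash
# List the guava jars directly inside a lib directory (file names only).
list_guava_jars() {
    local lib_dir="$1"
    find "$lib_dir" -maxdepth 1 -type f -name 'guava-*.jar' -exec basename {} \; | sort
}

# Assumed locations; override via environment variables if yours differ.
FLUME_LIB="${FLUME_LIB:-/opt/soft/flume190/lib}"
HADOOP_LIB="${HADOOP_LIB:-/opt/soft/hadoop313/share/hadoop/common/lib}"

for dir in "$FLUME_LIB" "$HADOOP_LIB"; do
    if [ -d "$dir" ]; then
        echo "== $dir"
        list_guava_jars "$dir"
    fi
done
```

After the removal, the Flume lib directory should list no guava jar, while the Hadoop directory should still show its own.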
[root@hurys22 lib]# cd /opt/soft/flume190/conf/
[root@hurys22 conf]# ls
flume-conf.properties.template flume-env.ps1.template flume-env.sh.template log4j.properties
[root@hurys22 conf]# cp flume-env.sh.template flume-env.sh
[root@hurys22 conf]# ls
flume-conf.properties.template flume-env.ps1.template flume-env.sh flume-env.sh.template log4j.properties
Check the JDK installation path
[root@hurys22 ~]# echo $JAVA_HOME
/usr/local/java
[root@hurys22 conf]# vi flume-env.sh
# Enviroment variables can be set here.

export JAVA_HOME=/usr/local/java

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms8000m -Xmx8000m -Dcom.sun.management.jmxremote"
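-Xms8000m sets the JVM's initial heap size to 8000 MB, and -Xmx8000m sets the maximum heap size to 8000 MB. With both set to the same value, the heap starts at its maximum size and is never resized. Lower both values if the host has less memory to spare.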
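As a small illustration of which flag is which, here is a sketch that pulls the two heap values out of a JAVA_OPTS string. The helper name heap_flag is hypothetical and not part of Flume or the JDK; it only does string matching and does not start a JVM.

```shell
#!/usr/bin/env bash
# Print the value that follows a JVM heap flag (-Xms or -Xmx) in an options string.
# Returns non-zero when the flag is not present.
heap_flag() {
    local flag="$1" opts="$2" tok
    for tok in $opts; do
        case "$tok" in
            "$flag"*) printf '%s\n' "${tok#$flag}"; return 0 ;;
        esac
    done
    return 1
}

JAVA_OPTS="-Xms8000m -Xmx8000m -Dcom.sun.management.jmxremote"
echo "initial heap: $(heap_flag -Xms "$JAVA_OPTS")"   # initial heap: 8000m
echo "maximum heap: $(heap_flag -Xmx "$JAVA_OPTS")"   # maximum heap: 8000m
```

On a machine with a JDK installed, running java with the same flags plus -XX:+PrintFlagsFinal -version and filtering for InitialHeapSize and MaxHeapSize should show the values the JVM actually applies.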
[root@hurys22 conf]# yum install -y nc
// Quick test: a chat between two terminal windows
In the first window, start the listener
[root@hurys22 conf]# nc -lk 44444
[root@hurys22 ~]# cd /opt/soft/flume190/conf/
[root@hurys22 conf]# yum install telnet-server
[root@hurys22 conf]# yum install telnet.*
Complete!
// Test the port: open the chat from the second window
[root@hurys22 conf]# telnet localhost 44444
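In the second window, type hello java
The first window then receives hello java
Press Ctrl+C in the first window
Note that this ends both sessions: once the listener stops, the telnet session in the second window closes as well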
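If you only want to know whether something is listening on the port, without opening an interactive telnet session, a sketch like the one below may be enough. It relies on bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX sh; the function name port_open is my own.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port can be opened, non-zero otherwise.
port_open() {
    local host="$1" port="$2"
    # The subshell opens fd 3 on the socket; the fd is closed when the subshell exits.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

if port_open localhost 44444; then
    echo "port 44444 is accepting connections"
else
    echo "port 44444 is closed"
fi
```

With nc -lk 44444 running in another window this should report the port as accepting connections, and as closed once the listener has been stopped.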
Overall, compared with installing tools such as Hadoop and Hive, installing Flume is fairly simple.
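As a final sanity check of the install, the launcher script can be asked for its version. The sketch below wraps that in a small function (flume_version is a hypothetical name) and assumes the /opt/soft/flume190 directory used in this post unless FLUME_HOME is set.

```shell
#!/usr/bin/env bash
# Print the Flume version if the launcher script exists and is executable.
flume_version() {
    local home="$1"
    if [ ! -x "$home/bin/flume-ng" ]; then
        echo "flume-ng not found under $home" >&2
        return 1
    fi
    "$home/bin/flume-ng" version
}

# Install directory from this post; override with FLUME_HOME if yours differs.
flume_version "${FLUME_HOME:-/opt/soft/flume190}" || true
```

If the tarball was extracted and renamed as above, this should print a line starting with Flume 1.9.0 followed by build details.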