Download links:
http://archive.cloudera.com/cdh5/cdh/5/oozie-4.1.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/hue-3.9.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/zookeeper-3.4.5-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/spark-1.6.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.14.2.tar.gz
http://archive.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.14.2.tar.gz
Configuration:
Part 1: Environment initialization
============================
Note: all installation steps are performed as the root user.
CDH components are installed under /opt.
============================
1.1 Prerequisites:
CentOS 7.3 x64 on all nodes; configure the hostname on every machine and set up passwordless SSH between all machines.
1.2 Hostname configuration
Hostname    IP
master 192.168.9.80
slave1 192.168.9.20
slave2 192.168.9.220
Set /etc/hostname on each machine
to master, slave1, and slave2 respectively.
On all three machines, edit /etc/hosts
and add:
192.168.9.80 master
192.168.9.20 slave1
192.168.9.220 slave2
1.3 Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
firewall-cmd --state    # should report that firewalld is not running
Disable iptables:
iptables -F
systemctl stop iptables.service
service iptables save
systemctl disable iptables.service
Disable SELinux:
vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot).
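To make the same change non-interactively, a minimal sketch (assumes the stock /etc/selinux/config layout):
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0    # stop enforcing immediately; the config file change itself applies after reboot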
1.4 Configure passwordless SSH
1. On every server run ssh-keygen -t rsa to generate a key pair; press Enter at every prompt and do not set a passphrase. A .ssh directory is created under /root.
Note: the .ssh directory is hidden by default; use ls -al to see it.
2. On 192.168.9.80, merge the public keys into authorized_keys. Enter /root/.ssh and merge them over SSH:
cat id_rsa.pub>> authorized_keys
ssh [email protected] cat ~/.ssh/id_rsa.pub>> authorized_keys
ssh [email protected] cat ~/.ssh/id_rsa.pub>> authorized_keys
3. Copy authorized_keys and known_hosts from 192.168.9.80 to /root/.ssh on 192.168.9.20 and 192.168.9.220:
scp authorized_keys [email protected]:/root/.ssh/
scp authorized_keys [email protected]:/root/.ssh/
scp known_hosts [email protected]:/root/.ssh/
scp known_hosts [email protected]:/root/.ssh/
On 192.168.9.20:
scp ~/.ssh/authorized_keys slave2:~/.ssh/
On 192.168.9.220:
scp ~/.ssh/authorized_keys master:~/.ssh/
scp ~/.ssh/authorized_keys slave1:~/.ssh/
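Once the keys are distributed, a quick check that passwordless login works (a sketch; run it on each of the three hosts):
for h in master slave1 slave2; do ssh $h hostname; done    # should print the three hostnames without any password prompt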
1.5 Install JDK 1.8 on all machines
rpm -qa | grep java    # if any Java packages are listed, remove them first
Removal command: rpm -e --nodeps <package-name>    (--nodeps skips dependency checks)
Download jdk-8u171-linux-x64.tar.gz
and upload it to /software.
Extract: tar xzvf jdk-8u171-linux-x64.tar.gz -C /opt/
Configure the environment variables:
vim /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Apply the changes:
source /etc/profile
1.6 Configure an external YUM repository on all machines
Back up /etc/yum.repos.d/CentOS-Base.repo:
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
Copy the new repo file into yum.repos.d:
cp CentOS7-Base-163.repo /etc/yum.repos.d/
Enter the repo directory:
cd /etc/yum.repos.d
Rebuild the YUM cache:
yum makecache
Update the system (this can take a while depending on your network speed):
yum -y update
Install the NTP service:
yum install ntp
1.7 Configure the NTP server (on master)
https://blog.csdn.net/worldchinalee/article/details/82109932
1. Install ntp:
yum -y install ntp
2. On master, check whether the ntpd service is running:
service ntpd status
3. Sync the time once:
ntpdate pool.ntp.org
4. Configure this machine as the NTP server:
cd /etc/
mv ntp.conf ntp.conf.bak
vim ntp.conf
# The modified ntp.conf should contain:
# record the drift between system time and the hardware clock
driftfile /var/lib/ntp/drift
# restrict lines control access permissions
# deny all operations for default clients
restrict default kod nomodify notrap nopeer noquery
# the same default policy for IPv6 clients
restrict -6 default kod nomodify notrap nopeer noquery
# allow access from localhost
restrict 127.0.0.1
restrict -6 ::1
# allow the 10.75.229.0/24 subnet (adjust to your own network segment, e.g. 192.168.9.0)
restrict 10.75.229.0 mask 255.255.255.0 nomodify notrap
# use the local clock as the NTP source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
5. Sync the hardware clock:
vim /etc/sysconfig/ntpd
SYNC_HWCLOCK=yes
6. Confirm the server status:
# pgrep ntpd
# netstat -tlunp|grep ntp
# ntpstat
# ntpq -p
7. Configure the clients (slave1 and slave2)
Install NTP on each client first:
yum install ntp ntpdate -y
ntpdate 192.168.9.80
Sync the time at 01:00 every night via cron. Add the following line to /etc/crontab (if you use crontab -e instead, drop the 'root' user field):
00 01 * * * root /usr/sbin/ntpdate 192.168.9.80; /sbin/hwclock -w
1.8 Install MySQL (5.6.41)
1. Check whether MySQL is already installed:
rpm -qa | grep mysql    # if anything is listed, remove it first (yum remove mysql)
rpm -e --nodeps <package-name>
An empty result means MySQL is not installed.
2. Download the MySQL repo package:
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
Note: if the wget command is not available, install it with yum:
yum install wget
3. Install the mysql-community-release-el7-5.noarch.rpm package:
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
4. Install MySQL:
sudo yum install mysql-server
Follow the prompts to finish the installation. Note that the root account has no password yet, so it must be set afterwards.
After installation, check the packages again with rpm -qa | grep mysql.
5. Start the MySQL service after installation:
# systemctl start mysqld
# systemctl status mysqld
Then log in and set the root password:
mysql -u root -p
mysql > use mysql;
mysql > update user set password=password('123456') where user='root';
mysql > exit;
The change takes effect after restarting MySQL: systemctl restart mysqld
If needed, run the following to give root remote-connection access (the password is '123456', matching the one set above):
mysql> grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
Check the database character set and make sure it is UTF-8:
show variables like "%char%";
set names utf8;
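To make UTF-8 the default permanently rather than per session, a sketch of the relevant /etc/my.cnf entries (MySQL 5.6 option names; restart mysqld afterwards):
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci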
6. Configure MySQL
6.1 MySQL is installed on the master node; log in with the mysql command.
6.2 Create an scm user with all privileges; the password is scm:
mysql> grant all on *.* to 'scm'@'localhost' identified by 'scm' with grant option;
Query OK, 0 rows affected, 1 warning (0.00 sec)
7. Check the installed MySQL version: mysql -V
8. Restart MySQL: systemctl restart mysqld
9. Enable start on boot:
systemctl enable mysqld.service
==========================================================================================================
Part 2: CDH 5.14.2 installation and configuration
1. Configure Hadoop
Create a hadoop user and give it ownership of /opt and /software; run on all nodes:
useradd -m hadoop -s /bin/bash
passwd hadoop
chown -R hadoop /opt/
chown -R hadoop /software
Extract Hadoop:
tar -zxvf /software/hadoop-2.6.0-cdh5.14.2.tar.gz -C /opt/
Configure the environment variables:
vim /etc/profile
Add:
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.14.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Apply the changes:
source /etc/profile
echo $HADOOP_HOME
2. Edit the configuration files (under $HADOOP_HOME/etc/hadoop)
2.1 core-site.xml
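The notes do not include the XML bodies. A minimal HA-oriented sketch of core-site.xml consistent with the later HA steps (the nameservice name ns1 and the tmp directory are assumptions; adjust to your cluster):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.6.0-cdh5.14.2/data/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>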
2.2 hdfs-site.xml
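Likewise a sketch for hdfs-site.xml (HA with NameNodes nn1 on master and nn2 on slave1 and JournalNodes on all three hosts; the ids, ports, paths, and replication factor are assumptions):
<configuration>
  <property><name>dfs.nameservices</name><value>ns1</value></property>
  <property><name>dfs.ha.namenodes.ns1</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn1</name><value>master:9000</value></property>
  <property><name>dfs.namenode.rpc-address.ns1.nn2</name><value>slave1:9000</value></property>
  <property><name>dfs.namenode.http-address.ns1.nn1</name><value>master:50070</value></property>
  <property><name>dfs.namenode.http-address.ns1.nn2</name><value>slave1:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://master:8485;slave1:8485;slave2:8485/ns1</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/opt/hadoop-2.6.0-cdh5.14.2/data/journal</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.client.failover.proxy.provider.ns1</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/root/.ssh/id_rsa</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>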
2.3 mapred-site.xml (this file does not exist by default; copy it from mapred-site.xml.template)
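A sketch for mapred-site.xml (the JobHistory addresses are assumptions):
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>master:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>master:19888</value></property>
</configuration>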
2.4 yarn-site.xml
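A sketch for yarn-site.xml matching the later YARN steps (ResourceManagers on master and slave1 with ZooKeeper-based HA; the cluster-id and rm ids are assumptions):
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-cluster</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>slave1</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>master:2181,slave1:2181,slave2:2181</value></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
</configuration>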
2.5 slaves
slave2
2.6 hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_171
3. Copy the configuration files to the other nodes:
scp -r core-site.xml slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r core-site.xml slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r hdfs-site.xml slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r hdfs-site.xml slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r mapred-site.xml slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r mapred-site.xml slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r yarn-site.xml slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r yarn-site.xml slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r slaves slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r slaves slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r hadoop-env.sh slave1:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
scp -r hadoop-env.sh slave2:/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop/
4. Install and start ZooKeeper
4.1 Extract to the target directory /opt:
tar -zxvf /software/zookeeper-3.4.5-cdh5.14.2.tar.gz -C /opt
4.2 In the ZooKeeper directory, create the data directory:
[hadoop@hadoop ~]$ sudo mkdir /opt/zookeeper-3.4.5-cdh5.14.2/zkData
4.3 In the conf directory, create zoo.cfg from the sample:
[hadoop@hadoop ~]$ cp -a /opt/zookeeper-3.4.5-cdh5.14.2/conf/zoo_sample.cfg /opt/zookeeper-3.4.5-cdh5.14.2/conf/zoo.cfg
Then set: dataDir=/opt/zookeeper-3.4.5-cdh5.14.2/zkData
Create the log directory:
[hadoop@hadoop ~]$ sudo mkdir /opt/zookeeper-3.4.5-cdh5.14.2/logs
[hadoop@hadoop ~]$ vi /opt/zookeeper-3.4.5-cdh5.14.2/libexec/zkEnv.sh
Find and change the line to: ZOO_LOG_DIR="$ZOOKEEPER_HOME/logs"
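For a three-node ensemble, zoo.cfg also needs the server list and each node needs a myid file; a sketch (the ports are the ZooKeeper defaults):
# add to /opt/zookeeper-3.4.5-cdh5.14.2/conf/zoo.cfg
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
# then write each node's id into zkData/myid (1 on master, 2 on slave1, 3 on slave2), e.g. on master:
echo 1 > /opt/zookeeper-3.4.5-cdh5.14.2/zkData/myid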
4.4 Configure the environment variables
[hadoop@hadoop ~]$ vim /etc/profile
Add the following two lines:
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.5-cdh5.14.2
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Apply the changes:
[hadoop@hadoop ~]$ source /etc/profile
4.5 Copy ZooKeeper to the other nodes:
scp -r /opt/zookeeper-3.4.5-cdh5.14.2/ slave1:/opt/
scp -r /opt/zookeeper-3.4.5-cdh5.14.2/ slave2:/opt/
4.6 Start ZooKeeper on every node (from the bin directory):
./zkServer.sh start
Check the status:
./zkServer.sh status
5. Start Hadoop
5.1 Start the JournalNode process on all nodes:
sbin/hadoop-daemon.sh start journalnode
5.2 Format and start the NameNode (on master):
bin/hdfs namenode -format        # format the namenode
bin/hdfs zkfc -formatZK          # format the HA state in ZooKeeper
bin/hdfs namenode                # start the namenode in the foreground
Note: the last command keeps running in the foreground. Leave it running while you perform the next step, then press Ctrl+C to stop this NameNode process.
5.3 Meanwhile, on the standby node (e.g. slave1), synchronize the metadata:
bin/hdfs namenode -bootstrapStandby
5.4 Then stop the JournalNode process on all nodes:
sbin/hadoop-daemon.sh stop journalnode
5.5 Start all HDFS-related processes with one command, run on the master node only:
sbin/start-dfs.sh
[root@master sbin]# jps
12928 Jps
12849 DFSZKFailoverController
12452 NameNode
11773 QuorumPeerMain
12671 JournalNode
5.6 Verify the startup
Check the NameNode status through the web UI:
http://master:50070
Note: before opening this URL in a browser, add the following entries to the hosts file on your local machine:
192.168.9.80 master
192.168.9.20 slave1
192.168.9.220 slave2
After a successful startup, kill one of the NameNodes and watch the failover, then start that NameNode again, e.g.:
kill -9 12452    # the NameNode PID from the jps output above
5.7 Start YARN
1. On the master node, run:
sbin/start-yarn.sh
2. On the slave1 node, run:
sbin/yarn-daemon.sh start resourcemanager
6. Shutdown order
On slave1:
sbin/yarn-daemon.sh stop resourcemanager
On master:
sbin/stop-yarn.sh
sbin/stop-dfs.sh
When starting the cluster again, start ZooKeeper first.
======================================================================================================
Part 3: HBase installation
1. Upload the package to /software on the Linux system.
2. Extract to the target directory /opt:
[hadoop@hadoop ~]$ sudo tar -zxvf /software/hbase-1.2.0-cdh5.14.2.tar.gz -C /opt
3. Configure the environment variables
[hadoop@hadoop ~]$ vim /etc/profile
Add the following two lines:
export HBASE_HOME=/opt/hbase-1.2.0-cdh5.14.2
export PATH=$PATH:$HBASE_HOME/bin
Apply the changes:
[hadoop@hadoop ~]$ source /etc/profile
4. Edit the hbase-env.sh file
[hadoop@hadoop ~]$ cd /opt/hbase-1.2.0-cdh5.14.2/conf/
[hadoop@hadoop ~]$ vi hbase-env.sh
export JAVA_HOME=/opt/jdk1.8.0_171
export HBASE_CLASSPATH=/opt/hadoop-2.6.0-cdh5.14.2/etc/hadoop
export HBASE_MANAGES_ZK=false
5. Edit the hbase-site.xml file
[hadoop@hadoop ~]$ cd /opt/hbase-1.2.0-cdh5.14.2/conf/
[hadoop@hadoop ~]$ vi hbase-site.xml
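The XML body is not included in the notes; a minimal fully-distributed sketch (the rootdir uses the ns1 nameservice assumed in the HDFS sketch above, and /hbase is an assumed path):
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://ns1/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>master,slave1,slave2</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
</configuration>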
Edit the regionservers file to list the slave nodes:
slave1
slave2
Copy HBase to the slave nodes:
scp -r /opt/hbase-1.2.0-cdh5.14.2/ slave1:/opt/
scp -r /opt/hbase-1.2.0-cdh5.14.2/ slave2:/opt/
6. Start
(Note: do not start the HMaster on the standby NameNode's host.)
bin/hbase-daemon.sh start master
bin/hbase-daemon.sh start regionserver
Or simply:
bin/start-hbase.sh
[hadoop@hadoop bin]$ sh start-hbase.sh
7. Verify
[root@master bin]# jps
6565 Jps
5990 HMaster
5191 DFSZKFailoverController
4776 NameNode
5001 JournalNode
25565 QuorumPeerMain
5311 ResourceManager
[root@slave1 opt]# jps
19904 Jps
19009 NameNode
19667 HRegionServer
19194 DFSZKFailoverController
19083 JournalNode
14383 QuorumPeerMain
8. Enter the HBase shell
[hadoop@hadoop ~]$ cd /opt/hbase-1.2.0-cdh5.14.2/bin
[hadoop@hadoop bin]$ hbase shell
9. Open the HBase web UI:
http://master:60010/master-status
=================================================================================================================
Part 4: Hive installation (note: Hive only needs to be installed on one node)
1. Upload the package to /software on the Linux system.
2. Extract to the target directory /opt:
[hadoop@hadoop ~]$ sudo tar -zxvf /software/hive-1.1.0-cdh5.14.2.tar.gz -C /opt
3. Configure the environment variables
[hadoop@hadoop ~]$ vi /etc/profile
Add the following two lines:
export HIVE_HOME=/opt/hive-1.1.0-cdh5.14.2
export PATH=$PATH:$HIVE_HOME/bin
Apply the changes:
[hadoop@hadoop ~]$ source /etc/profile
4. Edit the hive-site.xml file
[hadoop@hadoop ~]$ cd /opt/hive-1.1.0-cdh5.14.2/conf/
[hadoop@hadoop ~]$ vi hive-site.xml
# MySQL metastore connection settings
# Hive temporary/scratch directory settings
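A sketch of the corresponding hive-site.xml entries (the hive database name, warehouse and scratch paths are assumptions; the hive/hive credentials match the MySQL user created in the note below):
<configuration>
  <property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value></property>
  <property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value></property>
  <property><name>javax.jdo.option.ConnectionUserName</name><value>hive</value></property>
  <property><name>javax.jdo.option.ConnectionPassword</name><value>hive</value></property>
  <property><name>hive.metastore.warehouse.dir</name><value>/user/hive/warehouse</value></property>
  <property><name>hive.exec.scratchdir</name><value>/tmp/hive</value></property>
</configuration>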
5. Upload the MySQL JDBC driver jar into Hive's lib directory and adjust its permissions:
mysql-connector-java-5.1.44-bin.jar
6. Edit hive-env.sh
Set the following:
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.14.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/hive-1.1.0-cdh5.14.2/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/opt/hive-1.1.0-cdh5.14.2/lib
7. Rename hive-log4j.properties.template to hive-log4j.properties (drop the .template suffix) and create a logs directory:
$ mkdir logs
In hive-log4j.properties set:
hive.log.dir=/opt/hive-1.1.0-cdh5.14.2/logs
/**
Note: start MySQL before starting Hive.
Create a MySQL user for Hive and grant it privileges:
mysql> create user 'hive'@'%' identified by 'hive';
mysql> grant all on *.* to 'hive'@'master' identified by 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'localhost' IDENTIFIED BY 'hive' WITH GRANT OPTION;
mysql> set password for 'hive'@'master'=password('hive');
mysql> flush privileges;
List all users:
select user,host from mysql.user;
Delete the user:
DROP USER 'hive'@'%'
*/
8. Initialize the Hive metastore schema:
schematool -dbType mysql -initSchema    # MySQL as the metastore database
9. Start
In the first window run: hive --service metastore &
[hadoop@hadoop ~]$ cd /opt/hive-1.1.0-cdh5.14.2/bin
10. Verify
$ bin/hive
hive> show databases;
OK
default
Time taken: 8.651 seconds, Fetched: 1 row(s)
11. Enable MySQL to start on boot
# check whether it is already enabled at boot
systemctl list-unit-files | grep mysqld
# enable at boot
systemctl enable mysqld.service
=====================================================================
Part 5: Flume installation
Background:
Flume OG: Flume original generation, i.e. the Flume 0.9.x releases.
Flume NG: Flume next generation, i.e. the Flume 1.x releases.
1. Download the CDH build of Flume: flume-ng-1.6.0-cdh5.14.2.tar.gz
tar -zxvf flume-ng-1.6.0-cdh5.14.2.tar.gz -C /opt
2. Edit the configuration file
In flume-env.sh, add the Java environment:
export JAVA_HOME=/opt/jdk1.8.0_171
3. Configure the environment variables
[hadoop@hadoop ~]$ vim /etc/profile
Add the following lines:
export FLUME_HOME=/opt/apache-flume-1.6.0-cdh5.14.2-bin
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$PATH:$FLUME_HOME/bin
Apply the changes:
[hadoop@hadoop ~]$ source /etc/profile
4. Agent configuration files
cp flume-conf.properties.template flume-conf.properties
-------------------------------------------------------
Example 1: print incoming events to the console (netcat source, logger sink):
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1
# For each one of the sources, the type is defined
agent.sources.source1.type = netcat
agent.sources.source1.bind = localhost
agent.sources.source1.port = 44444
# The channel can be defined as follows.
agent.sources.source1.channels = channel1
# Each sink's type must be defined
agent.sinks.sink1.type = logger
#Specify the channel the sink should use
agent.sinks.sink1.channel = channel1
# Each channel's type is defined.
agent.channels.channel1.type = memory
agent.channels.channel1.transactionCapacity = 100
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.channel1.capacity = 1000
-------------------------------------------------------
Example 2: write directly to HDFS (tailing an nginx access log):
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
# name the source, channel and sink of this agent
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1
# For each one of the sources, the type is defined
# exec source: run a command and collect its output
agent1.sources.source1.type = exec
agent1.sources.source1.shell = /bin/bash -c
# the log file to collect
agent1.sources.source1.command = tail -n +0 -F /usr/local/nginx/logs/access.log
agent1.sources.source1.channels = channel1
agent1.sources.source1.threads = 5
# The channel can be defined as follows.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 100
agent1.channels.channel1.transactionCapacity = 100
agent1.channels.channel1.keep-alive = 30
# Each sink's type must be defined
# hdfs sink
agent1.sinks.sink1.type = hdfs
#Specify the channel the sink should use
agent1.sinks.sink1.channel = channel1
# HDFS URI to write to
agent1.sinks.sink1.hdfs.path = hdfs://192.168.89.29:9000/flume
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.rollInterval = 0
agent1.sinks.sink1.hdfs.rollSize = 100
agent1.sinks.sink1.hdfs.rollCount = 0
agent1.sinks.sink1.hdfs.batchSize = 100
agent1.sinks.sink1.hdfs.txnEventMax = 100
agent1.sinks.sink1.hdfs.callTimeout = 60000
----------------------------------------------------------------------
Example 3: write directly to Kafka:
(a) Flume to Kafka fed by a Log4j appender (avro source):
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = avro
a1.sources.r1.bind = master
a1.sources.r1.port = 44444
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = FlumeLog4jTopic1
a1.sinks.k1.kafka.bootstrap.servers = master:9092
a1.sinks.k1.kafka.batchSize=10
a1.channels.c1.type=memory
a1.sources.r1.channels=c1
a1.sinks.k1.channel = c1
flume-ng agent --conf conf --conf-file /opt/apache-flume-1.6.0-cdh5.14.2-bin/conf/flume2kafkabylog4j.properties --name a1 -Dflume.root.logger=INFO,console
(b) Flume to Kafka from a spooling directory:
Flume2KafkaAgent.sources=mysource
Flume2KafkaAgent.channels=mychannel
Flume2KafkaAgent.sinks=mysink
Flume2KafkaAgent.sources.mysource.type=spooldir
Flume2KafkaAgent.sources.mysource.channels=mychannel
Flume2KafkaAgent.sources.mysource.spoolDir=/tmp/logs
Flume2KafkaAgent.sinks.mysink.channel=mychannel
Flume2KafkaAgent.sinks.mysink.type=org.apache.flume.sink.kafka.KafkaSink
Flume2KafkaAgent.sinks.mysink.kafka.bootstrap.servers=master:9092,slave1:9093,slave2:9094
Flume2KafkaAgent.sinks.mysink.kafka.topic=FlumeKafkaSinkTopic1
Flume2KafkaAgent.sinks.mysink.kafka.batchSize=20
Flume2KafkaAgent.sinks.mysink.kafka.producer.requiredAcks=1
Flume2KafkaAgent.channels.mychannel.type=memory
Flume2KafkaAgent.channels.mychannel.capacity=30000
Flume2KafkaAgent.channels.mychannel.transactionCapacity=100
flume-ng agent --conf conf --conf-file /opt/apache-flume-1.6.0-cdh5.14.2-bin/conf/kafkasink.properties --name Flume2KafkaAgent -Dflume.root.logger=INFO,console
----------------------------------------------------------------------
Example 4: write to HDFS from a spooling directory:
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1
# configure source1
agent1.sources.source1.type=spooldir
agent1.sources.source1.spoolDir=/root/data/
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader = false
# configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://master:9000/data
# output file type; the default is SequenceFile, DataStream writes plain text
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=TEXT
agent1.sinks.sink1.hdfs.rollSize=0
# do not roll files based on the number of events
agent1.sinks.sink1.hdfs.rollCount = 0
agent1.sinks.sink1.hdfs.rollInterval=0
agent1.sinks.sink1.hdfs.minBlockReplicas=1
agent1.sinks.sink1.hdfs.threadsPoolSize = 30
agent1.sinks.sink1.channel=channel1
# configure channel1
agent1.channels.channel1.type=file
agent1.channels.channel1.checkpointDir=/root/data/point
agent1.channels.channel1.dataDirs = /root/data/tmp
flume-ng agent --conf conf --conf-file /opt/apache-flume-1.6.0-cdh5.14.2-bin/conf/flume-hdfs.properties --name agent1 -Dflume.root.logger=INFO,console
5. Start the agent:
./bin/flume-ng agent --conf conf --conf-file /opt/apache-flume-1.6.0-cdh5.14.2-bin/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,console
If running flume-ng fails with: Caused by: java.lang.ClassNotFoundException: org.apache.flume.tools.GetJavaProperty
or: Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty
this is usually because HBase (or a similar tool) is installed. Either:
1. Comment out the following line in HBase's hbase-env.sh:
# Extra Java CLASSPATH elements. Optional.
#export HBASE_CLASSPATH=/home/hadoop/hbase/conf
2. Or replace HBASE_CLASSPATH with JAVA_CLASSPATH, configured as follows:
# Extra Java CLASSPATH elements. Optional.
export JAVA_CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
6. Test with telnet
In another terminal window:
telnet localhost 44444
Installing telnet:
Install via yum:
[root@master ~]# yum list |grep telnet
telnet-server.x86_64 1:0.17-59.el7 @base
telnet.x86_64 1:0.17-59.el7 base
[root@localhost /]# yum install telnet-server.x86_64
[root@localhost /]]# yum install telnet.x86_64
[root@localhost /]# yum list |grep xinetd
xinetd.x86_64 2:2.3.15-12.el7 @base
[root@localhost /]# yum install xinetd.x86_64
After installation, enable xinetd at boot:
systemctl enable xinetd.service
Enable the telnet service at boot:
systemctl enable telnet.socket
Finally, start both services:
systemctl start telnet.socket
systemctl start xinetd    (or: service xinetd start)
Note: the relationship between sources and channels is many-to-many,
while the relationship between a channel and its sinks is one-to-many (each sink reads from exactly one channel).
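To illustrate the fan-out case (one source feeding two channels), a sketch using the replicating channel selector, which is Flume's default (the agent name agent2 and port 44445 are arbitrary):
agent2.sources = src1
agent2.channels = ch1 ch2
agent2.sinks = sinkA sinkB
# netcat source fanning out to both channels
agent2.sources.src1.type = netcat
agent2.sources.src1.bind = localhost
agent2.sources.src1.port = 44445
agent2.sources.src1.channels = ch1 ch2
# the replicating selector (the default) copies every event to every listed channel
agent2.sources.src1.selector.type = replicating
agent2.channels.ch1.type = memory
agent2.channels.ch2.type = memory
# each sink reads from exactly one channel
agent2.sinks.sinkA.type = logger
agent2.sinks.sinkA.channel = ch1
agent2.sinks.sinkB.type = logger
agent2.sinks.sinkB.channel = ch2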
==============================================================================
Part 6: Kafka multi-node installation
1. Download and extract Kafka:
tar -zxvf kafka_2.11-2.0.0.tgz -C /opt    # the matching Scala version is 2.11
2. Configure the environment variables
vim /etc/profile
export KAFKA_HOME=/opt/kafka_2.11-2.0.0/
export PATH=$PATH:$KAFKA_HOME/bin
Apply the changes:
[hadoop@hadoop ~]$ source /etc/profile
3. Edit the configuration file (config/server.properties):
vi server.properties
broker.id=1    (use 2 and 3 on the other two brokers)
delete.topic.enable=true
listeners=PLAINTEXT://:9092    (use 9093 and 9094 on the other two brokers)
zookeeper.connect=192.168.9.80:2181,192.168.9.20:2181,192.168.9.220:2181
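In summary, the per-broker values end up as follows (the log.dirs entry is an assumption; Kafka's default /tmp/kafka-logs does not survive a cleanup of /tmp, so a dedicated directory is safer):
# master (config/server.properties)
broker.id=1
listeners=PLAINTEXT://:9092
log.dirs=/opt/kafka_2.11-2.0.0/kafka-logs
zookeeper.connect=192.168.9.80:2181,192.168.9.20:2181,192.168.9.220:2181
# slave1: broker.id=2, listeners=PLAINTEXT://:9093
# slave2: broker.id=3, listeners=PLAINTEXT://:9094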
4. Copy Kafka to the other nodes:
scp -r /opt/kafka_2.11-2.0.0 slave1:/opt/
scp -r /opt/kafka_2.11-2.0.0 slave2:/opt/
[Remember to change broker.id (and the listener port) on each node.]
5. Start Kafka
From the Kafka bin directory, run: ./kafka-server-start.sh -daemon ../config/server.properties
To stop Kafka:
bin/kafka-server-stop.sh
6. Check and test
Connect to ZooKeeper (from the ZooKeeper bin directory):
[root@master bin]# ./zkCli.sh -server 192.168.9.80:2181
[zk: 192.168.9.80:2181(CONNECTED) 3] ls /brokers/ids
[1, 2, 3]
7. Create a topic:
./bin/kafka-topics.sh --create --zookeeper 192.168.9.80:2181,192.168.9.20:2181,192.168.9.220:2181 --replication-factor 2 --partitions 2 --topic FlumeKafkaSinkTopic1
List topics:
./bin/kafka-topics.sh --zookeeper 192.168.9.80:2181,192.168.9.20:2181,192.168.9.220:2181 --list
Describe a topic:
[root@master bin]#./kafka-topics.sh --zookeeper 192.168.9.80:2181,192.168.9.20:2181,192.168.9.220:2181 --describe --topic FlumeKafkaSinkTopic1
Topic:FlumeKafkaSinkTopic1 PartitionCount:2 ReplicationFactor:2 Configs:
Topic: FlumeKafkaSinkTopic1 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: FlumeKafkaSinkTopic1 Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3
8. Produce messages:
bin/kafka-console-producer.sh --topic FlumeKafkaSinkTopic1 --broker-list 192.168.9.80:9092,192.168.9.20:9093,192.168.9.220:9094
9. Consume messages:
bin/kafka-console-consumer.sh --topic FlumeLog4jTopic1 --bootstrap-server 192.168.9.80:9092 --from-beginning
10. Delete a topic:
kafka-topics.sh --delete --zookeeper 192.168.9.80:2181 --topic FlumeKafkaSinkTopic1
log4j.logger.org.example.MyClass = INFO,stdout,flume
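This log4j.logger line belongs to the client application's log4j configuration that feeds the avro source (master:44444) in Part 5. A sketch of a matching log4j.properties (org.example.MyClass is just the example logger name; the flume-ng-log4jappender jar must be on the application classpath):
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss} %p %c - %m%n
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = master
log4j.appender.flume.Port = 44444
log4j.appender.flume.UnsafeMode = true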