Distributed Setup - 5: Flume Setup

Environment:
    1. Version selection
    Reference: Hadoop cluster version compatibility
          hadoop-2.5.0
          hive-0.13.1
          spark-1.2.0
          hbase-0.98.6
          flume-ng-1.7.0
          jdk-8u161-linux-x64.tar.gz

Download: http://flume.apache.org/download.html

Component distribution:

             qy01                            qy02           qy03
HDFS         NameNode, DataNode              DataNode       DataNode
YARN         ResourceManager, NodeManager    NodeManager    NodeManager
Zookeeper    Zookeeper                       Zookeeper      Zookeeper
Kafka        Kafka                           Kafka          Kafka
Flume        Flume                           Flume          Flume
Spark        Spark
Hive         Hive
Mysql        Mysql

1. Change the file permissions:

chmod u+x apache-flume-1.7.0-bin.tar.gz 

Extract the archive:

tar -zxvf apache-flume-1.7.0-bin.tar.gz -C /opt/modules/

Rename the extracted directory (inside /opt/modules/):

mv apache-flume-1.7.0-bin/ flume-1.7.0

2. Configure the environment variables

sudo vim ~/.bashrc


#flume
export FLUME_HOME=/opt/modules/flume-1.7.0
export PATH=$PATH:${FLUME_HOME}/bin
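
Reload the shell configuration so the new variables take effect, then verify that the flume-ng script is on the PATH:

source ~/.bashrc
flume-ng version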

 

5. Prerequisites for installing Flume:

System Requirements

  1. Java Runtime Environment - Java 1.8 or later
  2. Memory - Sufficient memory for configurations used by sources, channels or sinks
  3. Disk Space - Sufficient disk space for configurations used by channels or sinks
  4. Directory Permissions - Read/Write permissions for directories used by agent
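
Assuming the JDK from the version list above has already been installed on each host, the Java requirement can be checked quickly with:

java -version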


6. Modify the configuration files

Official configuration reference: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html

Use Notepad++ to connect to the virtual machines for editing the files.

Distribute the extracted Flume directory to the second host:

scp -r flume-1.7.0/ hadoop02:/opt/modules/

Architecture: the second and third hosts collect data from the output of a Linux command (an exec source), buffer it in their channels, and push it to the first host.
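
A rough sketch of the resulting topology (host names follow the scp commands and the configurations below):

hadoop02 (agent2): exec source -> memory channel -> avro sink --+
                                                                |
hadoop03 (agent3): exec source -> memory channel -> avro sink --+--> hadoop01 (agent1): avro source -> hbaseC -> HBase sink
                                                                                                     -> kafkaC -> Kafka sink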

 

1. Configure the second host first

Change directory to flume-1.7.0/conf.

Configure flume-env.sh:

mv flume-env.sh.template flume-env.sh

Open flume-env.sh and set the Java path.
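
For example, assuming the JDK from the version list is unpacked under /opt/modules/jdk1.8.0_161 (adjust to your actual installation path), the line in flume-env.sh would be:

export JAVA_HOME=/opt/modules/jdk1.8.0_161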

Configure flume-conf.properties:

mv flume-conf.properties.template flume-conf.properties

Open flume-conf.properties, delete all of its contents, and add the following:



# Name the source, channel and sink of agent2
agent2.sources = r1
agent2.channels = c1
agent2.sinks = s1


# Exec source: continuously tail the web log file
agent2.sources.r1.type = exec
agent2.sources.r1.command = tail -F /opt/datas/weblog-flume.log
agent2.sources.r1.channels = c1


# Memory channel: buffer events in RAM
agent2.channels.c1.type = memory
agent2.channels.c1.capacity = 10000
agent2.channels.c1.transactionCapacity = 10000
agent2.channels.c1.keep-alive = 5


# Avro sink: forward events to the collector agent on hadoop01
agent2.sinks.s1.type = avro
agent2.sinks.s1.channel = c1
agent2.sinks.s1.hostname = hadoop01
agent2.sinks.s1.port = 5555
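
Make sure the monitored log file exists, then start agent2 from the Flume home directory once the collector agent on hadoop01 is running (the avro sink needs something to connect to). A typical invocation, with console logging enabled for debugging, looks like:

mkdir -p /opt/datas
touch /opt/datas/weblog-flume.log

bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name agent2 -Dflume.root.logger=INFO,console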

2. Configure the third host

Distribute the whole Flume directory configured on the second host to the third host:

scp -r flume-1.7.0/ hadoop03:/opt/modules/

Configure flume-conf.properties:

Open flume-conf.properties, delete all of its contents, and add the following (the only change is the agent name: agent2 becomes agent3):

agent3.sources = r1
agent3.channels = c1
agent3.sinks = s1


agent3.sources.r1.type = exec
agent3.sources.r1.command = tail -F /opt/datas/weblog-flume.log
agent3.sources.r1.channels = c1


agent3.channels.c1.type = memory
agent3.channels.c1.capacity = 10000
agent3.channels.c1.transactionCapacity = 10000
agent3.channels.c1.keep-alive = 5


agent3.sinks.s1.type = avro
agent3.sinks.s1.channel = c1
agent3.sinks.s1.hostname = hadoop01
agent3.sinks.s1.port = 5555
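
agent3 is started the same way as agent2, only with the agent name changed:

bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name agent3 -Dflume.root.logger=INFO,console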

 

3. Configure the first host

One avro source fans out into two channels, each drained by its own sink (HBase and Kafka):

agent1.sources = r1
agent1.channels = kafkaC hbaseC
agent1.sinks = kafkaSink hbaseSink


#************************ flume + hbase *************************

# Avro source: receive events from agent2/agent3 and replicate them to both channels
agent1.sources.r1.type = avro
agent1.sources.r1.channels = kafkaC hbaseC
agent1.sources.r1.bind = hadoop01
agent1.sources.r1.port = 5555
agent1.sources.r1.threads = 5

agent1.channels.hbaseC.type = memory
agent1.channels.hbaseC.capacity = 100000
agent1.channels.hbaseC.transactionCapacity = 100000
agent1.channels.hbaseC.keep-alive = 20

agent1.sinks.hbaseSink.type = asynchbase
agent1.sinks.hbaseSink.table = weblogs
agent1.sinks.hbaseSink.columnFamily = info
agent1.sinks.hbaseSink.serializer = org.apache.flume.sink.hbase.KfkAsyncHbaseEventSerializer
agent1.sinks.hbaseSink.channel = hbaseC
agent1.sinks.hbaseSink.serializer.payloadColumn = datatime,userid,searchname,retorder,cliorder,cliurl
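
Two prerequisites are implied by this sink configuration (assumptions about the surrounding setup, not steps covered here): the custom serializer class org.apache.flume.sink.hbase.KfkAsyncHbaseEventSerializer has to be packaged into a jar and placed on Flume's classpath (for example in flume-1.7.0/lib), and the target table with its column family must already exist in HBase, e.g. created from the HBase shell:

create 'weblogs', 'info'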


#************************flume+kafka*************************

agent1.channels.kafkaC.type = memory
agent1.channels.kafkaC.capacity = 100000
agent1.channels.kafkaC.transactionCapacity = 100000
agent1.channels.kafkaC.keep-alive = 20

agent1.sinks.kafkaSink.channel = kafkaC
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.brokerList = hadoop01:9092,hadoop02:9092,hadoop03:9092
agent1.sinks.kafkaSink.topic = weblogs
agent1.sinks.kafkaSink.zookeeperConnect = hadoop01:2181,hadoop02:2181,hadoop03:2181
agent1.sinks.kafkaSink.requiredAcks = 1
agent1.sinks.kafkaSink.batchSize = 1
agent1.sinks.kafkaSink.serializer.class = kafka.serializer.StringEncoder
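
The Kafka topic should exist before events start flowing (unless the brokers auto-create topics). A typical creation command for this cluster is shown below, assuming the Kafka bin directory is on the PATH and using illustrative partition/replication values, followed by the command that starts the collector agent (start agent1 before agent2/agent3):

kafka-topics.sh --create --zookeeper hadoop01:2181,hadoop02:2181,hadoop03:2181 --replication-factor 2 --partitions 3 --topic weblogs

bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name agent1 -Dflume.root.logger=INFO,console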

 
