Log Collection Framework: Flume

Flume, a log collection framework

webServer (source side) -> Flume -> HDFS (destination)

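Core components of Flume

source: where the log data comes from

channel: the pipeline that buffers events between the source and the sink

sink: the storage destination (where the data finally lands)

A running Flume process is called an agent, and each agent is assembled from these three kinds of components.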
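The three components are wired together purely by name in a properties file. As a minimal bash sketch of that wiring, the hypothetical helper `flume_skeleton` (not part of Flume) prints a config with one source, one memory channel and one sink, using the component names r1, c1 and k1 that the example configs in this post also use. Type-specific settings, such as a netcat source's bind and port, are deliberately left out and still have to be added by hand.

```bash
# Hypothetical helper (not part of Flume): print the minimal wiring for one agent.
# Usage: flume_skeleton <agent-name> <source-type> <sink-type>
flume_skeleton() {
  local agent=$1 src_type=$2 sink_type=$3
  # Unquoted heredoc so that $agent, $src_type and $sink_type are expanded.
  cat <<EOF
$agent.sources = r1
$agent.channels = c1
$agent.sinks = k1
$agent.sources.r1.type = $src_type
$agent.channels.c1.type = memory
$agent.sinks.k1.type = $sink_type
# A source may write to several channels (plural key); a sink reads from exactly one.
$agent.sources.r1.channels = c1
$agent.sinks.k1.channel = c1
EOF
}
```

For example, `flume_skeleton a1 netcat logger` prints the skeleton of example1.conf, without its bind/port lines and channel capacity settings.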

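Installing the JDK

Download: jdk-8-linux-x64.tar.gz

Upload to the server: rz

Extract: tar -zvxf jdk-8-linux-x64.tar.gz -C ~/soft_install/

Edit the profile: vi ~/.bash_profile

export JAVA_HOME=/root/soft_install/jdk1.8.0

export PATH=$JAVA_HOME/bin:$PATH

There must be no spaces around "=": bash rejects export JAVA_HOME = ... with a "not a valid identifier" error.

Reload the profile: source ~/.bash_profile

Verify: java -version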
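Repeating the install steps appends duplicate lines to ~/.bash_profile. A small bash sketch that appends a line only if exactly the same line is not already present; `add_profile_line` is a hypothetical name, not a standard command.

```bash
# Hypothetical helper: append a line to a profile file unless that exact line already exists.
add_profile_line() {
  local file=$1 line=$2
  # -x: match the whole line, -F: literal string, so "$" and "." are not treated as regex syntax.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage (single quotes keep $JAVA_HOME and $PATH literal in the file):
#   add_profile_line ~/.bash_profile 'export JAVA_HOME=/root/soft_install/jdk1.8.0'
#   add_profile_line ~/.bash_profile 'export PATH=$JAVA_HOME/bin:$PATH'
#   source ~/.bash_profile
```

The same helper works for the FLUME_HOME lines in the Flume installation steps.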

Installing Flume

Step 1:

Download: http://archive.cloudera.com/cdh5/cdh/5/

Upload: rz

Extract: tar -zvxf flume-ng-1.6.0-cdh5.7.0.tar.gz -C ~/soft_install/

Edit the profile: vi ~/.bash_profile

export FLUME_HOME=/root/soft_install/apache-flume-1.6.0-cdh5.7.0-bin

export PATH=$FLUME_HOME/bin:$PATH

Reload the profile: source ~/.bash_profile

Step 2:

Configure flume-env.sh in $FLUME_HOME/conf:

cp flume-env.sh.template flume-env.sh

vi flume-env.sh and add: export JAVA_HOME=/root/soft_install/jdk1.8.0

Verify: flume-ng version

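Starting a Flume agent from a configuration file (this example starts the avro-memory-logger agent defined in exampleB.conf). --name is the agent name and must match the prefix used in the config file; --conf is the configuration directory; --conf-file is the agent's config file; the -D option prints INFO-level logs to the console.

flume-ng agent \
  --name avro-memory-logger \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/exampleB.conf \
  -Dflume.root.logger=INFO,console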
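Because --name must match the prefix used in the config file, it is worth checking before starting: a mismatched name typically leaves the agent running with nothing configured. A bash sketch (the helper `list_agents` is hypothetical) that prints every agent name declared in a config file by looking for `<agent>.sources = ...` lines:

```bash
# Hypothetical helper: print the agent names declared in a Flume properties file,
# i.e. the prefix of every "<agent>.sources = ..." line. Pass one of them to --name.
list_agents() {
  # The capture group cannot contain ".", so lines like "a1.sources.r1.type = ..." do not match;
  # comment lines starting with "#" are skipped as well.
  sed -n 's/^[[:space:]]*\([^.#[:space:]][^.[:space:]]*\)\.sources[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# Example: list_agents $FLUME_HOME/conf/exampleB.conf   # prints avro-memory-logger
```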

Event

Event: { headers:{} body: 69 20 6C 6F 76 65 20 6C 69 66 08 6E 66 65 69 66 i love lif.nfeif }

An Event is the basic unit of data transfer in Flume.

Event = optional headers (string key/value pairs) + a byte-array body
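The logger sink prints the body as hexadecimal bytes followed by their printable characters, shows non-printable bytes as ".", and by default logs only the first 16 bytes of the body. In the sample Event, 08 is a backspace character (presumably typed in the telnet session), which is why it appears as "." in "i love lif.nfeif". The bash sketch below uses a hypothetical helper, `hex_to_text`, to turn such a hex dump back into the raw bytes:

```bash
# Hypothetical helper: decode space-separated hex bytes (as printed by the logger sink) to raw bytes.
# Requires bash: its printf builtin understands \xHH escapes in the format string.
hex_to_text() {
  local byte
  # Word-splitting $1 on spaces yields one two-digit hex byte per iteration.
  for byte in $1; do
    printf "\\x${byte}"
  done
}
```

For example, piping `hex_to_text '69 20 6C 6F 76 65 20 6C 69 66 08 6E 66 65 69 66'` into `od -c` shows the backspace as `\b` instead of ".".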

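Working with Flume comes down to the configuration file: for each job, write a new config file that names the agent and defines its source, channel, and sink.

The key decision is which type of source, channel, and sink to use.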

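Hands-on 1: collect log data from a given network port and print it to the console

Components: netcat source + memory channel + logger sink

Step 1: vi example1.conf (see the example1.conf listing below)

Step 2: start the agent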

flume-ng agent \
  --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/example1.conf \
  -Dflume.root.logger=INFO,console

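Step 3: test

In another terminal, run telnet 192.168.145.128 44444, type a few lines, and check that each one is logged as an Event in the agent's window. (example1.conf binds the source to hadoop01, so this assumes hadoop01 resolves to 192.168.145.128.)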
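Instead of an interactive telnet session, a test line can also be sent from a script. The bash sketch below relies on bash's built-in /dev/tcp redirection (available when bash is built with network redirections, as on most Linux distributions); `send_line` is a hypothetical helper name. By default the netcat source answers each received line with "OK", which the function prints if it arrives within 5 seconds.

```bash
# Hypothetical helper: send one newline-terminated line to host:port and print the reply.
# Each line received by the netcat source becomes one Flume Event.
send_line() {
  local host=$1 port=$2 text=$3
  (
    # Open a read/write TCP connection on file descriptor 3; fail if it cannot connect.
    exec 3<>"/dev/tcp/${host}/${port}" || exit 1
    printf '%s\n' "$text" >&3
    # Wait up to 5 seconds for the source's acknowledgement line (normally "OK").
    IFS= read -r -t 5 reply <&3 && printf '%s\n' "$reply"
  )
}

# Example: send_line 192.168.145.128 44444 'hello flume'
```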

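Hands-on 2: monitor a file and capture newly appended content in real time

Components: exec source + memory channel + logger sink

Step 1: vi example2.conf (see the example2.conf listing below)

Step 2: start the agent (the last option prints INFO-level logs to the console)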

flume-ng agent \
  --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/example2.conf \
  -Dflume.root.logger=INFO,console

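Step 3: test

In another terminal, append lines to the monitored file, for example echo "hello flume" >> /root/data/example2.txt, and check that each new line is logged as an Event in the agent's window.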
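To generate a steady stream of new lines without typing them by hand, a bash sketch like the following can be run in the second terminal (`append_lines` is a hypothetical helper, and its 1-second default delay is an arbitrary choice):

```bash
# Hypothetical test driver: append numbered, timestamped lines to the monitored file
# so that "tail -F" in the exec source picks them up. Delay (seconds) defaults to 1.
append_lines() {
  local file=$1 count=$2 delay=${3:-1} i
  for ((i = 1; i <= count; i++)); do
    printf 'line %d %s\n' "$i" "$(date '+%Y-%m-%d %H:%M:%S')" >> "$file"
    sleep "$delay"
  done
}

# Example: append_lines /root/data/example2.txt 5
```

Run against /root/data/example2.txt, this should produce one Event per appended line in the agent's console.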

Hands-on 2, advanced: offline processing

Save the collected log data to HDFS

Components: exec source + memory channel + hdfs sink

Configuration file: example3.conf

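Log collection across two machines

Machine A monitors a file and sends the events to another node through an avro sink (exampleA.conf).

Machine B receives the data sent by machine A's sink through an avro source (exampleB.conf).

Machine B can then print the data to the console with a logger sink, store it, or pass it on to Kafka.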

example1.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop01
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

example2.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F  /root/data/example2.txt
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

example3.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F  /root/data/example2.txt
a1.sources.r1.shell = /bin/sh -c

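# Describe the sink
a1.sinks.k1.type = hdfs
# Files are written under this path (here the HDFS root). By default the HDFS sink writes SequenceFiles;
# setting a1.sinks.k1.hdfs.fileType = DataStream would store the events as plain text instead.
a1.sinks.k1.hdfs.path = hdfs://192.168.145.128:8020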

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

exampleA.conf

# example exec-memory-avro
exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F  /root/data/exampleA.txt
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = 192.168.145.128
exec-memory-avro.sinks.avro-sink.port = 44444 

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

exampleB.conf

# example avro-memory-logger
avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = 192.168.145.128
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
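With exampleA.conf and exampleB.conf, machine A's avro sink connects to machine B's avro source (in this demo both point at 192.168.145.128). It is therefore usually simplest to start the exampleB.conf agent first and start the exampleA.conf agent once port 44444 accepts connections; the avro sink also retries the connection by itself, so this ordering mainly avoids connection errors in the log. A bash sketch, where `wait_for_port` is a hypothetical helper and the agent names come from the two configs:

```bash
# Hypothetical helper: poll host:port once per second until a TCP connection succeeds
# or the timeout (seconds, default 30) expires. Uses bash's /dev/tcp redirection.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    # Open and immediately close a connection in a subshell; success means something is listening.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example start order (commented out so that defining the function has no side effects):
#   flume-ng agent --name avro-memory-logger --conf $FLUME_HOME/conf \
#     --conf-file $FLUME_HOME/conf/exampleB.conf -Dflume.root.logger=INFO,console &
#   wait_for_port 192.168.145.128 44444 60 &&
#     flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf \
#       --conf-file $FLUME_HOME/conf/exampleA.conf -Dflume.root.logger=INFO,console
```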
