Flume安装与配置

Flume安装与配置

  • Flume介绍
  • 环境
  • 下载
  • 安装
  • 配置
  • 案例

Flume介绍

一款分布式的海量应用日志采集、聚合、传输的框架,支持配置多种数据发送方与接收方,具有高可用、高可靠的特性。比如可以同时从文件目录、log4j、http、avro、kafka等渠道收集日志,通过传输输出到kafka、hdfs、MySQL、文件等。在传输过程中可以对数据做简单处理。

flume的核心是agent,而agent包含source、channel、sink三个组件。

source:source组件是专门用来收集数据的,可以处理各种类型、各种格式的日志数据,包括avro、thrift、exec、jms、spooling directory、netcat、sequence generator、syslog、http、legacy、自定义。

channel:source组件把数据收集来以后,临时存放在channel中,即channel组件在agent中是专门用来存放临时数据的——对采集到的数据进行简单的缓存,可以存放在memory、jdbc、file等等。

sink:sink组件是用于把数据发送到目的地的组件,目的地包括hdfs、logger、avro、thrift、ipc、file、null、hbase、solr、自定义

环境

操作系统:Centos 7
Hadoop版本:2.9.2
JDK版本:1.8.0_221
Flume版本:1.9.0
集群规划:

主机名 IP
master 192.168.1.121
slave1 192.168.1.122
slave2 192.168.1.123

下载

下载地址:https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.9.0/
Flume安装与配置_第1张图片

安装

将其解压到/home/apps/目录下

[root@master dev]#  mkdir -p /home/apps/
[root@master dev]#  tar -zxvf apache-flume-1.9.0-bin.tar.gz -C /home/apps/
[root@master dev]#  mv /home/apps/apache-flume-1.9.0-bin /home/apps/flume

配置

进入flume的conf目录下,复制flume-env.sh.template为flume-env.sh,并修改

[root@master dev]# cd /home/apps/flume/conf
[root@master conf]# cp flume-env.sh.template flume-env.sh
[root@master conf]# vi flume-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file

# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

export JAVA_HOME=/usr/local/jdk/jdk1.8.0_221

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
# export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

# Let Flume write raw event data and configuration information to its log files for debugging
# purposes. Enabling these flags is not recommended in production,
# as it may result in logging sensitive user information or encryption secrets.
# export JAVA_OPTS="$JAVA_OPTS -Dorg.apache.flume.log.rawdata=true -Dorg.apache.flume.log.printconfig=true "

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

修改配置

export JAVA_HOME=/usr/local/jdk/jdk1.8.0_221

查看是否配置成功

[root@master flume]# bin/flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 511d868555dd4d16e6ce4fedc72c2d1454546707
Compiled by bessbd on Wed Oct 12 20:51:10 CEST 2016
From source with checksum 0d21b3ffdc55a07e1d08875872c00523

案例

新建一个flume-conf.properties文件

[root@master conf]# vi flume-conf.properties

#agent中各组件的名字
##表示agent中的source组件
a1.sources = r1
##表示的是下沉组件sink
a1.sinks = k1
##agent内部的数据传输通道channel,用于从source将数据传递到sink
a1.channels = c1
 
#描述和配置source组件:r1
##netcat用于监听一个端口的
a1.sources.r1.type = netcat
##配置的绑定地址,这个机器的hostname是master,所以下面也可以配置成master
a1.sources.r1.bind = master
##配置的绑定端口
a1.sources.r1.port = 44444
 
#描述和配置sink组件:k1
a1.sinks.k1.type = logger
 
##描述和配置channel组件,此处使用时内存缓存的方式
#下面表示的是缓存到内存中,如果是文件,可以使用file的那种类型
a1.channels.c1.type = memory
#表示用多大的空间
a1.channels.c1.capacity = 1000
#下面表示用事务的空间是多大
a1.channels.c1.transactionCapacity = 100
 
# 描述和配置source channel sink之间的连接关系,因为source和sink依赖channel来传递数据,所以要分别指定用的是哪个channel。
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-- INSERT --

注:a1.sources.r1.bind = master (主机名或者IP地址)

测试

[root@master flume]# bin/flume-ng agent -c conf -f conf/flume-conf.properties -n a1 -Dflume.root.logger=INFO,console
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/apps/flume/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2021-01-23 11:24:23,067 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2021-01-23 11:24:23,072 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:conf/flume-conf.properties
2021-01-23 11:24:23,079 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: k1 Agent: a1
2021-01-23 11:24:23,079 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2021-01-23 11:24:23,079 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2021-01-23 11:24:23,090 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [a1]
2021-01-23 11:24:23,090 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2021-01-23 11:24:23,095 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory
2021-01-23 11:24:23,099 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel c1
2021-01-23 11:24:23,100 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source r1, type netcat
2021-01-23 11:24:23,108 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: logger
2021-01-23 11:24:23,111 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel c1 connected to [r1, k1]
2021-01-23 11:24:23,116 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@61ed1640 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2021-01-23 11:24:23,129 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel c1
2021-01-23 11:24:23,217 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2021-01-23 11:24:23,218 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started
2021-01-23 11:24:23,220 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink k1
2021-01-23 11:24:23,220 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source r1
2021-01-23 11:24:23,221 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2021-01-23 11:24:23,232 (lifecycleSupervisor-1-2) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created 
serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.1.121:44444]

打开slave1节点并安装Telnet

[root@slave1 ~]# yun install -y telnet

使用Telnet连接master上的Flume

[root@slave1 ~]# telnet 192.168.1.121 44444
Trying 192.168.1.121...
Connected to 192.168.1.121.
Escape character is '^]'.
hello world
OK
hello flume
OK

返回master节点查看

2021-01-23 11:28:28,639 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 0D             hello world. }
2021-01-23 11:28:33,747 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65 0D             hello flume. }

结语:大数据Hadoop笔记 Flume 安装与配置

你可能感兴趣的:(大数据Hadoop,flume,hadoop,linux)