Hadoop is a distributed system, and the systems deployed alongside it are usually distributed too. Distributed systems produce distributed logs, which means (1) many log files scattered across machines and (2) a large volume of log data. So whether you are collecting logs from a fleet of servers or storing massive amounts of log data into Hadoop, you need a log collection system — and that is Flume. That said, Flume is not strictly required: if you have no serious logging needs, you can safely skip it.
Or install it like this:
$ yum install -y flume-ng flume-ng-agent flume-ng-doc
$ flume-ng help
$ vim /etc/flume-ng/conf/example.conf
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
$ flume-ng agent --conf conf --conf-file /etc/flume-ng/conf/example.conf --name a1 -Dflume.root.logger=INFO,console
$ curl -X GET http://localhost:44444?w=helloworld
OK
OK
OK
OK
OK
14/07/04 11:39:12 INFO sink.LoggerSink: Event: { headers:{} body: 47 45 54 20 2F 3F 77 3D 68 65 6C 6C 6F 77 6F 72 GET /?w=hellowor }
14/07/04 11:39:12 INFO sink.LoggerSink: Event: { headers:{} body: 55 73 65 72 2D 41 67 65 6E 74 3A 20 63 75 72 6C User-Agent: curl }
14/07/04 11:39:12 INFO sink.LoggerSink: Event: { headers:{} body: 48 6F 73 74 3A 20 6C 6F 63 61 6C 68 6F 73 74 3A Host: localhost: }
14/07/04 11:39:12 INFO sink.LoggerSink: Event: { headers:{} body: 41 63 63 65 70 74 3A 20 2A 2F 2A 0D Accept: */*. }
14/07/04 11:39:12 INFO sink.LoggerSink: Event: { headers:{} body: 0D
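As the five "OK" replies show, the netcat source treats every line it receives as one event — curl's HTTP request is five lines, so it becomes five events. You can also talk to the source directly over a plain TCP socket. Below is a minimal sketch in Java; the `NetcatClient` class name is just for illustration, and the host/port you pass in should match the example config (localhost:44444).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative helper (not part of Flume) for sending events to a netcat source.
public class NetcatClient {

    // Send one newline-terminated event; the netcat source acknowledges
    // each line it receives with "OK".
    public static String sendEvent(String host, int port, String message) throws Exception {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write((message + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return in.readLine();
        }
    }
}
```

Calling `NetcatClient.sendEvent("localhost", 44444, "hello flume")` against the running agent should return "OK", and the logger sink should print the event body.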
That was the simplest possible Flume example. Next is a slightly more involved, but genuinely practical, one: shipping log4j events over Avro into HDFS.
# source, channel, sink definition
agent.channels = mem-channel
agent.sources = log4j-avro-source
agent.sinks = hdfs-sink

# Channel
# Define a memory channel called mem-channel on agent
agent.channels.mem-channel.type = memory

# Source
# Define an Avro source called log4j-avro-source on agent and tell it
# to bind to 192.168.1.126:12345. Connect it to channel mem-channel.
agent.sources.log4j-avro-source.type = avro
agent.sources.log4j-avro-source.bind = 192.168.1.126
agent.sources.log4j-avro-source.port = 12345
agent.sources.log4j-avro-source.channels = mem-channel

# Sink
# Define an HDFS sink that writes all events it receives to HDFS
# and connect it to the other end of the same channel.
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://mycluster/flume/events/
agent.sinks.hdfs-sink.channel = mem-channel
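With only hdfs.path set, the HDFS sink runs on its defaults: it writes SequenceFiles and rolls a new file every 30 seconds, every 10 events, or every 1024 bytes, whichever comes first — which tends to litter HDFS with tiny files. The fragment below is an optional tuning sketch (not part of the original setup) using the sink's standard roll and format properties:

```properties
# Write plain text instead of SequenceFiles
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
# Roll by size only (128 MB); disable time- and count-based rolling
agent.sinks.hdfs-sink.hdfs.rollInterval = 0
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728
agent.sinks.hdfs-sink.hdfs.rollCount = 0
```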
service flume-ng-agent restart
sudo -u flume hdfs dfs -mkdir -p /flume/events/
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.crazycake</groupId>
  <artifactId>play-flume</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>play-flume</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flume.flume-ng-clients</groupId>
      <artifactId>flume-ng-log4jappender</artifactId>
      <version>1.4.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>${project.artifactId}-${timestamp}</finalName>
    <plugins>
      <!-- compiler -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.5.1</version>
        <inherited>true</inherited>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
      <!-- jar -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.4</version>
        <extensions>false</extensions>
        <inherited>true</inherited>
      </plugin>
    </plugins>
  </build>
</project>
# Define the root logger and its appenders
log4j.rootLogger = DEBUG, stdout, flume

# Redirect log messages to console
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Define the flume appender
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = host1
log4j.appender.flume.Port = 12345
log4j.appender.flume.UnsafeMode = false
log4j.appender.flume.layout=org.apache.log4j.PatternLayout
package org.crazycake.play_flume;

import org.apache.log4j.Logger;

public class FlumeLog {
    public static void main(String[] args) {
        // Get a logger for this class; every info() call below becomes
        // one Flume event delivered through the log4j appender.
        Logger logger = Logger.getLogger(FlumeLog.class);
        logger.info("hello world");
        logger.info("My name is alex");
        logger.info("How are you?");
    }
}