Consuming Real-Time Logs with Flume + Kafka

Environment Setup

Prerequisites: JDK 1.8, Hadoop 2.7, ZooKeeper 3.4, and Scala 2.12 are already installed.

1. Install Kafka

1) Download the installation package

Reference: http://blog.csdn.net/u014035172/article/details/68061463

First, download the latest Kafka release from the official website and extract it to a directory.

2) Configure the environment: edit config/server.properties, setting mainly the following properties:

 

broker.id=90  # a unique id for this Kafka broker; every broker in the cluster must use a different value (the last octet of the IP is a common choice)
host.name=192.168.100.90  # the IP this broker binds to; clients use this IP to reach the broker
zookeeper.connect=192.168.100.90:2181  # ZooKeeper address

 

3) Start:

nohup bin/kafka-server-start.sh config/server.properties >kafka.log 2>&1 &

 

 

Alternatively, start it in daemon mode:

bin/kafka-server-start.sh -daemon config/server.properties

# Note: start Kafka with nohup or the -daemon flag; otherwise the Kafka service stops as soon as you close your terminal. Configure the remaining Kafka servers the same way, but make sure broker.id is different on each one.
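To confirm the broker actually came up, you can check the JVM process list and the startup log (jps ships with the JDK; the log file name depends on how you started the broker, as assumed below):

jps                          # should list a process named Kafka
tail -f kafka.log            # or logs/server.log when started with -daemon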

 

4) Test the installation (ZooKeeper must already be running)
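If topic auto-creation is disabled on the broker, create the test topic before running the console tools; a minimal sketch, assuming the same single-node ZooKeeper at localhost:2181:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test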

 

Kafka ships with console tools for a quick connectivity test. First start a producer:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

This opens an interactive producer command line. Next, start a consumer in a new terminal:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

Once the consumer is running, anything you type in the producer terminal shows up almost immediately in the consumer terminal, much like a simple chat client.

If the messages appear in the consumer terminal, the single-node deployment works. About the parameters: --broker-list tells the producer which broker(s) to connect to, --topic selects the topic, --zookeeper points the (old-style) consumer at the cluster's ZooKeeper, and --from-beginning makes it read the topic from the earliest offset.

That completes the Kafka side of the setup.

2. Install Flume

1) Download the installation package

First, download the latest Flume release from the official website and extract it to a directory.

2) Configure the environment

Reference: http://www.cnblogs.com/the-tops/p/6008040.html

cp conf/flume-conf.properties.template conf/roomy.conf

Then edit conf/roomy.conf as follows:

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink


# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /home/man***/proj/log/debug.log
agent.sources.seqGenSrc.channels = memoryChannel


# The channel can be defined as follows.
agent.channels.memoryChannel.type = memory


# Each sink's type must be defined
# use the built-in KafkaSink class here, not the one from a third-party plugin package
agent.sinks.loggerSink.type = org.apache.flume.sink.kafka.KafkaSink
# bind the sink to the channel defined above
agent.sinks.loggerSink.channel = memoryChannel
# the topic name; the Kafka consumer will subscribe to this later
agent.sinks.loggerSink.topic = testlog
agent.sinks.loggerSink.metadata.broker.list = localhost:9092
# must be set, otherwise Flume fails at startup complaining about bootstrap initialization
agent.sinks.loggerSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.loggerSink.partition.key = 0
agent.sinks.loggerSink.partition.class = org.apache.flume.plugins.SinglePartition
agent.sinks.loggerSink.serializer.class = kafka.serializer.StringEncoder
agent.sinks.loggerSink.request.required.acks = 0
agent.sinks.loggerSink.max.message.size = 20
agent.sinks.loggerSink.producer.type = sync
agent.sinks.loggerSink.custom.encoding = UTF-8
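The property list above mixes older sink keys with the newer kafka.bootstrap.servers key, which is why both have to be present for the agent to start. If you run Flume 1.7 or later, the built-in KafkaSink documents kafka.-prefixed keys throughout; a hedged equivalent sketch with the same agent, channel, and topic names (verify against the sink documentation of your exact Flume version):

agent.sinks.loggerSink.type = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.loggerSink.channel = memoryChannel
agent.sinks.loggerSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.loggerSink.kafka.topic = testlog
agent.sinks.loggerSink.kafka.producer.acks = 1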

 

3) Start

bin/flume-ng agent --conf conf --conf-file conf/roomy.conf --name agent -Dflume.root.logger=INFO,console

The --name value must match the property prefix used in the config file (agent above). If no errors are reported, the agent is up and you should see a log line like:

2017-09-30 19:50:22,695 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: loggerSink started

That completes the Flume installation and configuration.

 

3. Generate Real-Time Logs

Write a simple program that keeps producing log output.

 

import org.apache.log4j.Logger;

public class LogGenetator implements Runnable {

	Logger logger = Logger.getLogger(LogGenetator.class);

	private int num;

	public LogGenetator(int num) {
		this.num = num;
	}

	public static void main(String[] args) {
		// start four generator threads
		for (int i = 0; i < 4; i++) {
			new Thread(new LogGenetator(i)).start();
		}
	}

	public void run() {
		// emit a debug line, then sleep up to one second, forever
		while (true) {
			logger.debug("Test information produced by " + Thread.currentThread().getName());
			try {
				Thread.sleep((long) (Math.random() * 1000));
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}

 

In log4j.properties, point the log file at the same path that the Flume exec source tails above.
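The original does not show the appender settings; a minimal log4j.properties sketch, assuming a plain FileAppender and keeping the partially elided path from the Flume config (substitute your actual path):

log4j.rootLogger=DEBUG, file

# append to the file that the Flume exec source tails
log4j.appender.file=org.apache.log4j.FileAppender
log4j.appender.file.File=/home/man***/proj/log/debug.log
log4j.appender.file.Append=true
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c - %m%n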

Package and run it; the program will keep producing log entries.
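At this point you can sanity-check the whole pipeline with the Kafka console consumer used earlier, this time on the testlog topic (assuming the same single-node setup):

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testlog --from-beginning

If the generator's debug lines scroll by, Flume is delivering events to Kafka.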

4. Write a Consumer Program

 

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogConsumer {
	public static void main(String[] args) {
		// Create a consumer
		KafkaConsumer<String, String> consumer;
		// Configure the consumer
		Properties properties = new Properties();
		// Point it to the brokers
		properties.setProperty("bootstrap.servers", "localhost:9092");
		// Set the consumer group (all consumers must belong to a group).
		// This is Kafka's default group id; the consumer fails without a group.id.
		properties.setProperty("group.id", "test-consumer-group");
		// Set how to deserialize key/value pairs
		properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
		properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
		// When a group is first created, it has no offset stored to start
		// reading from. This tells it to start
		// with the earliest record in the stream.
		properties.setProperty("auto.offset.reset", "earliest");
		consumer = new KafkaConsumer<>(properties);

		// Subscribe to the topic configured in the Flume sink above
		consumer.subscribe(Arrays.asList("testlog"));

		// Loop until ctrl + c
		int count = 0;
		while (true) {
			// Poll for records
			ConsumerRecords<String, String> records = consumer.poll(20);
			// Did we get any?
			if (records.count() == 0) {
				System.out.println("records count is 0");
			} else {
				// Yes, loop over records
				for (ConsumerRecord<String, String> record : records) {
					// Display record and count
					count += 1;
					System.out.println(count + ": " + record.value());
				}
			}
		}
	}
}

Package and run it; the real-time log lines from step 3 appear in the console.
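Neither listing shows a build file; if you package with Maven, a hedged dependency sketch (the versions here are assumptions — pick a kafka-clients version that matches your broker):

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.2.1</version>
    </dependency>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>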