Environment Setup
Prerequisites: JDK 1.8, Hadoop 2.7, ZooKeeper 3.4, and Scala 2.12 are already installed.
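You can quickly confirm the prerequisites with the commands below (a sketch; it assumes the tools are on the PATH and that ZooKeeper is installed in its own directory):
java -version            # should report 1.8.x
hadoop version           # should report 2.7.x
scala -version           # should report 2.12.x
bin/zkServer.sh status   # run from the ZooKeeper directory; reports whether ZooKeeper is up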
1. Install Kafka
1) Download the installation package
Reference: http://blog.csdn.net/u014035172/article/details/68061463
First, download the latest Kafka release from the official website and extract it to a directory.
2) Configure the environment: edit the server.properties file under config/; the main settings are:
broker.id=90 # a unique id for this Kafka broker; it must differ across the cluster (the last octet of the IP works well)
host.name=192.168.100.90 # the IP this broker binds to; clients use this IP to reach the broker
zookeeper.connect=192.168.100.90:2181 # ZooKeeper address
3) Start:
nohup bin/kafka-server-start.sh config/server.properties >kafka.log 2>&1 &
or
bin/kafka-server-start.sh -daemon config/server.properties &
# Note: start Kafka with nohup or the -daemon option; otherwise the Kafka service stops when you close your terminal. Configure the remaining Kafka servers the same way, but make sure broker.id is different on every machine.
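As a sketch of what a second node's server.properties might look like (192.168.100.91 is just an example address), only the per-host values change:
broker.id=91
host.name=192.168.100.91
zookeeper.connect=192.168.100.90:2181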
4) Test the installation (this assumes ZooKeeper is already running)
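If ZooKeeper is not running yet, start it from the ZooKeeper installation with bin/zkServer.sh start. You can also create the test topic up front (a sketch using the ZooKeeper-based tooling of older Kafka releases; newer brokers may auto-create the topic on first use):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
bin/kafka-topics.sh --list --zookeeper localhost:2181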
Kafka ships with console tools for a connectivity test. First start a producer: bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test. This opens a producer command line; --broker-list points it at the Kafka broker and --topic names the topic to write to.
Next, open a new terminal and run a consumer: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning. Here --zookeeper tells the (old-style) consumer where ZooKeeper is, and --from-beginning makes it read the topic from the earliest message.
Once the consumer is running, type something into the producer terminal; it should show up almost immediately in the consumer terminal, a bit like a chat client.
If you see your messages echoed by the consumer, the single-node deployment works.
That completes the Kafka setup.
2. Install Flume
1) Download the installation package
First, download the latest Flume release from the official website and extract it to a directory.
2) Configure the environment
Reference: http://www.cnblogs.com/the-tops/p/6008040.html
cp flume-conf.properties.template flume-conf.properties
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink
# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /home/man***/proj/log/debug.log
agent.sources.seqGenSrc.channels = memoryChannel
# The channel can be defined as follows.
agent.channels.memoryChannel.type = memory
# Each sink's type must be defined; use Flume's built-in KafkaSink, not the one from a third-party plugin package
agent.sinks.loggerSink.type = org.apache.flume.sink.kafka.KafkaSink
# Bind the sink to the channel
agent.sinks.loggerSink.channel = memoryChannel
# The topic name; the Kafka consumer below subscribes to it
agent.sinks.loggerSink.topic = testlog
agent.sinks.loggerSink.metadata.broker.list = localhost:9092
# Must be set, otherwise Flume fails at startup complaining about bootstrap initialization
agent.sinks.loggerSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.loggerSink.partition.key = 0
agent.sinks.loggerSink.partition.class = org.apache.flume.plugins.SinglePartition
agent.sinks.loggerSink.serializer.class = kafka.serializer.StringEncoder
agent.sinks.loggerSink.request.required.acks = 0
agent.sinks.loggerSink.max.message.size = 20
agent.sinks.loggerSink.producer.type = sync
agent.sinks.loggerSink.custom.encoding = UTF-8
3) Start
bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name agent -Dflume.root.logger=INFO,console
Note that --name must match the agent name used as the property prefix in the configuration file (agent here).
If there are no errors, the agent is up, and the log should contain a line like:
2017-09-30 19:50:22,695 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: loggerSink started
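To confirm that events actually flow from Flume into Kafka, you can point the console consumer from the Kafka step at the testlog topic (run from the Kafka directory); once the log generator from the next step is running, the raw log lines should appear here:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testlog --from-beginning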
This completes the Flume installation and configuration.
3. Generate real-time logs
Write a simple program that keeps producing log messages.
import org.apache.log4j.Logger;

public class LogGenetator implements Runnable {

    Logger logger = Logger.getLogger(LogGenetator.class);
    private int num;

    public LogGenetator(int num) {
        this.num = num;
    }

    public static void main(String[] args) {
        // Start four generator threads
        for (int i = 0; i < 4; i++) {
            new Thread(new LogGenetator(i)).start();
        }
    }

    public void run() {
        while (true) {
            logger.debug("Test information produced by " + Thread.currentThread().getName());
            try {
                // Sleep for a random interval under one second between messages
                Thread.sleep((long) (Math.random() * 1000));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
In log4j.properties, configure the log file path to be the same path that Flume tails above.
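A minimal log4j.properties sketch (the appender name and rollover settings are illustrative; the File path must match the path in the Flume exec source):
log4j.rootLogger=DEBUG, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/home/man***/proj/log/debug.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c - %m%n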
Package and run it; the program will keep producing log entries.
4. Write a consumer program
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogConsumer {

    public static void main(String[] args) {
        // Configure the consumer
        Properties properties = new Properties();
        // Point it to the brokers
        properties.setProperty("bootstrap.servers", "localhost:9092");
        // Set the consumer group (all consumers must belong to a group);
        // without group.id the consumer throws an error
        properties.setProperty("group.id", "test-consumer-group");
        // Set how to deserialize key/value pairs
        properties.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // When a group is first created, it has no stored offset to start reading from.
        // This tells it to start with the earliest record in the stream.
        properties.setProperty("auto.offset.reset", "earliest");

        // Create the consumer and subscribe to the topic configured in the Flume sink above
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Arrays.asList("testlog"));

        // Loop until Ctrl+C
        int count = 0;
        while (true) {
            // Poll for records
            ConsumerRecords<String, String> records = consumer.poll(20);
            // Did we get any?
            if (records.count() == 0) {
                System.out.println("records count is 0");
            } else {
                // Yes, loop over records
                for (ConsumerRecord<String, String> record : records) {
                    // Display record and count
                    count += 1;
                    System.out.println(count + ": " + record.value());
                }
            }
        }
    }
}
Package and run it, and you will see the real-time logs from step 3 in the console.
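As a sketch of running both programs, assuming they are packaged into a jar named loggen.jar with the log4j and Kafka client jars in a lib/ directory (the jar name and lib/ layout are just examples):
java -cp loggen.jar:lib/* LogGenetator   # start producing log lines
java -cp loggen.jar:lib/* LogConsumer    # watch them arrive from Kafka via Flume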