This article describes wiring Logstash, Kafka, and Elasticsearch together: Logstash collects the data, which then flows through Kafka into Elasticsearch.
Versions used:
kafka_2.11-0.9.0.0
logstash-2.2.0
elasticsearch-2.2.0
The Logstash configuration that connects them is shown below:
input {
  stdin {}
}
filter {
  grok {
    # The trailing named capture is an illustrative reconstruction (the original
    # pattern was cut off here); adjust it to your own log format.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:logdate} %{LOGLEVEL:loglevel}\s?(?<msgbody>.*)"
    }
  }
  date {
    match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["logdate"]
  }
}
output {
  kafka {
    topic_id => "l-k-e"
    bootstrap_servers => "192.168.5.128:9092"
    batch_size => 5
  }
  stdout {
    codec => rubydebug
  }
}
Notes: the trailing named capture in the grok pattern is a regular expression you define yourself. This configuration has Logstash collect a log line, replace the @timestamp field with the time parsed out of the log, and split the log with the grok match. You will need to develop the regular expression for your own log format; the log line that my pattern matches is pasted at the end of this article. Logstash 2.2.0 also ships with many ready-made patterns, located in /logstash-2.2.0/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-2.0.2/patterns/grok-patterns, and of course you can write your own.
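For reference, with the pattern above the sample Hadoop log line shown at the end of this article would be split roughly as follows (msgbody is the illustrative capture name from the config above; @timestamp is replaced by the parsed logdate, and the logdate field itself is then removed):
   "message" => "2016-08-24 18:05:39,830 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30002 milliseconds"
  "loglevel" => "INFO"
   "msgbody" => "org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30002 milliseconds"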
Here I provide two ways to consume: one indexes each message into Elasticsearch as it arrives; the other buffers several messages and then sends them to Elasticsearch together. As you have probably guessed, the second approach can be treated as one way of optimizing Elasticsearch.
First, the per-message consumer:
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

import com.teamsun.kafka.m001.KafkaProperties;

public class KafkaConsumer2 extends Thread {

    // Build the Elasticsearch connection
    public static Client initElasticSearch() throws UnknownHostException {
        Settings settings = Settings.settingsBuilder().put("cluster.name",
                "dahuaidan").build();
        Client client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress
                                .getByName("192.168.5.128"), 9300));
        return client;
    }

    private final ConsumerConnector consumer;
    private final String topic;

    public KafkaConsumer2(String topic) {
        consumer = kafka.consumer.Consumer
                .createJavaConsumerConnector(createConsumerConfig());
        this.topic = topic;
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", KafkaProperties.zkConnect);
        props.put("group.id", "groupid3");// KafkaProperties.groupId1);
        props.put("zookeeper.session.timeout.ms", "40000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        return new ConsumerConfig(props);
    }

    @Override
    public void run() {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(1));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer
                .createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = consumerMap.get(topic).get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> next = it.next();
            Client client;
            try {
                // Note: a new TransportClient is created for every message here;
                // in practice you would create it once outside the loop, as the
                // second consumer below does.
                client = initElasticSearch();
                client.prepareIndex().setIndex("es").setType("bulk")
                        .setSource(next.message()).execute().get();
            } catch (UnknownHostException e1) {
                e1.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
            // next carries plenty of metadata; here we print the partition and the message
            System.out.println(next.partition() + new String(next.message()));
        }
    }
}
The second consumer:
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

import com.teamsun.kafka.m001.KafkaProperties;

import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class KafkaConsumer5 extends Thread {

    // Build the Elasticsearch connection
    public static Client init() throws UnknownHostException {
        Settings settings = Settings.settingsBuilder().put("cluster.name",
                "dahuaidan").build();
        Client client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress
                                .getByName("192.168.5.128"), 9300));
        return client;
    }

    private final ConsumerConnector consumer;
    private final String topic;

    public KafkaConsumer5(String topic) {
        consumer = kafka.consumer.Consumer
                .createJavaConsumerConnector(createConsumerConfig());
        this.topic = topic;
    }

    private static ConsumerConfig createConsumerConfig() {
        Properties props = new Properties();
        props.put("zookeeper.connect", KafkaProperties.zkConnect);
        props.put("group.id", KafkaProperties.groupId3);
        props.put("zookeeper.session.timeout.ms", "40000");
        props.put("zookeeper.sync.time.ms", "200");
        props.put("auto.commit.interval.ms", "1000");
        return new ConsumerConfig(props);
    }

    @Override
    public void run() {
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put(topic, new Integer(1));
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer
                .createMessageStreams(topicCountMap);
        KafkaStream<byte[], byte[]> stream = consumerMap.get(topic).get(0);
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        int i = 1;
        Client client = null;
        try {
            client = KafkaConsumer5.init();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> next = it.next();
            BulkRequestBuilder bulkRequestBuilder = client.prepareBulk();
            // You can call add() several times here to buffer multiple documents
            // and then send them to Elasticsearch in a single bulk request.
            bulkRequestBuilder.add(
                    client.prepareIndex().setIndex("es").setType("bulk")
                            .setId("" + i).setSource(next.message()))
                    .get(); // Don't forget the get(); without submitting, the messages never reach ES
            // The bytes have to be decoded via String; remove the String wrapper to see what gets printed
            System.out.println(new String(next.message()));
            i++;
        }
    }
}
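KafkaConsumer5 as written still calls get() once per message, so it mainly demonstrates the bulk API. A minimal sketch of the buffered variant described earlier might look like the following; the method name, the batch size of 100, and the flush logic are my own assumptions, not part of the original code:
    // Sketch only: buffer documents and send them in one bulk request per batch.
    // 'client' and 'it' are the same TransportClient and ConsumerIterator used in run() above.
    private void consumeInBatches(Client client, ConsumerIterator<byte[], byte[]> it) {
        final int batchSize = 100;                    // assumed flush threshold
        BulkRequestBuilder bulk = client.prepareBulk();
        int buffered = 0;
        int id = 1;
        while (it.hasNext()) {
            MessageAndMetadata<byte[], byte[]> next = it.next();
            bulk.add(client.prepareIndex().setIndex("es").setType("bulk")
                    .setId("" + id).setSource(next.message()));
            buffered++;
            id++;
            if (buffered >= batchSize) {              // flush once the buffer is full
                bulk.get();                           // a single bulk request for the whole batch
                bulk = client.prepareBulk();          // start a fresh builder
                buffered = 0;
            }
        }
        // Note: with the default consumer settings hasNext() blocks while waiting for
        // messages, so a partially filled batch is only flushed here once the iterator ends.
        if (buffered > 0) {
            bulk.get();
        }
    }
Buffering like this cuts down the number of round trips to Elasticsearch, which is what makes the second approach an optimization.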
If you run into problems here, see the related article: 关于kafka的生产消费问题 (on producing and consuming with Kafka).
Start Logstash:
./logstash -f logstash.conf
Start the consumer (the test driver below also starts a producer thread):
public class KafkaConsumerProducerTest {
    public static void main(String[] args) {
        KafkaProducer2 producerThread2 = new KafkaProducer2(KafkaProperties.topic);
        producerThread2.start();
        KafkaConsumer5 consumerThread5 = new KafkaConsumer5(KafkaProperties.topic);
        consumerThread5.start();
    }
}
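KafkaProducer2 and the KafkaProperties constants class come from the producer/consumer article referenced above and are not listed here. For completeness, KafkaProperties might look roughly like this; the ZooKeeper address and group id are placeholders I assumed, and the topic matches the topic_id in the Logstash output:
public class KafkaProperties {
    // The values below are illustrative assumptions; substitute your own environment.
    public static final String zkConnect = "192.168.5.128:2181"; // ZooKeeper connect string (assumed)
    public static final String topic = "l-k-e";                  // same topic as topic_id in the Logstash config
    public static final String groupId3 = "groupid3";            // consumer group id (assumed)
}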
Manually enter a log line (copy and paste) into Logstash's stdin:
2016-08-24 18:05:39,830 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30002 milliseconds
Then check in Elasticsearch whether the data arrived; you can also inspect the data sitting in Kafka from the command line. I recommend pushing a good amount of data through to test it.
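One quick way to check from Java is to search the index the consumers write to, reusing the connection helper defined in KafkaConsumer2; this is only a sketch, with the "es" index and "bulk" type taken from the code above:
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;

public class CheckImport {
    public static void main(String[] args) throws Exception {
        // Reuse the Elasticsearch connection helper from KafkaConsumer2 above
        Client client = KafkaConsumer2.initElasticSearch();
        // Search the "es" index / "bulk" type that the consumers write to
        SearchResponse resp = client.prepareSearch("es").setTypes("bulk")
                .setSize(5).get();
        System.out.println("total hits: " + resp.getHits().getTotalHits());
        client.close();
    }
}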
I have seen some producer and consumer code online aimed at beginners, and some of it causes problems when copied over. Not that the code itself is broken: simple producing and consuming may work fine, but once the data goes into ES some of that code needs improving, otherwise you can hit problems such as losing data on the way in. I have personally lost half of my data this way.
If everything above ran perfectly for you, the previous paragraph is moot and you can ignore it. If the data runs into problems on its way into Elasticsearch, or you are simply curious, see the related article:
关于kafka-logstash-elasticsearch