[Log Processing, Part 1] Integrating Elasticsearch, Kibana, Flume-NG, and Kafka to Collect Tomcat Logs in Real Time

All operations in this article were performed inside a single CentOS 6.5 virtual machine; once deployed, the setup is suitable for development and testing.


Software versions: apache-flume-1.7.0, apache-tomcat-7.0.27, elasticsearch-1.5.2, kafka_2.11-0.8.2.1, kibana-4.0.2, scala-2.11


Step 1. About Flume:
At the time of writing, the current Apache Flume release was 1.5.2, which has no built-in Kafka support; Kafka support is officially released in version 1.7.0.
I downloaded the 1.7 source and built it. The build failed while downloading ua-parser-1.3.0.pom; adding the following repository to the pom.xml:

  <repository>
    <id>p2.jfrog.org</id>
    <url>http://p2.jfrog.org/libs-releases</url>
  </repository>

resolved the problem, and the build succeeded. For details see
http://blog.csdn.net/yydcj/article/details/38824823
Then start the Flume agent, substituting the -f argument with whichever configuration file you are using (see the appendix); reference command:
bin/flume-ng agent -n agent -c conf -f conf/case1-elasticsearch.conf -Dflume.root.logger=INFO,console


Step 2. Start ZooKeeper
Kafka ships with a built-in ZooKeeper instance:
#bin/zookeeper-server-start.sh config/zookeeper.properties
Start the Kafka server:
#bin/kafka-server-start.sh config/server.properties
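The following steps use a topic named test. Kafka auto-creates topics on first use by default, but you can create the topic up front to make the partition and replication settings explicit; a minimal sketch for this single-broker setup:
#bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test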


Step 3. Start a Kafka consumer to test
#bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Data sent through the Flume agent's source will now be delivered to this Kafka consumer. A quick end-to-end check is sketched below.
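To exercise the full path, push a test line into the netcat source that case1-flume-kafka.conf (see the appendix) binds to localhost:44444; assuming nc is installed and the agent from Step 1 is running on the same host:
#echo "hello flume" | nc localhost 44444
The line should then appear in the console consumer started above.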


Step 4. Start a Flume agent that subscribes to messages in Kafka
First copy the ZooKeeper jar (found under the Kafka installation's libs directory) into Flume's lib directory, then launch the agent, as sketched below.
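A sketch of those two steps, assuming KAFKA_HOME and FLUME_HOME point at the respective installation directories (the ZooKeeper jar version depends on the Kafka build; kafka_2.11-0.8.2.1 ships zookeeper-3.4.6.jar):
#cp $KAFKA_HOME/libs/zookeeper-*.jar $FLUME_HOME/lib/
#bin/flume-ng agent -n agent -c conf -f conf/case1-kafka-flume.conf -Dflume.root.logger=INFO,console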


Step 5. Start Elasticsearch:
#bin/elasticsearch
For Flume to be able to write data into Elasticsearch, copy the elasticsearch jar and the lucene-core jar into Flume's lib directory, for example:
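A sketch of the copy, assuming ES_HOME and FLUME_HOME point at the installation directories (jar versions vary with the release; elasticsearch-1.5.2 ships lucene-core-4.10.4.jar):
#cp $ES_HOME/lib/elasticsearch-1.5.2.jar $FLUME_HOME/lib/
#cp $ES_HOME/lib/lucene-core-*.jar $FLUME_HOME/lib/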


Step 6. After setting host, port, and elasticsearch_url in config/kibana.yml, start Kibana:
#bin/kibana
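A minimal config/kibana.yml sketch for Kibana 4.0.x, assuming Elasticsearch serves HTTP on its default port on the same host (adjust the values to your environment):
port: 5601
host: "0.0.0.0"
elasticsearch_url: "http://localhost:9200"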


Appendix: Flume configurations
case1-flume-kafka.conf:
agent.sources=source1
agent.channels=channel1
agent.sinks=sink1


agent.sources.source1.type=netcat
agent.sources.source1.bind=localhost
agent.sources.source1.port=44444


agent.channels.channel1.type=memory
agent.channels.channel1.capacity=1000
agent.channels.channel1.transactionCapacity=100


agent.sources.source1.channels=channel1
agent.sinks.sink1.channel=channel1


agent.sinks.sink1.type=org.apache.flume.sink.kafka.KafkaSink
agent.sinks.sink1.topic=test
agent.sinks.sink1.brokerList=localhost:9092
agent.sinks.sink1.requiredAcks=1
agent.sinks.sink1.batchSize=20




case1-kafka-flume.conf:
agent.sources=source2
agent.channels=channel2
agent.sinks=sink2


agent.sources.source2.type=org.apache.flume.source.kafka.KafkaSource
agent.sources.source2.zookeeperConnect=localhost:2181
agent.sources.source2.topic=test
agent.sources.source2.groupId=flume
agent.sources.source2.kafka.consumer.timeout.ms=100


agent.channels.channel2.type=memory
agent.channels.channel2.capacity=1000
agent.channels.channel2.transactionCapacity=100


agent.sources.source2.channels=channel2
agent.sinks.sink2.channel=channel2


agent.sinks.sink2.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.sink2.batchSize=100
agent.sinks.sink2.hostNames=172.168.0.10:9300
agent.sinks.sink2.indexName=flume
agent.sinks.sink2.indexType=bar_type
agent.sinks.sink2.clusterName=elasticsearch
agent.sinks.sink2.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
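Once this agent is running, new documents should land in a date-suffixed index (the ElasticSearchSink appends the current date to indexName, e.g. flume-2015-05-25). Assuming the Elasticsearch node from hostNames serves HTTP on the default port 9200, you can verify with the cat API:
#curl 'http://172.168.0.10:9200/_cat/indices?v'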


Step 7. Use tail -F to monitor the log Tomcat produces in real time; the Tomcat log format is left at its default:
Flume configuration file case-exec-elasticsearch.conf:
agent.sources=r1
agent.channels=channel2
agent.sinks=sink2


agent.sources.r1.type = exec
agent.sources.r1.command = tail -F /db2fs/opt/log-process/apache-tomcat-7.0.27/logs/localhost_access_log.2015-05-25.txt
agent.sources.r1.interceptors = i1
agent.sources.r1.interceptors.i1.type=regex_extractor
agent.sources.r1.interceptors.i1.regex = ([0][:0-9]*|[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)\\s-\\s-\\s\\[(.*)\\]\\s"(.*)"\\s([0-9]{3}|-)\\s([0-9]+|-)
agent.sources.r1.interceptors.i1.serializers=s1 s2 s3 s4 s5
agent.sources.r1.interceptors.i1.serializers.s1.name=IP
agent.sources.r1.interceptors.i1.serializers.s2.name=TIME
agent.sources.r1.interceptors.i1.serializers.s3.name=PROTOCOL
agent.sources.r1.interceptors.i1.serializers.s4.name=STATUS_CODE
agent.sources.r1.interceptors.i1.serializers.s5.name=BYTE_COUNT


agent.channels.channel2.type=memory
agent.channels.channel2.capacity=1000
agent.channels.channel2.transactionCapacity=100


agent.sources.r1.channels=channel2
agent.sinks.sink2.channel=channel2


agent.sinks.sink2.type=org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.sink2.batchSize=100
agent.sinks.sink2.hostNames=9.115.42.108:9300
agent.sinks.sink2.indexName=tomcat
agent.sinks.sink2.indexType=bar_type
agent.sinks.sink2.clusterName=elasticsearch
agent.sinks.sink2.serializer=org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
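For reference, here is a line in Tomcat's default access log format that the regex above matches (values are illustrative):
127.0.0.1 - - [25/May/2015:13:01:02 +0800] "GET /index.jsp HTTP/1.1" 200 1782
The interceptor extracts IP=127.0.0.1, TIME=25/May/2015:13:01:02 +0800, PROTOCOL=GET /index.jsp HTTP/1.1, STATUS_CODE=200, and BYTE_COUNT=1782 as event headers; the LogStash serializer then writes these headers as fields of each Elasticsearch document.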


Step 8. With the Step 7 configuration in place, the log file is monitored automatically. Launch the agent using the same pattern as in Step 1:
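#bin/flume-ng agent -n agent -c conf -f conf/case-exec-elasticsearch.conf -Dflume.root.logger=INFO,console
You can then point Kibana (Step 6) at the tomcat-* index pattern to visualize the access log in real time.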
