Real-Time Log Stream Monitoring
Topic: stream processing based on Spark Streaming
Use Spark Streaming and Kafka to implement a log-stream processing pipeline: capture newly generated application logs, process them as a stream, and count and alert on the ERROR records they contain.
The log format can be defined freely; the format produced by a normal log4j setup can serve as a reference.
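For reference, a typical log4j PatternLayout such as "%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c - %m%n" produces lines carrying a timestamp, thread, level, logger name and message; the simulated format used in section III below is a simplified variant of this.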
Problem analysis:
Functional requirement: count error-level log entries in real time and take the corresponding action.
Framework requirement: combine Spark Streaming with Kafka.
Design:
Module 1: log generator
Module 2: log filtering
Module 3: counting error-level log entries
Solution design:
I. Platform setup
Main stack: virtual machines (Linux) + Kafka + ZooKeeper + Spark
1. Kafka environment
Kafka installation steps:
1) Download Kafka from http://kafka.apache.org/downloads.html
tar -zxvf /home/maicaijian/kafka_2.11-0.8.2.2.tgz -C /usr/
Change to the /usr directory: cd /usr
Rename the extracted directory: mv kafka_2.11-0.8.2.2 kafka
Back up the properties file: cp /usr/kafka/config/server.properties /usr/kafka/config/server.properties.bak
Edit the configuration: vi /usr/kafka/config/server.properties
Key contents of server.properties:
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=192.168.64.137
# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
advertised.host.name=192.168.64.137
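The same file also contains the zookeeper.connect property, which must point at the ZooKeeper ensemble; with the hostnames used in the zoo.cfg below (an assumption for this particular setup) it would look like:
zookeeper.connect=slave1:2181,slave2:2181,slave3:2181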
2. ZooKeeper installation
su - maicaijian (switch to the maicaijian user)
tar -zxvf zookeeper-3.4.5.tar.gz (extract)
mv zookeeper-3.4.5 zookeeper (rename the zookeeper-3.4.5 directory to zookeeper)
2) Configure environment variables
1. su - root (switch to the root user)
2. vi /etc/profile (edit the file)
3. Add the following:
export ZOOKEEPER_HOME=/home/maicaijian/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
4. Reload the profile:
source /etc/profile
5. Note: all three ZooKeeper nodes need this change.
6. After the change, switch back to the maicaijian user:
su - maicaijian
3) Modify the configuration file
1. As the maicaijian user:
cd zookeeper/conf
cp zoo_sample.cfg zoo.cfg
2. vi zoo.cfg
3. Add the following:
dataDir=/home/maicaijian/zookeeper/data
dataLogDir=/home/maicaijian/zookeeper/log
server.1=slave1:2888:3888 (hostname, heartbeat port, data port)
server.2=slave2:2888:3888
server.3=slave3:2888:3888
Note: this configures ZooKeeper in cluster mode; while debugging I actually used ZooKeeper in standalone mode.
4. Create the data and log directories:
cd /home/maicaijian/zookeeper/
mkdir -m 755 data
mkdir -m 755 log
5. In the data directory, create a myid file whose content is:
cd data
vi myid
Add the content:
1
Copy the ZooKeeper directory to the other two nodes:
scp -r /home/maicaijian/zookeeper maicaijian@slave2:/home/maicaijian/
scp -r /home/maicaijian/zookeeper maicaijian@slave3:/home/maicaijian/
Note: after copying, edit myid on slave2 and slave3 so that they contain 2 and 3 respectively, matching server.2 and server.3 in zoo.cfg.
II. Bringing up and verifying the environment
1. Start the ZooKeeper service: ./zkServer.sh start
2. Start the Kafka service: ./kafka-server-start.sh /usr/kafka/config/server.properties (the properties file must be specified at startup)
3. Create the "log" topic on the server: ./kafka-topics.sh --create --zookeeper master:2181 --replication-factor 1 --partitions 1 --topic log
4. On the server, use Kafka's console producer and consumer commands to verify that the "log" topic works, as shown below.
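A quick check (a sketch assuming the broker at 192.168.64.137:9092 and ZooKeeper at master:2181, as configured above; adjust the hosts to your own cluster):
./kafka-console-producer.sh --broker-list 192.168.64.137:9092 --topic log
./kafka-console-consumer.sh --zookeeper master:2181 --topic log --from-beginning
Messages typed into the producer should appear in the consumer if the topic is healthy.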
III. Code
1. Simulated logs. Log format: "timestamp + error type + random payload".
Each generated line therefore looks like: 2018<millisecond timestamp> error<0-9> <random UUID>.
1) Kafka producer code, written against the Kafka producer API:
package spark.kafka;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import java.util.*;
public class KafkaProducerSimple {
public static void main(String[] args) {
String TOPIC = "log";
Properties props = new Properties();
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("metadata.broker.list", "192.168.64.137:9092");
props.put("request.required.acks", "1");
props.put("partitioner.class", "cn.itcast.storm.kafka.MyLogPartitioner");
//
Producer
Random rm=new Random();
for (int messageNo = 1; messageNo < 1000; messageNo++) {
//
producer.send(new KeyedMessage
//producer.send(TOPIC,1,"2018"+Calendar.getInstance().getTimeInMillis()+" "+"error"+rm.nextInt(10)+" "+UUID.randomUUID());
}
}
}
package spark.kafka;
import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;
import org.apache.log4j.Logger;
public class MyLogPartitioner implements Partitioner {
private static Logger logger = Logger.getLogger(MyLogPartitioner.class);
public MyLogPartitioner(VerifiableProperties props) {
}
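// choose the target partition from the numeric message key modulo the partition count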
public int partition(Object obj, int numPartitions) {
return Integer.parseInt(obj.toString())%numPartitions;
// return 1;
}
}
2. Spark Streaming code
To make debugging easier, the logging level is set first:
package spark.kafka
import org.apache.log4j.{Logger, Level}
import org.apache.spark.Logging
object LoggerLevels extends Logging {
def setStreamingLogLevels() {
val log4jInitialized = Logger.getRootLogger.getAllAppenders.hasMoreElements
if (!log4jInitialized) {
logInfo("Setting log level to [WARN] for streaming example." +
" To override add a custom log4j.properties to the classpath.")
Logger.getRootLogger.setLevel(Level.WARN)
}
}
}
Log stream processing code:
package spark.kafka
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
/**
* Created by root on 2016/5/21.
*/
object KafkaWordCount {
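// State-update function for updateStateByKey: for each key, add the counts from the
// current batch (Seq[Int]) to the previous running total (Option[Int]) so that the
// error counts accumulate across batches.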
val updateFunc = (iter: Iterator[(String, Seq[Int], Option[Int])]) => {
//iter.flatMap(it=>Some(it._2.sum + it._3.getOrElse(0)).map(x=>(it._1,x)))
iter.flatMap { case (x, y, z) => Some(y.sum + z.getOrElse(0)).map(i => (x, i)) }
}
def main(args: Array[String]) {
LoggerLevels.setStreamingLogLevels()
val Array(zkQuorum, group, topics, numThreads) =args
val sparkConf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
val sc=new SparkContext(sparkConf)
val ssc = new StreamingContext(sc, Seconds(1))
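// updateStateByKey keeps per-key running totals across batches, which requires a checkpoint directory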
ssc.checkpoint("c://ck3")
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val data = KafkaUtils.createStream(ssc,zkQuorum, group, topicMap, StorageLevel.MEMORY_AND_DISK)
//println(data.toString)
val wordsplit=data.map(_._2).flatMap(_.split(" "))
val words = wordsplit.filter(_.startsWith("error"))
val wordCounts = words.map((_, 1)).updateStateByKey(updateFunc, new HashPartitioner(ssc.sparkContext.defaultParallelism), true)
//wordCounts.saveAsTextFiles("C:\\kafka")
//val value=wordCounts.print(1)
val word=wordCounts.map(x => {
val error_kind = x._1
val value = x._2
error_kind match {
case "error1" => (s"number of $error_kind:" + value, "you have to do ...")
case "error2" => (s"number of $error_kind:" + value, "you have to do ...")
case "error3" => (s"number of $error_kind:" + value, "you have to do ...")
case "error4" => (s"number of $error_kind:" + value, "you have to do ...")
case "error5" => (s"number of $error_kind:" + value, "you have to do ...")
case "error6" => (s"number of $error_kind:" + value, "you have to do ...")
case "error7" => (s"number of $error_kind:" + value, "you have to do ...")
case "error8" => (s"number of $error_kind:" + value, "you have to do ...")
case "error9" => (s"number of $error_kind:" + value, "you have to do ...")
case other => (s"number of $error_kind:" + value, "you have to do ...")
}
})
// use match/case on the error type so that each kind can trigger its own handling
word.print()
//wordCounts.print()
//words.print()
ssc.start()
ssc.awaitTermination()
}
}
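To run the job, supply the four arguments destructured in main: the ZooKeeper quorum, a consumer group id, the topic list, and the number of receiver threads per topic. A hypothetical argument line for this setup (the host and group id are placeholders):
192.168.64.137:2181 group1 log 1
Because the code calls setMaster("local[2]"), the job can be launched directly from the IDE or with spark-submit in local mode.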
IV. Results
The batch interval is set to one second, so the program refreshes the log statistics every second.
The results are as follows.
Part of the printed output:
-------------------------------------------
Time: 1515163384000 ms
-------------------------------------------
(number of error2:498,you have to do ...)
(number of error8:476,you have to do ...)
(number of error6:444,you have to do ...)
(number of error4:453,you have to do ...)
(number of error0:499,you have to do ...)
(number of error9:470,you have to do ...)
(number of error3:443,you have to do ...)
(number of error7:461,you have to do ...)
(number of error5:446,you have to do ...)
(number of error1:419,you have to do ...)
-------------------------------------------
Time: 1515163385000 ms
-------------------------------------------
(number of error2:504,you have to do ...)
(number of error8:487,you have to do ...)
(number of error6:450,you have to do ...)
(number of error4:466,you have to do ...)
(number of error0:506,you have to do ...)
(number of error9:481,you have to do ...)
(number of error3:449,you have to do ...)
(number of error7:467,you have to do ...)
(number of error5:451,you have to do ...)
(number of error1:428,you have to do ...)
-------------------------------------------
Time: 1515163386000 ms
-------------------------------------------
(number of error2:515,you have to do ...)
(number of error8:498,you have to do ...)
(number of error6:467,you have to do ...)
(number of error4:475,you have to do ...)
(number of error0:521,you have to do ...)
(number of error9:497,you have to do ...)
(number of error3:465,you have to do ...)
(number of error7:478,you have to do ...)
(number of error5:463,you have to do ...)
(number of error1:447,you have to do ...)
-------------------------------------------
Time: 1515163387000 ms
-------------------------------------------
(number of error2:526,you have to do ...)
(number of error8:505,you have to do ...)
(number of error6:476,you have to do ...)
(number of error4:486,you have to do ...)
(number of error0:531,you have to do ...)
(number of error9:509,you have to do ...)
(number of error3:471,you have to do ...)
(number of error7:488,you have to do ...)
(number of error5:472,you have to do ...)
(number of error1:460,you have to do ...)
-------------------------------------------
Time: 1515163388000 ms
-------------------------------------------
(number of error2:531,you have to do ...)
(number of error8:515,you have to do ...)
(number of error6:491,you have to do ...)
(number of error4:499,you have to do ...)
(number of error0:540,you have to do ...)
(number of error9:521,you have to do ...)
(number of error3:477,you have to do ...)
(number of error7:496,you have to do ...)
(number of error5:479,you have to do ...)
(number of error1:470,you have to do ...)
-------------------------------------------
Time: 1515163389000 ms
-------------------------------------------
(number of error2:558,you have to do ...)
(number of error8:537,you have to do ...)
(number of error6:511,you have to do ...)
(number of error4:521,you have to do ...)
(number of error0:567,you have to do ...)
(number of error9:550,you have to do ...)
(number of error3:493,you have to do ...)
(number of error7:527,you have to do ...)
(number of error5:502,you have to do ...)
(number of error1:492,you have to do ...)
-------------------------------------------
Time: 1515163390000 ms
-------------------------------------------
(number of error2:589,you have to do ...)
(number of error8:568,you have to do ...)
(number of error6:553,you have to do ...)
(number of error4:557,you have to do ...)
(number of error0:596,you have to do ...)
(number of error9:592,you have to do ...)
(number of error3:531,you have to do ...)
(number of error7:560,you have to do ...)
(number of error5:533,you have to do ...)
(number of error1:515,you have to do ...)
-------------------------------------------
Time: 1515163391000 ms
-------------------------------------------
(number of error2:621,you have to do ...)
(number of error8:620,you have to do ...)
(number of error6:601,you have to do ...)
(number of error4:593,you have to do ...)
(number of error0:642,you have to do ...)
(number of error9:635,you have to do ...)
(number of error3:581,you have to do ...)
(number of error7:610,you have to do ...)
(number of error5:567,you have to do ...)
(number of error1:568,you have to do ...)
-------------------------------------------
Time: 1515163392000 ms
-------------------------------------------
(number of error2:649,you have to do ...)
(number of error8:649,you have to do ...)
(number of error6:628,you have to do ...)
(number of error4:622,you have to do ...)
(number of error0:673,you have to do ...)
(number of error9:663,you have to do ...)
(number of error3:610,you have to do ...)
(number of error7:635,you have to do ...)
(number of error5:602,you have to do ...)
(number of error1:595,you have to do ...)
-------------------------------------------
Time: 1515163393000 ms
-------------------------------------------
(number of error2:673,you have to do ...)
(number of error8:675,you have to do ...)
(number of error6:651,you have to do ...)
(number of error4:636,you have to do ...)
(number of error0:694,you have to do ...)
(number of error9:691,you have to do ...)
(number of error3:633,you have to do ...)
(number of error7:658,you have to do ...)
(number of error5:627,you have to do ...)
(number of error1:610,you have to do ...)
-------------------------------------------
Time: 1515163394000 ms
-------------------------------------------
(number of error2:723,you have to do ...)
(number of error8:719,you have to do ...)
(number of error6:697,you have to do ...)
(number of error4:680,you have to do ...)
(number of error0:729,you have to do ...)
(number of error9:718,you have to do ...)
(number of error3:672,you have to do ...)
(number of error7:707,you have to do ...)
(number of error5:663,you have to do ...)
(number of error1:650,you have to do ...)
-------------------------------------------
Time: 1515163395000 ms
-------------------------------------------
(number of error2:734,you have to do ...)
(number of error8:731,you have to do ...)
(number of error6:704,you have to do ...)
(number of error4:692,you have to do ...)
(number of error0:734,you have to do ...)
(number of error9:729,you have to do ...)
(number of error3:677,you have to do ...)
(number of error7:715,you have to do ...)
(number of error5:676,you have to do ...)
(number of error1:662,you have to do ...)
-------------------------------------------
Time: 1515163396000 ms
-------------------------------------------
(number of error2:744,you have to do ...)
(number of error8:746,you have to do ...)
(number of error6:714,you have to do ...)
(number of error4:705,you have to do ...)
(number of error0:747,you have to do ...)
(number of error9:748,you have to do ...)
(number of error3:690,you have to do ...)
(number of error7:732,you have to do ...)
(number of error5:692,you have to do ...)
(number of error1:673,you have to do ...)
-------------------------------------------
Time: 1515163397000 ms
-------------------------------------------
(number of error2:767,you have to do ...)
(number of error8:772,you have to do ...)
(number of error6:742,you have to do ...)
(number of error4:725,you have to do ...)
(number of error0:776,you have to do ...)
(number of error9:771,you have to do ...)
(number of error3:714,you have to do ...)
(number of error7:761,you have to do ...)
(number of error5:715,you have to do ...)
(number of error1:700,you have to do ...)
-------------------------------------------
Time: 1515163398000 ms
-------------------------------------------
(number of error2:802,you have to do ...)
(number of error8:804,you have to do ...)
(number of error6:769,you have to do ...)
(number of error4:755,you have to do ...)
(number of error0:808,you have to do ...)
(number of error9:805,you have to do ...)
(number of error3:742,you have to do ...)
(number of error7:802,you have to do ...)
(number of error5:750,you have to do ...)
(number of error1:729,you have to do ...)
-------------------------------------------
Time: 1515163399000 ms
-------------------------------------------
(number of error2:830,you have to do ...)
(number of error8:829,you have to do ...)
(number of error6:795,you have to do ...)
(number of error4:775,you have to do ...)
(number of error0:835,you have to do ...)
(number of error9:833,you have to do ...)
(number of error3:765,you have to do ...)
(number of error7:833,you have to do ...)
(number of error5:785,you have to do ...)
(number of error1:764,you have to do ...)
-------------------------------------------
Time: 1515163400000 ms
-------------------------------------------
(number of error2:877,you have to do ...)
(number of error8:862,you have to do ...)
(number of error6:835,you have to do ...)
(number of error4:829,you have to do ...)
(number of error0:868,you have to do ...)
(number of error9:867,you have to do ...)
(number of error3:806,you have to do ...)
(number of error7:886,you have to do ...)
(number of error5:827,you have to do ...)
(number of error1:796,you have to do ...)
Main learning resources for this project: the Spark website, the Kafka website, and Spark video tutorials.
The main problems encountered were:
1. "failed to ... not find input"
Cause: Spark RDDs are evaluated lazily; computation only starts once an action is triggered.
Fix: add an output action to the program, e.g. wordCounts.print().
2. ZooKeeper connection problem: "no broker found" errors.
Fix: review the configuration and make sure the advertised.host.name property (see server.properties above) is consistent with the address the broker registers.
Note: if you have any questions, feel free to message me.