Spring Boot Web project: generate it at https://start.spring.io/ and select Lombok, Spring Web, and Spring for Apache Kafka.

POM: on top of the generated POM, add the fastjson JSON utility.
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
    <artifactId>gmall</artifactId>
    <groupId>com.simwor</groupId>
    <version>0.0.1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>logger</artifactId>
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
</properties>
<dependencies>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
        <exclusions>
            <exclusion>
                <groupId>org.junit.vintage</groupId>
                <artifactId>junit-vintage-engine</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <configuration>
                <excludes>
                    <exclude>
                        <groupId>org.projectlombok</groupId>
                        <artifactId>lombok</artifactId>
                    </exclude>
                </excludes>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>
The log collector (LoggerController) does two things: 1. routes the logs to Kafka by type; 2. writes the logs to local disk.
package com.simwor.gmall.controller;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
@RestController
@Slf4j
public class LoggerController {
@Autowired
private KafkaTemplate<String,String> kafkaTemplate;
@RequestMapping("/applog")
public String appLog(@RequestBody String applog) {
JSONObject jsonObject = JSON.parseObject(applog);
if(jsonObject.getString("start") != null && jsonObject.getString("start").length() > 0)
kafkaTemplate.send("gmall-start-log", applog);
else
kafkaTemplate.send("gmall-event-log", applog);
log.info(applog);
return applog;
}
}
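A quick manual smoke test of the controller (a sketch: it assumes the service is running locally on the default port 8080 and uses a made-up payload; because the payload contains a start block it is routed to gmall-start-log):

curl -X POST http://localhost:8080/applog -H "Content-Type: application/json" -d '{"common":{"mid":"mid_test"},"start":{"entry":"icon"},"ts":1623810190000}'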
logback.xml
<configuration>
    <property name="LOG_HOME" value="/opt/applog/logs" />
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%msg%n</pattern>
        </encoder>
    </appender>
    <appender name="rollingFile" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_HOME}/app.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${LOG_HOME}/app.%d{yyyy-MM-dd}.log</fileNamePattern>
        </rollingPolicy>
        <encoder>
            <pattern>%msg%n</pattern>
        </encoder>
    </appender>
    <logger name="com.simwor.gmall.controller.LoggerController"
            level="INFO" additivity="false">
        <appender-ref ref="rollingFile" />
        <appender-ref ref="console" />
    </logger>
    <root level="error" additivity="false">
        <appender-ref ref="console" />
    </root>
</configuration>
application.properties
#============== kafka ===================
# Kafka broker addresses; multiple brokers may be listed
spring.kafka.bootstrap-servers=simwor01:9092,simwor02:9092,simwor03:9092
# serializers for the message key and value
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
The log generator keeps sending requests to the log collector, producing mock logs in the formats that end up in gmall-start-log and gmall-event-log.
[omm@simwor01 mock-log]$ ll
-rw-r--r--. 1 omm omm 610 Jun 16 10:16 application.properties
-rw-r--r--. 1 omm omm 11114569 Jun 13 2020 gmall2020-mock-log-2020-05-10.jar
-rw-r--r--. 1 omm omm 3211 Jun 16 10:17 logback.xml
-rw-r--r--. 1 omm omm 493 Mar 19 2020 path.json
[omm@simwor01 mock-log]$ java -jar gmall2020-mock-log-2020-05-10.jar
...
{"common":{"ar":"110000","ba":"Xiaomi","ch":"web","md":"Xiaomi 9","mid":"mid_35","os":"Android 9.0","uid":"60","vc":"v2.1.134"},"start":{"entry":"notice","loading_time":9558,"open_ad_id":19,"open_ad_ms":8081,"open_ad_skip_ms":0},"ts":1623810190000}
{"common":{"ar":"110000","ba":"Xiaomi","ch":"web","md":"Xiaomi 9","mid":"mid_35","os":"Android 9.0","uid":"60","vc":"v2.1.134"},"displays":[{"display_type":"activity","item":"2","item_type":"activity_id","order":1},{"display_type":"query","item":"9","item_type":"sku_id","order":2},{"display_type":"query","item":"10","item_type":"sku_id","order":3},{"display_type":"query","item":"5","item_type":"sku_id","order":4},{"display_type":"query","item":"7","item_type":"sku_id","order":5},{"display_type":"query","item":"1","item_type":"sku_id","order":6},{"display_type":"query","item":"8","item_type":"sku_id","order":7},{"display_type":"promotion","item":"8","item_type":"sku_id","order":8},{"display_type":"query","item":"3","item_type":"sku_id","order":9},{"display_type":"promotion","item":"2","item_type":"sku_id","order":10}],"page":{"during_time":18544,"page_id":"home"},"ts":1623810199558}
...
The log date and the address the requests are sent to are configurable.
[omm@simwor01 mock-log]$ head application.properties
#business date
mock.date=2021-06-16
#mock data delivery mode
mock.type=http
#target URL in http mode
mock.url=http://localhost:8080/applog
[omm@simwor01 mock-log]$
The log dispatcher is Nginx, which distributes the log generator's requests evenly across multiple backend log collectors.
[root@simwor01 conf.d]# pwd
/etc/nginx/conf.d
[root@simwor01 conf.d]# cat applog.conf
upstream applog {
server simwor01:8080;
server simwor02:8080;
server simwor03:8080;
}
server {
listen 80;
server_name localhost;
location / {
proxy_pass http://applog;
}
}
[root@simwor01 conf.d]#
[omm@simwor01 mock-log]$ head application.properties
#business date
mock.date=2021-06-16
#mock data delivery mode
mock.type=http
#target URL in http mode (now pointing at Nginx)
mock.url=http://localhost/applog
[omm@simwor01 mock-log]$
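The shell script below starts and stops the whole collection tier: the logger jar on each of the three nodes plus Nginx on the dispatcher node.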
#!/bin/bash
JAVA_BIN=/opt/module/jdk/bin/java
PROJECT=/opt/applog/logger
APPNAME=logger-0.0.1-SNAPSHOT.jar
case $1 in
"start")
{
for i in simwor01 simwor02 simwor03
do
echo "========: $i==============="
ssh $i "$JAVA_BIN -Xms32m -Xmx64m -jar $PROJECT/$APPNAME >/dev/null 2>&1 &"
done
echo "========NGINX==============="
sudo systemctl start nginx
};;
"stop")
{
echo "======== NGINX==============="
sudo systemctl stop nginx
for i in simwor01 simwor02 simwor03
do
echo "========: $i==============="
ssh $i "ps -ef|grep $APPNAME |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
done
};;
esac
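Assuming the script is saved as, say, logger-cluster.sh (the file name is not given here), it is invoked as logger-cluster.sh start and logger-cluster.sh stop.

Next comes the POM of the realtime module, the Spark Streaming job that consumes these Kafka topics.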
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
    <artifactId>gmall</artifactId>
    <groupId>com.simwor</groupId>
    <version>0.0.1-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>realtime</artifactId>
<properties>
    <spark.version>2.4.0</spark.version>
    <scala.version>2.11.8</scala.version>
    <kafka.version>1.0.0</kafka.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <java.version>1.8</java.version>
</properties>
<dependencies>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.56</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>2.4.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>${kafka.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-spark</artifactId>
        <version>4.14.2-HBase-1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>io.searchbox</groupId>
        <artifactId>jest</artifactId>
        <version>5.3.3</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-api</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
        <version>2.7.8</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.4.6</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>3.0.0</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>
config.properties
# Kafka
kafka.broker.list=simwor01:9092,simwor02:9092,simwor03:9092
# Redis
redis.host=simwor01
redis.port=6379
package com.simwor.realtime.util
import java.io.InputStreamReader
import java.util.Properties
object PropertiesUtil {
def main(args: Array[String]): Unit = {
val properties: Properties = PropertiesUtil.load("config.properties")
println(properties.getProperty("kafka.broker.list"))
}
def load(propertieName:String): Properties ={
val prop=new Properties();
prop.load(new InputStreamReader(Thread.currentThread().getContextClassLoader.getResourceAsStream(propertieName) , "UTF-8"))
prop
}
}
package com.simwor.realtime.util
import java.util.Properties
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
object MyKafkaUtil {
private val properties: Properties = PropertiesUtil.load("config.properties")
val broker_list = properties.getProperty("kafka.broker.list")
// Kafka consumer configuration
var kafkaParam = collection.mutable.Map(
"bootstrap.servers" -> broker_list, // brokers used to bootstrap the connection to the cluster
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
// the consumer group this consumer belongs to
"group.id" -> "gmall_consumer_group",
// used when there is no initial offset, or the current offset no longer exists on the server;
// "latest" resets the offset to the latest available offset
"auto.offset.reset" -> "latest",
// if true, the consumer's offsets are committed automatically in the background, which can lose data if Kafka goes down;
// if false, offsets have to be maintained manually
"enable.auto.commit" -> (true: java.lang.Boolean)
)
// Create a DStream that returns the received input data
// LocationStrategies: create consumers for the given topics and cluster addresses
// LocationStrategies.PreferConsistent: distribute partitions consistently across all executors
// ConsumerStrategies: choose how Kafka consumers are created and configured on the driver and executors
// ConsumerStrategies.Subscribe: subscribe to a set of topics
def getKafkaStream(topic: String,ssc:StreamingContext ): InputDStream[ConsumerRecord[String,String]]={
val dStream = KafkaUtils.createDirectStream[String,String](ssc, LocationStrategies.PreferConsistent,ConsumerStrategies.Subscribe[String,String](Array(topic),kafkaParam ))
dStream
}
def getKafkaStream(topic: String,ssc:StreamingContext,groupId:String): InputDStream[ConsumerRecord[String,String]]={
kafkaParam("group.id")=groupId
val dStream = KafkaUtils.createDirectStream[String,String](ssc, LocationStrategies.PreferConsistent,ConsumerStrategies.Subscribe[String,String](Array(topic),kafkaParam ))
dStream
}
def getKafkaStream(topic: String,ssc:StreamingContext,offsets:Map[TopicPartition,Long],groupId:String): InputDStream[ConsumerRecord[String,String]]={
kafkaParam("group.id")=groupId
val dStream = KafkaUtils.createDirectStream[String,String](ssc, LocationStrategies.PreferConsistent,ConsumerStrategies.Subscribe[String,String](Array(topic),kafkaParam,offsets))
dStream
}
}
package com.simwor.realtime.app
import com.alibaba.fastjson.{JSON, JSONObject}
import com.simwor.realtime.bean.DauInfo
import com.simwor.realtime.util.{MyEsUtil, MyKafkaUtil, RedisUtil}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import java.text.SimpleDateFormat
import java.util.Date
import scala.collection.mutable.ListBuffer
object DauApp {
def main(args: Array[String]): Unit = {
val sparkConf = new SparkConf().setAppName("dau_app").setMaster("local[4]")
val ssc = new StreamingContext(sparkConf, Seconds(5))
// Consume the startup logs from Kafka
val recordInputStream: InputDStream[ConsumerRecord[String, String]] = MyKafkaUtil.getKafkaStream("gmall-start-log", ssc)
val jsonObjectDataStream = recordInputStream.map(record => {
val jsonString = record.value()
val jsonObject = JSON.parseObject(jsonString)
val timestamp = jsonObject.getLong("ts")
val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH")
val dateHourString = simpleDateFormat.format(new Date(timestamp))
val dateHour = dateHourString.split(" ")
jsonObject.put("dt", dateHour(0))
jsonObject.put("hr", dateHour(1))
jsonObject
})
// Deduplicate with Redis and compute DAU
//...
// Final storage in Elasticsearch
...
ssc.start()
ssc.awaitTermination()
}
}
package com.simwor.realtime.util
import redis.clients.jedis.{Jedis, JedisPool, JedisPoolConfig}
object RedisUtil {
var jedisPool:JedisPool=null
def getJedisClient: Jedis = {
if(jedisPool==null){
// println("开辟一个连接池")
val config = PropertiesUtil.load("config.properties")
val host = config.getProperty("redis.host")
val port = config.getProperty("redis.port")
val jedisPoolConfig = new JedisPoolConfig()
jedisPoolConfig.setMaxTotal(100) //maximum number of connections
jedisPoolConfig.setMaxIdle(20) //maximum idle connections
jedisPoolConfig.setMinIdle(20) //minimum idle connections
jedisPoolConfig.setBlockWhenExhausted(true) //block and wait when the pool is exhausted
jedisPoolConfig.setMaxWaitMillis(500) //maximum wait time in milliseconds when the pool is busy
jedisPoolConfig.setTestOnBorrow(true) //test each connection when it is borrowed
jedisPool=new JedisPool(jedisPoolConfig,host,port.toInt)
}
// println(s"jedisPool.getNumActive = ${jedisPool.getNumActive}")
// println("获得一个连接")
jedisPool.getResource
}
}
// Deduplicate with Redis and compute DAU
val filteredDStream: DStream[JSONObject] = jsonObjectDataStream.mapPartitions { jsonObjItr =>
val originalList = jsonObjItr.toList
val filteredList = new ListBuffer[JSONObject]()
val jedisClient = RedisUtil.getJedisClient
println("Before Filter : " + originalList.size)
for(jsonObj <- originalList) {
val dt = jsonObj.getString("dt")
val mid = jsonObj.getJSONObject("common").getString("mid")
val dauKey = "dau:" + dt
val exists = jedisClient.sadd(dauKey, mid)
jedisClient.expire(dauKey, 3600*24)
if (exists == 1L)
filteredList += jsonObj
}
println("After Filter : " + filteredList.size)
jedisClient.close()
filteredList.toIterator
}
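The filtered startup records are written to Elasticsearch. The index template below (executed e.g. in Kibana Dev Tools) fixes the mappings and query aliases for the daily gmall_dau_info indices: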
PUT _template/gmall_dau_info_template
{
"index_patterns": ["gmall_dau_info*"],
"settings": {
"number_of_shards": 3
},
"aliases" : {
"{index}-query": {},
"gmall_dau_info-query":{}
},
"mappings": {
"properties":{
"mid":{
"type":"keyword"
},
"uid":{
"type":"keyword"
},
"ar":{
"type":"keyword"
},
"ch":{
"type":"keyword"
},
"vc":{
"type":"keyword"
},
"dt":{
"type":"keyword"
},
"hr":{
"type":"keyword"
},
"mi":{
"type":"keyword"
},
"ts":{
"type":"date"
}
}
}
}
package com.simwor.realtime.bean
case class DauInfo(
mid:String,
uid:String,
ar:String,
ch:String,
vc:String,
var dt:String,
var hr:String,
var mi:String,
ts:Long)
package com.simwor.realtime.util
import io.searchbox.client.config.HttpClientConfig
import io.searchbox.client.{JestClient, JestClientFactory}
import io.searchbox.core.{Bulk, Index, Search}
import org.elasticsearch.index.query.{BoolQueryBuilder, MatchQueryBuilder}
import org.elasticsearch.search.builder.SearchSourceBuilder
object MyEsUtil {
def bulkDoc(sourceList: List[Any], indexName: String): Unit = {
val jestClient = getClient
val bulkBuilder = new Bulk.Builder
for(source <- sourceList) {
val index = new Index.Builder(source).index(indexName).`type`("_doc").build()
bulkBuilder.addAction(index)
}
jestClient.execute(bulkBuilder.build())
jestClient.close()
}
/* ElasticSearch Connection Factory */
def getClient:JestClient ={
if(factory==null) build();
factory.getObject
}
def build(): Unit ={
factory = new JestClientFactory
factory.setHttpClientConfig(new HttpClientConfig.Builder("http://simwor01:9200")
.multiThreaded(true)
.maxTotalConnection(20)
.connTimeout(10000).readTimeout(1000).build())
}
private var factory: JestClientFactory = null;
}
// Final storage in Elasticsearch
filteredDStream.foreachRDD { rdd =>
rdd.foreachPartition { jsonItr =>
val list = jsonItr.toList
val dt = new SimpleDateFormat("yyyy-MM-dd").format(new Date())
val dauList = list.map { startupJsonObj =>
val dtHr: String = new SimpleDateFormat("yyyy-MM-dd HH:mm").format(new Date(startupJsonObj.getLong("ts")))
val dtHrArr: Array[String] = dtHr.split(" ")
val dt = dtHrArr(0)
val timeArr = dtHrArr(1).split(":")
val hr = timeArr(0)
val mi = timeArr(1)
val commonJSONObj: JSONObject = startupJsonObj.getJSONObject("common")
DauInfo(commonJSONObj.getString("mid"),
commonJSONObj.getString("uid"),
commonJSONObj.getString("mid"),
commonJSONObj.getString("ch"),
commonJSONObj.getString("vc"),
dt, hr, mi,
startupJsonObj.getLong("ts"))
}
MyEsUtil.bulkDoc(dauList, "gmall_dau_info_" + dt)
}
}
Kafka supports transactional writes but not transactional consumption, and Elasticsearch supports idempotent writes but not transactions. Saving the Kafka offsets manually and writing to Elasticsearch idempotently together give exactly-once consumption.
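Condensed, the per-batch flow implemented in the classes below is (an outline of the code that follows, not a standalone program):

// 1. read the offsets last committed for this topic/group from Redis
val kafkaOffsetMap = OffsetManager.getOffset(topicName, groupId)
// 2. start the Kafka stream from those offsets (or from scratch on the first run)
// 3. capture each batch's OffsetRange values via rdd.asInstanceOf[HasOffsetRanges].offsetRanges
// 4. process the batch and bulk-write it to ES, using mid as the document id (idempotent)
// 5. only after the write succeeds, save the new offsets back to Redis
OffsetManager.saveOffset(topicName, groupId, offsetRanges)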
OffsetManager
package com.simwor.realtime.util
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.OffsetRange
import java.util
object OffsetManager {
// Read the saved offsets
def getOffset(topicName: String, groupId: String): Map[TopicPartition, Long] = {
// Redis
// type -> hash
// key -> offset:[topic]:[groupid]
// field -> partition_id
// value -> offset
val jedisClient = RedisUtil.getJedisClient
val offsetMap: util.Map[String, String] = jedisClient.hgetAll("offset:" + topicName + ":" + groupId)
import scala.collection.JavaConversions._
val kafkaOffsetMapMap: Map[TopicPartition, Long] = offsetMap.map { case (partitionId, offset) =>
(new TopicPartition(topicName, partitionId.toInt), offset.toLong)
}.toMap
jedisClient.close()
kafkaOffsetMapMap
}
// Save the offsets
def saveOffset(topicName: String, groupId: String, offsetRanges: Array[OffsetRange]): Unit = {
val jedisClient = RedisUtil.getJedisClient
val offsetMap: util.Map[String, String] = new util.HashMap()
for(offset <- offsetRanges) {
val partition: Int = offset.partition
val untilOffset: Long = offset.untilOffset
offsetMap.put(partition.toString, untilOffset.toString)
println("partition := " + partition + " -- " + offset.fromOffset + " --> " + untilOffset)
}
if(offsetMap != null && offsetMap.size() > 0)
jedisClient.hmset("offset:" + topicName + ":" + groupId, offsetMap)
jedisClient.close()
}
}
DauApp
package com.simwor.realtime.app
import com.alibaba.fastjson.{JSON, JSONObject}
import com.simwor.realtime.bean.DauInfo
import com.simwor.realtime.util.{MyEsUtil, MyKafkaUtil, OffsetManager, RedisUtil}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import java.text.SimpleDateFormat
import java.util.Date
import scala.collection.mutable.ListBuffer
object DauApp {
def main(args: Array[String]): Unit = {
val sparkConf = new SparkConf().setAppName("dau_app").setMaster("local[4]")
val ssc = new StreamingContext(sparkConf, Seconds(5))
// ***************** Read the saved Kafka offsets
val topicName = "gmall-start-log"
val groupId = "gmall-start-group"
val kafkaOffsetMap = OffsetManager.getOffset(topicName, groupId)
var recordInputStream: InputDStream[ConsumerRecord[String, String]] = null
if(kafkaOffsetMap != null && kafkaOffsetMap.size > 0)
recordInputStream = MyKafkaUtil.getKafkaStream("gmall-start-log", ssc, kafkaOffsetMap, groupId)
else
recordInputStream = MyKafkaUtil.getKafkaStream("gmall-start-log", ssc)
// ***************** Capture each batch's offset ranges
var offsetRanges: Array[OffsetRange] = Array.empty[OffsetRange]
val startupInputGetOffsetDstream: DStream[ConsumerRecord[String, String]] = recordInputStream.transform { rdd =>
offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
rdd
}
...
// Final storage in Elasticsearch
filteredDStream.foreachRDD { rdd =>
...
// ***************** Commit the Kafka offsets
OffsetManager.saveOffset(topicName, groupId, offsetRanges)
}
ssc.start()
ssc.awaitTermination()
}
}
MyEsUtil
def bulkDoc(sourceList: List[(String, DauInfo)], indexName: String): Unit = {
val jestClient = getClient
val bulkBuilder = new Bulk.Builder
for((id, source) <- sourceList) {
// ************ Specify the document id so a repeated record updates the existing document instead of creating a new one
val index = new Index.Builder(source).index(indexName).`type`("_doc").id(id).build()
bulkBuilder.addAction(index)
}
jestClient.execute(bulkBuilder.build())
jestClient.close()
}
DauApp
// Final storage in Elasticsearch
filteredDStream.foreachRDD { rdd =>
rdd.foreachPartition { jsonItr =>
val list = jsonItr.toList
val dt = new SimpleDateFormat("yyyy-MM-dd").format(new Date())
val dauList: List[(String, DauInfo)] = list.map { startupJsonObj =>
val dtHr: String = new SimpleDateFormat("yyyy-MM-dd HH:mm").format(new Date(startupJsonObj.getLong("ts")))
val dtHrArr: Array[String] = dtHr.split(" ")
val dt = dtHrArr(0)
val timeArr = dtHrArr(1).split(":")
val hr = timeArr(0)
val mi = timeArr(1)
val commonJSONObj: JSONObject = startupJsonObj.getJSONObject("common")
val dauInfo = DauInfo(commonJSONObj.getString("mid"),
commonJSONObj.getString("uid"),
commonJSONObj.getString("mid"),
commonJSONObj.getString("ch"),
commonJSONObj.getString("vc"),
dt, hr, mi,
startupJsonObj.getLong("ts"))
// **************** The return value must pair the document id (mid is used here) with the document
(dauInfo.mid, dauInfo)
}
MyEsUtil.bulkDoc(dauList, "gmall_dau_info_" + dt)
}
}
Build the DAU dashboard in Kibana:

1. Stack Management -> Index Patterns -> Create Index Pattern
2. Visualize -> Create new visualization -> New Vertical Bar -> Choose a source -> gmall_dau_info_2021*
3. Refresh / Update -> Save
4. Dashboard -> Create new dashboard -> Add
5. Share -> Embed Code -> Saved Object

The embed code from the last step can be dropped into a plain HTML page:
<html>
<head>
    <meta charset="utf-8">
    <title>Simwor</title>
</head>
<body>
    <h1>Daily Active Users</h1>
    <iframe src="http://simwor01:5601/app/kibana#/dashboard/39adc0a0-d4f0-11eb-8ddb-af39ee8ef270?embed=true&_g=(filters%3A!()%2CrefreshInterval%3A(pause%3A!t%2Cvalue%3A0)%2Ctime%3A(from%3Anow%2Fw%2Cto%3Anow%2Fw))" height="600" width="800"></iframe>
</body>
</html>
The publisher module exposes the realtime metrics over two REST endpoints:

| Endpoint | Path | Sample response |
|---|---|---|
| Total | http://publisher:8070/realtime-total?date=2019-02-01 | [{"id":"dau","name":"新增日活","value":1200},{"id":"new_mid","name":"新增设备","value":233}] |
| Hourly breakdown | http://publisher:8070/realtime-hour?id=dau&date=2019-02-01 | {"yesterday":{"11":383,"12":123,"17":88,"19":200}, "today":{"12":38,"13":1233,"17":123,"19":688}} |

(The name values "新增日活" and "新增设备" are the display labels returned by the service, "daily active users" and "new devices".)
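Once the publisher service is running (port 8070, as configured below), the endpoints can be exercised with curl; the host name and date here are illustrative:

curl "http://localhost:8070/realtime-total?date=2021-06-22"
curl "http://localhost:8070/realtime-hour?id=dau&date=2021-06-22"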
In the publisher module's POM, set the Spring Boot version to 2.1.15.RELEASE and add a few extra utility dependencies.
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.10</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>29.0-jre</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.68</version>
</dependency>
<dependency>
    <groupId>io.searchbox</groupId>
    <artifactId>jest</artifactId>
    <version>5.3.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna</artifactId>
    <version>4.5.2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.janino</groupId>
    <artifactId>commons-compiler</artifactId>
    <version>2.7.8</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.4.6</version>
</dependency>
application.properties
spring.elasticsearch.jest.uris=http://simwor01:9200,http://simwor02:9200,http://simwor03:9200
server.port=8070
package com.simwor.publisher.service;
import java.util.Map;
public interface EsService {
public Long getDauTotal(String date);
public Map getDauHour(String date);
}
package com.simwor.publisher.service.impl;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.simwor.publisher.service.EsService;
import io.searchbox.client.JestClient;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@Service
public class EsServiceImpl implements EsService {
@Autowired
JestClient jestClient;
@Override
public Long getDauTotal(String date) {
Long totalResult = 0L;
String indexName = "gmall_dau_info_" + date + "-query";
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(new MatchAllQueryBuilder());
Search search = new Search.Builder(searchSourceBuilder.toString())
.addIndex(indexName)
.addType("_doc")
.build();
try {
SearchResult searchResult = jestClient.execute(search);
JsonObject jsonObject = searchResult.getJsonObject();
JsonElement jsonElement = jsonObject.get("hits").getAsJsonObject().get("total").getAsJsonObject().get("value");
totalResult = jsonElement.getAsLong();
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("Elasticsearch query failed");
}
return totalResult;
}
@Override
public Map getDauHour(String date) {
Map<String, Long> results = new HashMap<>();
String indexName = "gmall_dau_info_" + date + "-query";
// Build the query
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
TermsBuilder termsBuilder = AggregationBuilders.terms("groupby_hr").field("hr").size(24);
searchSourceBuilder.aggregation(termsBuilder);
Search search = new Search.Builder(searchSourceBuilder.toString())
.addIndex(indexName)
.addType("_doc")
.build();
try {
// Execute the query and package the result
SearchResult searchResult = jestClient.execute(search);
List<TermsAggregation.Entry> buckets = searchResult.getAggregations().getTermsAggregation("groupby_hr").getBuckets();
for(TermsAggregation.Entry bucket : buckets)
results.put(bucket.getKey(), bucket.getCount());
} catch (IOException e) {
e.printStackTrace();
}
return results;
}
}
package com.simwor.publisher.controller;
import com.alibaba.fastjson.JSON;
import com.simwor.publisher.service.EsService;
import org.apache.commons.lang3.time.DateUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.*;
@RestController
public class PublisherController {
@Autowired
private EsService esService;
@GetMapping("realtime-total")
public String realtimeTotal(@RequestParam("date") String dt) {
List<Map<String, Object>> resultList = new ArrayList<>();
Map<String, Object> dauMap = new HashMap<>();
dauMap.put("id", "dau");
dauMap.put("name", "新增日活");
dauMap.put("value", esService.getDauTotal(dt));
resultList.add(dauMap);
Map<String, Object> midMap = new HashMap<>();
midMap.put("id", "new_mid");
midMap.put("name", "新增设备");
midMap.put("value", 233);
resultList.add(midMap);
return JSON.toJSONString(resultList);
}
@GetMapping("realtime-hour")
public String realTimeHour(@RequestParam("id") String id,
@RequestParam("date") String dt) {
Map<String, Map<String, Long>> resultMap = new HashMap<>();
Map dauHourToday = esService.getDauHour(dt);
Map dauHourYesterday = esService.getDauHour(getYesterday(dt));
resultMap.put("today", dauHourToday);
resultMap.put("yesterday", dauHourYesterday);
return JSON.toJSONString(resultMap);
}
private String getYesterday(String today) {
SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
String yesterday = "";
try {
Date todayDate = simpleDateFormat.parse(today);
Date yesterdayDate = DateUtils.addDays(todayDate, -1);
yesterday = simpleDateFormat.format(yesterdayDate);
} catch (ParseException e) {
e.printStackTrace();
}
return yesterday;
}
}
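The hourly aggregation used by getDauHour can also be checked directly in Kibana Dev Tools; below are the query and the aggregation section of a sample response: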
GET gmall_dau_info_2021-06-22-query/_search
{
"aggs": {
"groupby_hr": {
"terms": {
"field": "hr",
"size": 24
}
}
}
}
"aggregations" : {
"groupby_hr" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "21",
"doc_count" : 50
}
]
}
}
This chapter introduces two tools for monitoring MySQL data changes in real time: Canal and Maxwell. Canal captures changes in real time by imitating the behavior of a MySQL replication slave.
mysql> create database gmall_db;
mysql> use gmall_db;
mysql> source /opt/appdb/gmall_db.sql
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%' IDENTIFIED BY 'ABcd12#$..';
mysql>
[omm@simwor01 ~]$ sudo vi /etc/my.cnf
[omm@simwor01 ~]$ tail -4 /etc/my.cnf
server-id= 1
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall_db
[omm@simwor01 ~]$ sudo systemctl restart mysqld
[omm@simwor01 mysql]$ pwd
/var/lib/mysql
[omm@simwor01 mysql]$ ll mysql-bin*
-rwxr-xr-x. 1 mysql mysql 154 Jun 29 11:11 mysql-bin.000001
-rwxr-xr-x. 1 mysql mysql 19 Jun 29 11:11 mysql-bin.index
[omm@simwor01 mysql]$
[omm@simwor01 appdb]$ java -jar gmall2020-mock-db-2020-05-18.jar
-------- Generating data --------
-------- Generating user data --------
10 users changed
0 users created
-------- Generating favorites data --------
100 favorites created
-------- Generating cart data --------
274 cart records created
-------- Generating order data --------
200 coupons
14 orders created
9 orders joined promotions
-------- Generating payment data --------
14 orders had their status updated
8 orders paid
-------- Generating refund data --------
8 orders had their status updated
2 refunds created
-------- Generating review data --------
8 reviews created
[omm@simwor01 appdb]$
[omm@simwor01 mysql]$ ll mysql-bin*
-rwxr-xr-x. 1 mysql mysql 220806 Jun 29 11:16 mysql-bin.000001
-rwxr-xr-x. 1 mysql mysql 19 Jun 29 11:11 mysql-bin.index
[omm@simwor01 mysql]$
A single Canal server can monitor multiple MySQL instances.
[omm@simwor01 soft]$ pwd
/opt/soft
[omm@simwor01 soft]$ mkdir /opt/module/canal
[omm@simwor01 soft]$ tar -zxf canal.deployer-1.1.4.tar.gz -C /opt/module/canal
[omm@simwor01 soft]$ ll /opt/module/canal
total 4
drwxrwxr-x. 2 omm omm 76 Jun 29 11:22 bin
drwxrwxr-x. 5 omm omm 123 Jun 29 11:22 conf
drwxrwxr-x. 2 omm omm 4096 Jun 29 11:22 lib
drwxrwxr-x. 2 omm omm 6 Sep 2 2019 logs
[omm@simwor01 soft]$
[omm@simwor01 conf]$ vi canal.properties
[omm@simwor01 conf]$ grep canal.mq.servers canal.properties
canal.mq.servers = simwor01:9092,simwor02:9092,simwor03:9092
[omm@simwor01 conf]$ grep serverMode canal.properties
canal.serverMode = kafka
[omm@simwor01 conf]$
[omm@simwor01 example]$ pwd
/opt/module/canal/conf/example
[omm@simwor01 example]$ vi instance.properties
[omm@simwor01 example]$ grep canal.instance.master.address instance.properties
canal.instance.master.address=simwor01:3306
[omm@simwor01 example]$ grep canal.instance.db instance.properties
canal.instance.dbUsername=canal
canal.instance.dbPassword=ABcd12#$..
[omm@simwor01 example]$ grep canal.mq.topic instance.properties
canal.mq.topic=GMALL_DB_CANAL
[omm@simwor01 example]$
# Start Canal
[omm@simwor01 canal]$ bin/startup.sh
# Generate data
[omm@simwor01 appdb]$ pwd
/opt/appdb
[omm@simwor01 appdb]$ java -jar gmall2020-mock-db-2020-05-18.jar
# Watch the Kafka topic
[omm@simwor01 bin]$ ./kafka-console-consumer.sh --bootstrap-server simwor01:9092 --topic GMALL_DB_CANAL --from-beginning
...
^CProcessed a total of 1582 messages
[omm@simwor01 bin]$
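The messages in GMALL_DB_CANAL are JSON documents of roughly the following shape (a trimmed illustration, not verbatim Canal output); the table, type, and data fields are what the routing job below relies on:

{"data":[{"id":"1","login_name":"...","nick_name":"..."}],"database":"gmall_db","table":"user_info","type":"UPDATE","ts":1624937000000}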
Canal now captures data changes in real time; the next requirement is to route each table's change records into its own Kafka topic. With the code below, a change to the user_info table, for example, is pushed to the ODS_USER_INFO topic.

BaseDbCanal routing code:
package com.simwor.realtime.ods
import com.alibaba.fastjson.JSON
import com.simwor.realtime.util.{MyKafkaSink, MyKafkaUtil, OffsetManager}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}
import org.apache.spark.streaming.{Seconds, StreamingContext}
object BaseDbCanal {
def main(args: Array[String]): Unit = {
val sparkConf = new SparkConf().setAppName("base_db_canal_app").setMaster("local[4]")
val ssc = new StreamingContext(sparkConf, Seconds(5))
// ***************** Read the saved Kafka offsets
val topicName = "GMALL_DB_CANAL"
val groupId = "gmall-canal-group"
val kafkaOffsetMap = OffsetManager.getOffset(topicName, groupId)
var recordInputStream: InputDStream[ConsumerRecord[String, String]] = null
if(kafkaOffsetMap != null && kafkaOffsetMap.size > 0)
recordInputStream = MyKafkaUtil.getKafkaStream(topicName, ssc, kafkaOffsetMap, groupId)
else
recordInputStream = MyKafkaUtil.getKafkaStream(topicName, ssc)
// ***************** Capture each batch's offset ranges
var offsetRanges: Array[OffsetRange] = Array.empty[OffsetRange]
val startupInputGetOffsetDstream: DStream[ConsumerRecord[String, String]] = recordInputStream.transform { rdd =>
offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
rdd
}
// ***************** Convert the Kafka records into JSON objects
val jsonObjDstream = startupInputGetOffsetDstream.map { record =>
val jsonString = record.value()
val jsonObj = JSON.parseObject(jsonString)
jsonObj
}
// ***************** Parse the records and route them back into Kafka by table
jsonObjDstream.foreachRDD { rdd =>
// push back to Kafka
rdd.foreach { jsonObj =>
// derive the topic name from the table name
val tableName = jsonObj.getString("table")
val topic = "ODS_" + tableName.toUpperCase()
// route every row of this change into the table's topic
val jsonArr = jsonObj.getJSONArray("data")
import scala.collection.JavaConversions._
for( item <- jsonArr)
MyKafkaSink.send(topic, item.toString)
}
// ***************** Commit the Kafka offsets after the batch has been pushed
OffsetManager.saveOffset(topicName, groupId, offsetRanges)
}
ssc.start()
ssc.awaitTermination()
}
}
MyKafkaSink utility class:
package com.simwor.realtime.util
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
object MyKafkaSink {
private val properties: Properties = PropertiesUtil.load("config.properties")
val broker_list = properties.getProperty("kafka.broker.list")
var kafkaProducer: KafkaProducer[String, String] = null
def createKafkaProducer: KafkaProducer[String, String] = {
val properties = new Properties
properties.put("bootstrap.servers", broker_list)
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
properties.put("enable.idompotence",(true: java.lang.Boolean))
var producer: KafkaProducer[String, String] = null
try
producer = new KafkaProducer[String, String](properties)
catch {
case e: Exception =>
e.printStackTrace()
}
producer
}
def send(topic: String, msg: String): Unit = {
if (kafkaProducer == null) kafkaProducer = createKafkaProducer
kafkaProducer.send(new ProducerRecord[String, String](topic, msg))
}
def send(topic: String,key:String, msg: String): Unit = {
if (kafkaProducer == null) kafkaProducer = createKafkaProducer
kafkaProducer.send(new ProducerRecord[String, String](topic,key, msg))
}
}
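The second capture tool, Maxwell, also writes change records straight to Kafka; install and configure it as follows.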
[omm@simwor01 soft]$ tar -zxf maxwell-1.25.0.tar.gz -C /opt/module/
[omm@simwor01 soft]$ ln -s /opt/module/maxwell-1.25.0/ /opt/module/maxwell
[omm@simwor01 soft]$ ll -d /opt/module/max*
lrwxrwxrwx. 1 omm omm 27 Jun 30 10:23 /opt/module/maxwell -> /opt/module/maxwell-1.25.0/
drwxrwxr-x. 4 omm omm 200 Jun 30 10:23 /opt/module/maxwell-1.25.0
[omm@simwor01 soft]$
mysql> CREATE DATABASE maxwell;
mysql> GRANT ALL ON maxwell.* TO 'maxwell'@'%' IDENTIFIED BY 'Abcd12#$..';
mysql> GRANT SELECT ,REPLICATION SLAVE , REPLICATION CLIENT ON *.* TO maxwell@'%';
[omm@simwor01 maxwell]$ cp config.properties.example config.properties
[omm@simwor01 maxwell]$ vi config.properties
[omm@simwor01 maxwell]$ head -15 config.properties
log_level=info
producer=kafka
kafka.bootstrap.servers=simwor01:9092,simwor02:9092,simwor03:9092
kafka_topic=GMALL_DB_MAXWELL
# database | table | primary_key | random | column
producer_partition_by=primary_key
# mysql login info
host=simwor01
user=maxwell
password=Abcd12#$..
client_id=maxwell_1
[omm@simwor01 maxwell]$
Start Maxwell -> generate mock data -> verify by consuming from Kafka.
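A sketch of that loop, mirroring the Canal verification above (the Maxwell launch command assumes its standard bin/maxwell launcher with the config file prepared earlier):

# start Maxwell
[omm@simwor01 maxwell]$ bin/maxwell --config config.properties
# generate mock business data
[omm@simwor01 appdb]$ java -jar gmall2020-mock-db-2020-05-18.jar
# watch the Kafka topic
[omm@simwor01 bin]$ ./kafka-console-consumer.sh --bootstrap-server simwor01:9092 --topic GMALL_DB_MAXWELL --from-beginning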
BaseDbMaxwell routing code (only the differences from BaseDbCanal are shown):
package com.simwor.realtime.ods
...
object BaseDbMaxwell {
...
// ***************** Read the saved Kafka offsets
val topicName = "GMALL_DB_MAXWELL"
val groupId = "gmall-maxwell-group"
...
// ***************** Parse the records and route them back into Kafka by table
jsonObjDstream.foreachRDD { rdd =>
// push back to Kafka
rdd.foreach { jsonObj =>
// derive the topic name from the table name
val tableName = jsonObj.getString("table")
val topic = "ODS_" + tableName.toUpperCase()
// route the change into the table's topic (Maxwell's "data" field is a single JSON object, not an array)
val jsonString = jsonObj.getString("data")
MyKafkaSink.send(topic, jsonString)
}
}
...
}
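For comparison, a Maxwell change record looks roughly like this (a trimmed illustration, not verbatim output); its data field is a single JSON object rather than Canal's array, which is why BaseDbMaxwell uses getString("data") instead of getJSONArray("data"):

{"database":"gmall_db","table":"user_info","type":"insert","ts":1624937000,"data":{"id":1,"login_name":"..."}}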