Previously, binlog was enabled in MySQL and Canal listened to it in real time, pushing the change events into the Kafka topic example. Now we use Spark Streaming to consume those messages in real time and print them out.
pom dependencies:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>1.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.3.0</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.3</version>
</dependency>
The Spark Streaming code is as follows:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("Spark-Kafkatest1")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Kafka consumer parameters; auto commit is disabled so offsets can be managed by hand
    val kafkaParam = Map(
      "bootstrap.servers" -> "192.168.240.131:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "con-consumer-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to the "example" topic that Canal writes to
    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        ssc,
        LocationStrategies.PreferConsistent,
        ConsumerStrategies.Subscribe[String, String](Array("example"), kafkaParam))

    // Print the key and value of every record
    stream.map(s => ("id:" + s.key(), "value:" + s.value())).foreachRDD(
      rdd => rdd.foreachPartition(
        partition => partition.foreach { message => println(message._1, message._2) }
      )
    )

    ssc.start()
    ssc.awaitTermination()
  }
}
To watch the table changes in MySQL in real time, run some operations in MySQL:
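The exact statements are not reproduced here, but judging from the console output below they were along these lines (an assumed reconstruction against the fth.user1 table with columns id, name, adress):

insert into user1 (id, name, adress) values (6, 'liutao', 'hangzhou');
delete from user1 where id = 6;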
We can then see messages like the following in the console in real time:
(id:null,value:{"data":[{"id":"6","name":"liutao","adress":"hangzhou"}],"database":"fth","es":1556004862000,"id":23,"isDdl":false,
"mysqlType":{"id":"int","name":"varchar(100)","adress":"varchar(100)"},"old":null,"pkNames":null,"sql":"",
"sqlType":{"id":4,"name":12,"adress":12},"table":"user1","ts":1556004862943,"type":"INSERT"})
(id:null,value:{"data":[{"id":"6","name":"liutao","adress":"hangzhou"},{"id":"6","name":"liutao","adress":"hangzhou"}],
"database":"fth","es":1556005117000,"id":25,"isDdl":false,"mysqlType":{"id":"int","name":"varchar(100)","adress":"varchar(100)"},
"old":null,"pkNames":null,"sql":"","sqlType":{"id":4,"name":12,"adress":12},"table":"user1","ts":1556005117173,"type":"DELETE"})
In a real project all of these parameters would be kept in a configuration file; this is only a test. In Spark Streaming you also have to maintain the Kafka offsets yourself; at my previous company we stored the offsets in MySQL.
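A minimal sketch of that pattern, assuming a MySQL table kafka_offsets(topic, partition, offset) and plain JDBC (the table name, URL and credentials are placeholders of mine, not from the project above): read the batch's offset ranges from the RDD, process the data, then persist the end offsets.

import java.sql.DriverManager
import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

stream.foreachRDD { rdd =>
  // Offset ranges covered by this batch
  val offsetRanges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // ... process the batch here ...

  // Persist the end offset of every topic-partition once processing has succeeded
  val conn = DriverManager.getConnection(
    "jdbc:mysql://192.168.240.131:3306/fth", "user", "password") // placeholder credentials
  try {
    val stmt = conn.prepareStatement(
      "replace into kafka_offsets (topic, `partition`, offset) values (?, ?, ?)")
    offsetRanges.foreach { range =>
      stmt.setString(1, range.topic)
      stmt.setInt(2, range.partition)
      stmt.setLong(3, range.untilOffset)
      stmt.executeUpdate()
    }
  } finally {
    conn.close()
  }
}

On restart, the saved offsets can be read back from MySQL and passed to ConsumerStrategies.Subscribe as its offsets argument so consumption resumes exactly where it left off.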