[Flink Real-Time Data Warehouse] Data Warehouse Project in Practice, Part 2: Collecting Data into Kafka (ODS)

Table of Contents

  • [Flink Real-Time Data Warehouse] Data Warehouse Project in Practice, Part 2: Collecting Data into Kafka (ODS)
    • Mock data collection modules:
      • 1. User behavior data collection module
      • 2. Business data collection module

[Flink Real-Time Data Warehouse] Data Warehouse Project in Practice, Part 2: Collecting Data into Kafka (ODS)

Mock data collection modules:

1. User behavior data collection module

User behavior logs
User behavior logs generally have no historical backlog, so only the current day's data needs to be prepared. The steps are as follows:
(1) Start ZooKeeper, Kafka, and the log-collection Flume agents.

[root@hadoop102 clickhouse-server]# zk.sh start
---------- starting zookeeper on hadoop102 ------------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------- starting zookeeper on hadoop103 ------------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
---------- starting zookeeper on hadoop104 ------------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@hadoop102 clickhouse-server]# kf.sh start
 -------- starting Kafka on hadoop102 -------
 -------- starting Kafka on hadoop103 -------
 -------- starting Kafka on hadoop104 -------
[root@hadoop102 clickhouse-server]# f1.sh start
-------- starting collection flume on hadoop102 -------
-------- starting collection flume on hadoop103 -------
[root@hadoop102 clickhouse-server]# jpsall
=============== hadoop102 ===============
2349 QuorumPeerMain
2973 Application
3389 Jps
2767 Kafka
=============== hadoop103 ===============
2451 Kafka
3076 Jps
2026 QuorumPeerMain
2668 Application
=============== hadoop104 ===============
2006 QuorumPeerMain
2424 Kafka
2664 Jps

(2) Start a console Kafka consumer that consumes the topic_log topic.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic topic_log

(3) On both log servers (hadoop102 and hadoop103), edit the
/opt/module/applog/application.yml config file and set the mock.date parameter to 2022-02-21.
(4) Run the log-generation script lg.sh.
(5) Check whether the console Kafka consumer receives data.

{"common":{"ar":"420000","ba":"Redmi","ch":"oppo","is_new":"0","md":"Redmi k30","mid":"mid_3080364","os":"Android 11.0","uid":"814","vc":"v2.1.134"},"page":{"during_time":5478,"last_page_id":"home","page_id":"search"},"ts":1669194666000}
{"common":{"ar":"420000","ba":"Redmi","ch":"oppo","is_new":"0","md":"Redmi k30","mid":"mid_3080364","os":"Android 11.0","uid":"814","vc":"v2.1.134"},"displays":[{"display_type":"query","item":"25","item_type":"sku_id","order":1,"pos_id":2},{"display_type":"query","item":"26","item_type":"sku_id","order":2,"pos_id":1},{"display_type":"query","item":"19","item_type":"sku_id","order":3,"pos_id":3},{"display_type":"query","item":"22","item_type":"sku_id","order":4,"pos_id":3},{"display_type":"promotion","item":"11","item_type":"sku_id","order":5,"pos_id":2}],"page":{"during_time":17094,"item":"苹果手机","item_type":"keyword","last_page_id":"search","page_id":"good_list"},"ts":1669194667000}
{"common":{"ar":"420000","ba":"Redmi","ch":"oppo","is_new":"0","md":"Redmi k30","mid":"mid_3080364","os":"Android 11.0","uid":"814","vc":"v2.1.134"},"displays":[{"display_type":"query","item":"24","item_type":"sku_id","order":1,"pos_id":2},{"display_type":"promotion","item":"26","item_type":"sku_id","order":2,"pos_id":3},{"display_type":"promotion","item":"7","item_type":"sku_id","order":3,"pos_id":1},{"display_type":"query","item":"26","item_type":"sku_id","order":4,"pos_id":2},{"display_type":"query","item":"5","item_type":"sku_id","order":5,"pos_id":2},{"display_type":"query","item":"14","item_type":"sku_id","order":6,"pos_id":1},{"display_type":"promotion","item":"32","item_type":"sku_id","order":7,"pos_id":1}],"page":{"during_time":8237,"item":"13","item_type":"sku_id","last_page_id":"good_list","page_id":"good_detail","source_type":"activity"},"ts":1669194668000}

2. Business data collection module

(1) To sync MySQL data with Maxwell, the MySQL binlog must be enabled.

sudo vim /etc/my.cnf
## Databases for which binlog is enabled; adjust to your environment.
## Maxwell requires ROW-format binlog (log-bin and binlog_format shown for completeness).
## **Restart MySQL after changing this file.**
log-bin=mysql-bin
binlog_format=row
binlog-do-db=gmall
binlog-do-db=gmall-config
A full (bootstrap) sync is triggered per table. In the project this is wrapped in a small script that takes the table name as its first argument ($1):

#!/bin/bash
# $1 = name of the table to fully bootstrap into Kafka
$MAXWELL_HOME/bin/maxwell-bootstrap --database gmall --table $1 --config $MAXWELL_HOME/config.properties
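The --config flag above points at Maxwell's config.properties. For reference, here is a minimal sketch of what that file might contain in this setup; the hostnames and credentials are placeholders, not taken from the project:

```properties
# Illustrative Maxwell config.properties -- values are assumptions, adjust to your cluster
log_level=info
producer=kafka
kafka.bootstrap.servers=hadoop102:9092,hadoop103:9092,hadoop104:9092
kafka_topic=topic_db

# MySQL connection, used both for ongoing replication and by maxwell-bootstrap
host=hadoop102
user=maxwell
password=maxwell
jdbc_options=useSSL=false&serverTimezone=Asia/Shanghai
```

With kafka_topic=topic_db, every change record lands on the single topic_db topic consumed in the next step.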

(2) Start Maxwell.

[root@hadoop102 clickhouse-server]# mxw.sh start
Starting Maxwell
Redirecting STDOUT to /opt/module/maxwell/bin/../logs/MaxwellDaemon.out
Using kafka version: 1.0.0
[root@hadoop102 clickhouse-server]# jpsall
=============== hadoop102 ===============
3537 Maxwell
3601 Jps
2349 QuorumPeerMain
2973 Application
2767 Kafka
=============== hadoop103 ===============
2451 Kafka
3971 Jps
2026 QuorumPeerMain
2668 Application
3566 ConsoleConsumer
=============== hadoop104 ===============
2006 QuorumPeerMain
2424 Kafka
2814 Jps

(3) Start a console Kafka consumer that consumes the topic_db topic.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic topic_db 

(4) Insert, update, and delete some rows in MySQL and watch the Kafka consumer output in detail.

[root@hadoop103 kafka]# bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic topic_db

#insert

{"database":"gmall","table":"base_trademark","type":"insert","ts":1669022846,"xid":513,"commit":true,"data":{"id":12,"tm_name":"金克拉","logo_url":null}}

#update

{"database":"gmall","table":"base_trademark","type":"update","ts":1669022852,"xid":533,"commit":true,"data":{"id":12,"tm_name":"金克拉","logo_url":"/static/default.jpg"},"old":{"logo_url":null}}

#delete

^[a{"database":"gmall","table":"base_trademark","type":"delete","ts":1669023098,"xid":1081,"commit":true,"data":{"id":12,"tm_name":"金克拉","logo_url":"/static/default.jpg"}}
