Video: 尚硅谷 (Atguigu) big data project "Online Education: Data Collection System" on bilibili
Contents
P004
P006
P007
P009
P010
P017
P025
P026
P027
P028
P030
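Present the data as graphs and charts!
Event tracking (数据埋点)
Event tracking means collecting information at specific points in an application's flow in order to follow how the application is used. The data is later used to optimize the product further or to support operations. Typical metrics include visits (Visits), visitors (Visitor), time on site (Time On Site), page views (Page Views), and bounce rate (Bounce Rate). This kind of collection falls roughly into two categories: page statistics (track this virtual page view) and action statistics (track this button by an event).
Reference: "What is event tracking, and why set up tracking points?" (知乎 / Zhihu)
Tracking point: code embedded in the program that captures user behavior and sends the captured events to the server, where they are written to a user-behavior database (MySQL, ...) for storage.
Tool: online URL encoder/decoder (BeJSON.com)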
- ODS
- DIM
- DWD
- DWS
- ADS
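User profiling: tag users with labels in order to discover potential customers.
Framework distributions: Apache (open source), CDH (commercial), and HDP (commercial) each ship a complete big data development stack.
Flume: collects log files, reading the user-behavior data.
DataX: reads data from MySQL.
Kafka: message middleware, used for decoupling.
MySQL: stores business data.
HDFS: stores the data warehouse data.
HBase, Redis: fast response; store results that must be served in real time.
MongoDB: stores JSON-format data; fast response.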
[atguigu@node001 ~]$ cd /opt/module/data_mocker/01-onlineEducation
[atguigu@node001 01-onlineEducation]$ java -jar edu2021-mock-2022-06-18.jar
SLF4J: Class path contains multiple SLF4J bindings.
{"common":{"ar":"14","ba":"Xiaomi","ch":"oppo","is_new":"1","md":"Xiaomi 10 Pro ","mid":"mid_233","os":"Android 9.0","sc":"1","sid":"50d36ad5-fc85-45c6-9267-e661ce120226","uid":"505","vc":"v2.1.134"},"displays":[{"display_type":"query","item":"9","item_type":"course_id","order":1,"pos_id":5},{"display_type":"query","item":"8","item_type":"course_id","order":2,"pos_id":5},{"display_type":"query","item":"9","item_type":"course_id","order":3,"pos_id":5},{"display_type":"promotion","item":"4","item_type":"course_id","order":4,"pos_id":5},{"display_type":"recommend","item":"7","item_type":"course_id","order":5,"pos_id":1},{"display_type":"query","item":"3","item_type":"course_id","order":6,"pos_id":3},{"display_type":"query","item":"6","item_type":"course_id","order":7,"pos_id":2},{"display_type":"promotion","item":"2","item_type":"course_id","order":8,"pos_id":1},{"display_type":"promotion","item":"9","item_type":"course_id","order":9,"pos_id":2}],"page":{"during_time":7934,"item":"17614","item_type":"order_id","last_page_id":"course_detail","page_id":"order"},"ts":1645499594528}
---演算中...---
---演算完成 ---
[atguigu@node001 01-onlineEducation]$
#!/bin/bash
# mock.sh: start the log generator on node001 in the background and discard its console output
ssh node001 "cd /opt/module/data_mocker/01-onlineEducation/ && java -jar edu2021-mock-2022-06-18.jar >/dev/null 2>&1 &"
[atguigu@node001 bin]$ cd ~
[atguigu@node001 ~]$ ll
总用量 36
drwxrwxr-x. 2 atguigu atguigu 4096 5月 15 21:01 bin
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 公共
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 模板
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 视频
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 图片
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 文档
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 下载
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 音乐
drwxr-xr-x. 2 atguigu atguigu 4096 5月 10 11:00 桌面
[atguigu@node001 ~]$ cd bin
[atguigu@node001 bin]$ ll
总用量 24
-rwxrwxrwx. 1 atguigu atguigu 136 5月 15 21:07 jpsall
-rwxrwxrwx 1 atguigu atguigu 668 5月 15 21:17 kf.sh
-rwxrwxrwx. 1 atguigu atguigu 1150 5月 15 21:19 myhadoop.sh
-rwxrwxrwx 1 atguigu atguigu 141 5月 15 21:01 xcall
-rwxrwxr-x. 1 atguigu atguigu 733 5月 9 20:43 xsync
-rwxrwxrwx. 1 atguigu atguigu 574 5月 15 21:06 zk.sh
[atguigu@node001 bin]$ vim mock.sh
[atguigu@node001 bin]$ chmod 777 mock.sh
[atguigu@node001 bin]$ mock.sh
[atguigu@node001 bin]$ jps
4921 jar
4942 Jps
[atguigu@node001 bin]$ cd /opt/module/data_mocker/01-onlineEducation/log/
[atguigu@node001 log]$ ll
总用量 10240
-rw-rw-r-- 1 atguigu atguigu 10483421 7月 25 11:27 app.log
[atguigu@node001 log]$
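A quick way to sanity-check the generated log is to count records per page. The sketch below assumes app.log holds one JSON record per line in the same shape as the sample record printed by the generator, with at most one "page_id" key per record. The function name count_page_ids is made up for this note.

```shell
#!/bin/bash
# Count log records per page_id, most frequent first.
# Assumption: one JSON record per line, at most one "page_id" key per record.
# The leading quote in the pattern keeps "last_page_id" from matching.
count_page_ids() {
    grep -oh '"page_id":"[^"]*"' "$@" \
        | cut -d'"' -f4 \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Usage: count_page_ids /opt/module/data_mocker/01-onlineEducation/log/app.log, or pipe records into it on stdin.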
#!/bin/bash
# Run jps on every node of the cluster
for host in node001 node002 node003
do
    echo "=============== $host ==============="
    ssh "$host" jps
    # To run an arbitrary command instead: ssh "$host" "$*"
done
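The commented-out line in the script above hints at a more general helper that runs any command on every node. A minimal sketch of that idea follows. HOSTS and RUNNER are variables introduced here for illustration (RUNNER lets you swap ssh for echo to do a dry run), and it is not necessarily what the xcall script in ~/bin contains.

```shell
#!/bin/bash
# Hypothetical generalization: run the given command on every node.
# HOSTS and RUNNER are assumptions of this sketch, not part of the original scripts.
HOSTS="${HOSTS:-node001 node002 node003}"
RUNNER="${RUNNER:-ssh}"

xcall() {
    if [ "$#" -eq 0 ]; then
        echo "usage: xcall <command> [args...]" >&2
        return 1
    fi
    local host
    for host in $HOSTS; do
        echo "=============== $host ==============="
        # "$*" joins all arguments into one string, which ssh hands to the remote shell
        $RUNNER "$host" "$*"
    done
}
```

Usage: xcall jps, or RUNNER=echo followed by xcall jps -l to print what would be run without connecting.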
/opt/module/hadoop/hadoop-3.1.3/share/hadoop/hdfs/webapps/static
Timestamp conversion: format a timestamp as a human-readable date and time.
'date_tostring' : function (v) {
// return moment(Number(v)).format('ddd MMM DD HH:mm:ss ZZ YYYY');
// Build a Date from the millisecond timestamp, then format it in the browser's locale
return new Date(Number(v)).toLocaleString();
},
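The same conversion is handy on the command line, for example for the millisecond "ts" field in the generated log records. A small sketch, assuming GNU date (the -d @SECONDS form); ms_to_utc is a helper name made up for this note.

```shell
#!/bin/bash
# Convert an epoch timestamp in milliseconds to a UTC date string.
# Assumption: GNU date is available (BSD/macOS date uses different flags).
ms_to_utc() {
    local ms="$1"
    # Integer division drops the millisecond part; date expects seconds
    date -u -d "@$((ms / 1000))" '+%Y-%m-%d %H:%M:%S'
}
```

For the sample record's ts value, ms_to_utc 1645499594528 prints 2022-02-22 03:13:14 (UTC).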
[atguigu@node001 kafka_2.12-3.0.0]$ bin/kafka-console-producer.sh --bootstrap-server node001:9092 --topic first
>aaa
>bbb
>
[atguigu@node002 zookeeper-3.5.7]$ cd /opt/module/kafka/kafka_2.12-3.0.0/
[atguigu@node002 kafka_2.12-3.0.0]$ bin/kafka-server-start.sh -daemon config/server.properties
[atguigu@node002 kafka_2.12-3.0.0]$ bin/kafka-console-consumer.sh --bootstrap-server node001:9092 --topic first
[2023-07-25 16:52:16,870] WARN [Consumer clientId=consumer-console-consumer-1125-1, groupId=console-consumer-1125] Error while fetching metadata with correlation id 2 : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
aaa
bbb