Component | CentOS | Hadoop | Hive | Tez | MySQL | Sqoop | Azkaban | Presto |
---|---|---|---|---|---|---|---|---|
Version | 7 | 2.7.7 | 1.2.1 | 0.9.1 | 5.7.28 | 1.4.6 | 2.5.0 | 0.196 |

Node | Hadoop | Hive & Tez | MySQL | Sqoop | Azkaban | Presto |
---|---|---|---|---|---|---|
node01 | √ | | | | √ | √ |
node02 | √ | | √ | | √ | √ |
node03 | √ | √ | | √ | √ | √ |

1. Install and prepare three CentOS 7.2 virtual machines with the hostnames node01, node02, and node03.
2. Upload the automated installation package automaticDeploy.zip to node01.
3. Unzip automaticDeploy.zip into /home/hadoop/:
unzip automaticDeploy.zip -d /home/hadoop/
4. Edit frames.txt to specify which node each component is installed on:
# Common environment
jdk-8u144-linux-x64.tar.gz true
azkaban-sql-script-2.5.0.tar.gz true
# Node01
hadoop-2.7.7.tar.gz true node01
# Node02
mysql-rpm-pack-5.7.28 true node02
azkaban-executor-server-2.5.0.tar.gz true node02
azkaban-web-server-2.5.0.tar.gz true node02
presto-server-0.196.tar.gz true node02
# Node03
apache-hive-1.2.1-bin.tar.gz true node03
apache-tez-0.9.1-bin.tar.gz true node03
sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz true node03
yanagishima-18.0.zip true node03
# Multi
apache-flume-1.7.0-bin.tar.gz true node01,node02,node03
zookeeper-3.4.10.tar.gz true node01,node02,node03
kafka_2.11-0.11.0.2.tgz true node01,node02,node03
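
To illustrate how an entry in frames.txt might be read, here is a minimal sketch that lists the packages enabled for one node. The three-column layout (package, install flag, optional node list) is inferred from the sample above, and the function name `packages_for_node` is hypothetical; the real installer scripts may parse the file differently.

```shell
#!/bin/sh
# Hypothetical helper: print the packages that frames.txt enables for a node.
# Assumed line format: <package> <true|false> [comma-separated node list]
# Lines without a node list (the common environment) are treated as applying to every node.
packages_for_node() {
    node="$1"
    file="$2"
    awk -v node="$node" '
        /^[ \t]*#/ || NF < 2 { next }            # skip comments and blank lines
        $2 != "true"         { next }            # skip packages that are switched off
        NF == 2              { print $1; next }  # no node list: common package
        {
            n = split($3, nodes, ",")
            for (i = 1; i <= n; i++)
                if (nodes[i] == node) { print $1; break }
        }
    ' "$file"
}

# Usage: packages_for_node node03 /home/hadoop/automaticDeploy/frames.txt
```

Running it against the file before starting an installation is a cheap way to confirm that each node will receive the components you expect.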
5. Edit configs.txt to set the MySQL and keystore passwords:
# MySQL settings
mysql-root-password DBa2020* END
mysql-hive-password DBa2020* END
mysql-drive mysql-connector-java-5.1.26-bin.jar END
# Azkaban settings
azkaban-mysql-user root END
azkaban-mysql-password DBa2020* END
azkaban-keystore-password 123456 END
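
The entries above appear to follow a `<key> <value> END` pattern. Assuming that layout holds, a value could be looked up with a small helper like the one below; `get_config` is a hypothetical name, not part of the automaticDeploy scripts.

```shell
#!/bin/sh
# Hypothetical helper: read one value from configs.txt.
# Assumed line format: <key> <value> END
get_config() {
    key="$1"
    file="$2"
    awk -v key="$key" '
        $1 == key && NF >= 3 && $NF == "END" {
            # join fields 2..NF-1 so a value containing spaces is kept whole
            value = $2
            for (i = 3; i < NF; i++) value = value " " $i
            print value
            exit
        }
    ' "$file"
}

# Usage: MYSQL_PWD=$(get_config mysql-root-password configs.txt)
```

Reading the password from the file at run time avoids repeating it in every script that talks to MySQL.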
6. Edit host_ip.txt and add one line per virtual machine (IP address, hostname, login user, password):
192.168.0.200 node01 root 123456
192.168.0.201 node02 root 123456
192.168.0.202 node03 root 123456
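
As a rough illustration of how these lines can be consumed, the sketch below turns host_ip.txt into /etc/hosts-style entries. The column order is taken from the sample above; the helper name `hosts_entries` is hypothetical, and batchOperate.sh may handle this differently.

```shell
#!/bin/sh
# Hypothetical helper: print "<ip> <hostname>" pairs from host_ip.txt.
# Assumed line format: <ip> <hostname> <user> <password>
hosts_entries() {
    awk 'NF >= 2 && $1 !~ /^#/ { printf "%s %s\n", $1, $2 }' "$1"
}

# Usage: hosts_entries /home/hadoop/automaticDeploy/host_ip.txt
```

Because the file holds root passwords in plain text, it is worth restricting it with `chmod 600 host_ip.txt`.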
7. Make all scripts in the hadoop and systems directories under /home/hadoop/automaticDeploy/ executable:
chmod +x /home/hadoop/automaticDeploy/hadoop/* /home/hadoop/automaticDeploy/systems/*
8. Run systems/batchOperate.sh to initialize the environment:
/home/hadoop/automaticDeploy/systems/batchOperate.sh
9. Run the installation scripts in the hadoop directory for the components you need, for example:
/home/hadoop/automaticDeploy/hadoop/installHadoop.sh
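10. Distribute the automation scripts to the other two nodes, then run batchOperate.sh and the component installation scripts on each of them:
scp -r /home/hadoop/automaticDeploy root@node02:/home/hadoop/
scp -r /home/hadoop/automaticDeploy root@node03:/home/hadoop/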
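
With more nodes, the copy step can be driven from host_ip.txt instead of being typed once per host. The following is a sketch only: `distribute` is a hypothetical helper, it assumes the four-column host_ip.txt layout shown earlier, and it defaults to a dry run that prints the `scp` commands rather than executing them.

```shell
#!/bin/sh
# Sketch: copy automaticDeploy to every node in host_ip.txt except the local one.
# DRY_RUN=1 (the default here) prints the commands; set DRY_RUN=0 to execute them.
distribute() {
    hosts_file="$1"
    local_host="$2"
    src="${3:-/home/hadoop/automaticDeploy}"
    while read -r ip host user rest; do
        # skip blank lines, comments, and the node we are running on
        case "$ip" in ''|'#'*) continue ;; esac
        [ "$host" = "$local_host" ] && continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "scp -r $src $user@$host:/home/hadoop/"
        else
            scp -r "$src" "$user@$host:/home/hadoop/"
        fi
    done < "$hosts_file"
}

# Usage: distribute /home/hadoop/automaticDeploy/host_ip.txt node01
```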
11. Load the environment variable file on every virtual machine node:
source /etc/profile
12. Start the Hadoop environment and check that it started successfully:
hadoop namenode -format
start-all.sh
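
Step 12 asks you to check the startup but shows no command for it. One common approach is to inspect the output of `jps`, which ships with the JDK and lists running JVM processes. The helper below is a sketch with a hypothetical name, `missing_daemons`; which daemons belong on which node depends on your cluster layout.

```shell
#!/bin/sh
# Sketch: report which expected Hadoop daemons are absent from jps output.
# Reads jps-style lines ("<pid> <ClassName>") on stdin; expected daemon names are the arguments.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for daemon in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Usage on a node:
#   jps | missing_daemons NameNode DataNode ResourceManager NodeManager
```

No output means every daemon you listed was found on that node.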
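Overall development workflow
1. Generate the business data
2. Import the data via ETL
3. Create the ODS layer and load the data from HDFS
4. Create the DWD layer and load the data from the ODS layer
5. Create the DWS layer and load the data from the DWD layer
6. Create the ADS layer and calculate the repurchase rate
7. Write a script that exports the ADS-layer data to MySQL for business queries
8. Use the Azkaban scheduler to run the scripts automatically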
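
Steps 2 through 7 map onto the shell scripts used later in this guide, so the whole chain for one business date can be sketched as a single driver. This is an illustration under assumptions: `run_pipeline` and the `RUN` switch are hypothetical, and the script names are the ones that appear in the commands below.

```shell
#!/bin/sh
# Sketch: run the warehouse stages for one business date, stopping at the first failure.
# Set RUN=echo for a dry run that only prints each command.
SHELL_DIR="${SHELL_DIR:-/home/warehouse/shell}"

run_pipeline() {
    dt="$1"
    case "$dt" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "usage: run_pipeline YYYY-MM-DD" >&2; return 2 ;;
    esac
    ${RUN:-} "$SHELL_DIR/sqoop_import.sh" all "$dt" &&   # MySQL -> HDFS
    ${RUN:-} "$SHELL_DIR/ods_db.sh" "$dt" &&             # HDFS  -> ODS
    ${RUN:-} "$SHELL_DIR/dwd_db.sh" "$dt" &&             # ODS   -> DWD
    ${RUN:-} "$SHELL_DIR/dws_db.sh" "$dt" &&             # DWD   -> DWS
    ${RUN:-} "$SHELL_DIR/ads_sale.sh" "$dt" &&           # DWS   -> ADS
    ${RUN:-} "$SHELL_DIR/sqoop_export.sh" all            # ADS   -> MySQL
}

# Usage: RUN=echo; run_pipeline 2020-06-10
```

Chaining with `&&` means a failed stage prevents later stages from running on incomplete data, which is the same guarantee the Azkaban dependencies provide once scheduling is in place.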
export MYSQL_PWD='DBa2020*'
mysql -uroot -e "create database mall;"
mysql -uroot mall < {pathToSQL}
use mall;
# Generate data for 2020-06-10: 300 orders, 200 users, 300 product SKUs, without deleting existing data
CALL init_data('2020-06-10',300,200,300,FALSE);
mkdir -p /home/warehouse/shell
cd /home/warehouse/shell
vim sqoop_import.sh
chmod +x /home/warehouse/shell/sqoop_import.sh
./sqoop_import.sh all 2020-06-10
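
The guide creates sqoop_import.sh in an editor without listing its contents. As a rough idea of what a single-table import inside such a script could look like, here is a sketch. The JDBC URL, the `sku_info` table name, the HDFS layout under `/origin_data/mall/db`, and the tab delimiter are all assumptions made for illustration; the Sqoop options themselves are standard `sqoop import` arguments.

```shell
#!/bin/sh
# Sketch of one table import for sqoop_import.sh.
# Assumptions (not from the guide): JDBC URL, HDFS directory layout, tab-delimited output.
JDBC_URL="${JDBC_URL:-jdbc:mysql://node02:3306/mall}"
HDFS_ROOT="${HDFS_ROOT:-/origin_data/mall/db}"

import_table() {
    table="$1"
    dt="$2"
    # RUN=echo prints the command instead of running it (note that this shows the password)
    ${RUN:-} sqoop import \
        --connect "$JDBC_URL" \
        --username root \
        --password "$MYSQL_PWD" \
        --table "$table" \
        --target-dir "$HDFS_ROOT/$table/$dt" \
        --delete-target-dir \
        --fields-terminated-by '\t' \
        --num-mappers 1
}

# Usage: import_table sku_info 2020-06-10
```

Writing each day into its own directory and deleting the target first keeps a rerun for the same date from failing on an existing path.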
hive --service hiveserver2 &
hive --service metastore &
vim /home/warehouse/sql/ods_ddl.sql
hive -f /home/warehouse/sql/ods_ddl.sql
vim /home/warehouse/shell/ods_db.sh
chmod +x /home/warehouse/shell/ods_db.sh
/home/warehouse/shell/ods_db.sh 2020-06-10
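
The contents of ods_db.sh are not shown here. Loading a day's imported files into a partitioned ODS table is typically done with Hive's `LOAD DATA INPATH` statement, which the sketch below builds for one table. The database name `mall`, the `ods_` table prefix, and the HDFS path are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: build the Hive statement that loads one day's files into an ODS partition.
# Assumptions (not from the guide): Hive database name, ods_<table> naming, HDFS layout.
ods_load_sql() {
    table="$1"
    dt="$2"
    printf "LOAD DATA INPATH '%s/%s/%s' OVERWRITE INTO TABLE %s.ods_%s PARTITION (dt='%s');\n" \
        "${HDFS_ROOT:-/origin_data/mall/db}" "$table" "$dt" \
        "${HIVE_DB:-mall}" "$table" "$dt"
}

# Usage: hive -e "$(ods_load_sql sku_info 2020-06-10)"
```

`OVERWRITE` replaces the partition's existing contents, so rerunning the load for a date does not duplicate rows.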
DWD layer analysis
vim /home/warehouse/sql/dwd_ddl.sql
hive -f /home/warehouse/sql/dwd_ddl.sql
vim /home/warehouse/shell/dwd_db.sh
chmod +x /home/warehouse/shell/dwd_db.sh
./dwd_db.sh 2020-06-10
select * from dwd_sku_info where dt='2020-06-10' limit 2;
DWS layer analysis
vim /home/warehouse/sql/dws_ddl.sql
hive -f /home/warehouse/sql/dws_ddl.sql
vim /home/warehouse/shell/dws_db.sh
chmod +x /home/warehouse/shell/dws_db.sh
./dws_db.sh 2020-06-10
select * from dws_user_action where dt='2020-06-10' limit 2;
select * from dws_sale_detail_daycount where dt='2020-06-10' limit 2;
ADS layer analysis
vim /home/warehouse/sql/ads_sale_ddl.sql
hive -f /home/warehouse/sql/ads_sale_ddl.sql
vim /home/warehouse/shell/ads_sale.sh
chmod +x /home/warehouse/shell/ads_sale.sh
/home/warehouse/shell/ads_sale.sh 2020-06-10
select * from ads_sale_tm_category1_stat_mn limit 2;
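
To make the repurchase-rate figure concrete, here is a small worked example outside Hive. It assumes a common definition (users with at least two orders divided by users with at least one order) and a hypothetical two-column input of user ID and order count; the actual ADS SQL may slice by brand, category, or month.

```shell
#!/bin/sh
# Worked example: repurchase rate from per-user order counts.
# Input lines: <user_id> <order_count>
# Assumed definition: users with >= 2 orders / users with >= 1 order
repurchase_rate() {
    awk '
        $2 >= 1 { buyers++ }
        $2 >= 2 { repeaters++ }
        END {
            if (buyers == 0) { print "0.00"; exit }
            printf "%.2f\n", repeaters / buyers
        }
    '
}

# Four buyers, two of whom ordered more than once: prints 0.50
printf '%s\n' 'u1 1' 'u2 3' 'u3 2' 'u4 1' | repurchase_rate
```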
vim /home/warehouse/sql/mysql_sale_ddl.sql
export MYSQL_PWD='DBa2020*'
mysql -uroot mall < /home/warehouse/sql/mysql_sale_ddl.sql
vim /home/warehouse/shell/sqoop_export.sh
chmod +x /home/warehouse/shell/sqoop_export.sh
/home/warehouse/shell/sqoop_export.sh all
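
sqoop_export.sh is likewise created without its contents being listed. A single-table export could be sketched as follows. The JDBC URL, the Hive warehouse path, and the tab delimiter are assumptions; the table name comes from the query above, and the options are standard `sqoop export` arguments.

```shell
#!/bin/sh
# Sketch of one table export for sqoop_export.sh.
# Assumptions (not from the guide): JDBC URL, Hive warehouse directory, tab-delimited files.
JDBC_URL="${JDBC_URL:-jdbc:mysql://node02:3306/mall}"
HIVE_WAREHOUSE="${HIVE_WAREHOUSE:-/user/hive/warehouse/mall.db}"

export_table() {
    table="$1"
    # RUN=echo prints the command instead of running it (note that this shows the password)
    ${RUN:-} sqoop export \
        --connect "$JDBC_URL" \
        --username root \
        --password "$MYSQL_PWD" \
        --table "$table" \
        --export-dir "$HIVE_WAREHOUSE/$table" \
        --input-fields-terminated-by '\t' \
        --input-null-string '\\N' \
        --input-null-non-string '\\N' \
        --num-mappers 1
}

# Usage: export_table ads_sale_tm_category1_stat_mn
```

The two `--input-null-*` options tell Sqoop to treat Hive's `\N` marker as SQL NULL instead of writing it to MySQL as a literal string.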
SELECT * FROM ads_sale_tm_category1_stat_mn;
CALL init_data('2020-06-12',300,200,300,FALSE);
azkaban-executor-start.sh
cd /opt/app/azkaban/server
azkaban-web-start.sh
useExecutor node03
dt 2020-06-12
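
Azkaban 2.5 describes a flow as a set of `.job` property files, each with a `type`, a `command`, and optional `dependencies`. The sketch below generates one such file per warehouse stage. The job names and the `generate_jobs` helper are hypothetical; `${dt}` is written literally so that Azkaban can substitute the `dt` flow parameter shown above at run time.

```shell
#!/bin/sh
# Sketch: generate Azkaban .job files that chain the warehouse scripts.
# Job names are hypothetical; script paths are the ones used in this guide.
write_job() {
    # $1 = output dir, $2 = job name, $3 = command, $4 = dependency (may be empty)
    {
        printf '%s\n' "type=command"
        printf '%s\n' "command=$3"
        if [ -n "$4" ]; then printf '%s\n' "dependencies=$4"; fi
    } > "$1/$2.job"
}

generate_jobs() {
    dir="$1"
    s=/home/warehouse/shell
    mkdir -p "$dir"
    write_job "$dir" import "$s/sqoop_import.sh all \${dt}" ""
    write_job "$dir" ods    "$s/ods_db.sh \${dt}"           import
    write_job "$dir" dwd    "$s/dwd_db.sh \${dt}"           ods
    write_job "$dir" dws    "$s/dws_db.sh \${dt}"           dwd
    write_job "$dir" ads    "$s/ads_sale.sh \${dt}"         dws
    write_job "$dir" export "$s/sqoop_export.sh all"        ads
}

# Usage: generate_jobs ./azkaban_jobs
```

The generated files can then be zipped and uploaded as a project in the Azkaban web UI, with `dt` supplied as a flow parameter when the flow is executed.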
vim /home/warehouse/sql/ads_gmv_ddl.sql
hive -f /home/warehouse/sql/ads_gmv_ddl.sql
vim /home/warehouse/shell/ads_gmv.sh
chmod +x /home/warehouse/shell/ads_gmv.sh
/home/warehouse/shell/ads_gmv.sh 2020-06-10
select * from ads_gmv_sum_day;
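
As a worked illustration of the figure behind ads_gmv_sum_day, the sketch below totals order amounts per day with awk. It assumes GMV is the sum of order amounts for a date and uses a hypothetical two-column input of date and amount; the real table's columns may differ.

```shell
#!/bin/sh
# Worked example: daily order count and GMV from order lines.
# Input lines: <date> <order_amount>  (hypothetical layout)
gmv_by_day() {
    awk '
        { gmv[$1] += $2; orders[$1]++ }
        END { for (d in gmv) printf "%s %d %.2f\n", d, orders[d], gmv[d] }
    ' | sort
}

# Usage: printf '%s\n' '2020-06-10 100.50' '2020-06-10 49.50' | gmv_by_day
```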
CALL init_data('2020-06-12',300,200,300,FALSE);
azkaban-executor-start.sh
cd /opt/app/azkaban/server
azkaban-web-start.sh
useExecutor node03
dt 2020-06-12
References
Alibaba Cloud Drive: https://www.alipan.com/s/zuK576wnz2n