Exporting Hadoop Data to MySQL after Hive Aggregation



1. Program paths and directory layout

[hadoop@emr-worker-8 sdk-dataproc]$ pwd
/home/hadoop/sdk-dataproc
[hadoop@emr-worker-8 sdk-dataproc]$ ls
corn-err.txt  hive2mysql  manual-sdk-ctl.sh  nohup.out  proc  sdk-ctl.log  sdk-ctl.sh
[hadoop@emr-worker-8 sdk-dataproc]$ cd hive2mysql/
[hadoop@emr-worker-8 hive2mysql]$ ls
sdk-data  sdk-hive2mysql.sh
[hadoop@emr-worker-8 hive2mysql]$ cd ../proc/
[hadoop@emr-worker-8 proc]$ ls
sdk_anchor_live_proc.sh             sdk_room_anchor_online_max.sh  sdk_room_loginuser_visitor.sh  sdk_room_point.sh            x-sdk_launch_live.sh
sdk_appsource_audience_register.sh  sdk_room_gift_point.sh         sdk_room_messge_send.sh        sdk_room_user_online_max.sh
[hadoop@emr-worker-8 proc]$

2. Master scheduling script
Cron-scheduled batch version

[hadoop@emr-worker-8 sdk-dataproc]$ crontab -l
8 8 * * * sh /home/hadoop/sdk-dataproc/sdk-ctl.sh >>/home/hadoop/sdk-dataproc/corn-err.txt 2>&1
[hadoop@emr-worker-8 sdk-dataproc]$ cat sdk-ctl.sh 
#!/bin/bash
export yesterday=`date -d last-day +%Y-%m-%d`

#sdk data proc ......
echo `date "+%Y-%m-%d %H:%M:%S"`,shell script exec start ................................................................ >> /home/hadoop/sdk-dataproc/sdk-ctl.log
for proc_script in /home/hadoop/sdk-dataproc/proc/sdk*.sh;
 do
   echo `date "+%Y-%m-%d %H:%M:%S"`,$proc_script $yesterday data proc start ... >> /home/hadoop/sdk-dataproc/sdk-ctl.log
   /bin/sh $proc_script $yesterday
   echo `date "+%Y-%m-%d %H:%M:%S"`,$proc_script $yesterday data proc finished ! >> /home/hadoop/sdk-dataproc/sdk-ctl.log
   echo -e "" >> /home/hadoop/sdk-dataproc/sdk-ctl.log
 done

/bin/sh /home/hadoop/sdk-dataproc/hive2mysql/sdk-hive2mysql.sh $yesterday

echo -e "\n\n\n" >> /home/hadoop/sdk-dataproc/sdk-ctl.log
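The loop above logs a start and finish line for every proc script but ignores each script's exit status, so a failed step looks identical to a successful one in `sdk-ctl.log`. A minimal hardening sketch (the `run_step` helper is hypothetical, not part of the original scripts):

```shell
#!/bin/bash
# Run one proc script for one day and record its exit status (sketch).
run_step() {
  local script="$1" day="$2"
  if /bin/sh "$script" "$day"; then
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$script $day data proc finished !"
  else
    local rc=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$script $day data proc FAILED (exit $rc)"
  fi
}
```

The caller could then append `run_step`'s output to the log and decide whether to skip the MySQL export when any step failed.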

Manually triggered backfill version

[hadoop@emr-worker-8 sdk-dataproc]$ cat manual-sdk-ctl.sh 
#!/bin/bash
#export yesterday=`date -d last-day +%Y-%m-%d`

for yesterday in 2016-10-18 2016-10-19 2016-10-20 2016-10-21 2016-10-22 2016-10-23 2016-10-24 2016-10-25;
do
#sdk data proc ......
echo `date "+%Y-%m-%d %H:%M:%S"`,shell script exec start ................................................................ >> /home/hadoop/sdk-dataproc/sdk-ctl.log
for proc_script in /home/hadoop/sdk-dataproc/proc/sdk*.sh;
 do
   echo `date "+%Y-%m-%d %H:%M:%S"`,$proc_script $yesterday data proc start ... >> /home/hadoop/sdk-dataproc/sdk-ctl.log
   /bin/sh $proc_script $yesterday
   echo `date "+%Y-%m-%d %H:%M:%S"`,$proc_script $yesterday data proc finished ! >> /home/hadoop/sdk-dataproc/sdk-ctl.log
   echo -e "" >> /home/hadoop/sdk-dataproc/sdk-ctl.log
 done

/bin/sh /home/hadoop/sdk-dataproc/hive2mysql/sdk-hive2mysql.sh $yesterday

echo -e "\n\n\n" >> /home/hadoop/sdk-dataproc/sdk-ctl.log

done
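Hard-coding the eight dates works for a one-off backfill, but the same range can be generated with GNU `date` arithmetic, which the cron version already relies on. A sketch (the start/end values are just the ones from the script above):

```shell
#!/bin/bash
# Walk every day from start to end inclusive via GNU date arithmetic.
start=2016-10-18
end=2016-10-25
days=""
d="$start"
while :; do
  days="$days $d"            # collect; the real script would process $d here
  [ "$d" = "$end" ] && break
  d=$(date -d "$d + 1 day" +%Y-%m-%d)
done
echo "$days"
```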

3. Hive data-processing scripts

[hadoop@emr-worker-8 proc]$ cat sdk_room_gift_point.sh 
#!/bin/bash

#echo -e "Please enter your etl date (YYYY-MM-DD):"
#read p_static_date

#var get ...
source /etc/profile
p_static_date=$1

#proc hql ...
/usr/lib/hive-current/bin/hive -e "
insert overwrite table sdk_room_gift_point
select xappkey,room_id,gift_point,static_date from sdk_room_gift_point where static_date <> '$p_static_date';

insert into table sdk_room_gift_point(xappkey,room_id,gift_point,static_date)
select xappkey,room_id,sum(gift_point) gift_point ,'$p_static_date' static_date
from data_chushou_open_gift_record  
where state=0 and substr(created_time,1,10)='$p_static_date' 
group by xappkey,room_id;
"
[hadoop@emr-worker-8 proc]$ cat sdk_room_messge_send.sh 
#!/bin/bash

#echo -e "Please enter your etl date (YYYY-MM-DD):"
#read p_static_date

#var get ...
source /etc/profile
p_static_date=$1

#proc hql ...
/usr/lib/hive-current/bin/hive -e "
insert overwrite table sdk_room_messge_send
select xappkey,roomid,messge_send_cnt,pt,static_date from sdk_room_messge_send where static_date <> '$p_static_date';

insert into table sdk_room_messge_send(xappkey,roomid,messge_send_cnt,pt,static_date)
select xappkey,roomid,count(*) messge_send_cnt,pt,'$p_static_date' static_date from t_log_chushou_message_send_v7 where pt_day='$p_static_date' and xappkey !='' group by xappkey,roomid,pt;
"
[hadoop@emr-worker-8 proc]$ 

4. Transferring Hive tables to MySQL

[hadoop@emr-worker-8 hive2mysql]$ cat sdk-hive2mysql.sh 
#!/bin/bash
#export yesterday=`date -d last-day +%Y-%m-%d`

#var get ...
source /etc/profile
yesterday=$1

#sdk data hive to mysql proc ......
for tab_name in `/usr/lib/hive-current/bin/hive -e "show tables 'sdk_room*';show tables 'sdk_appsource_audience_register';show tables 'sdk_anchor_new';show tables 'sdk_launch_live';"`;
do
echo `date "+%Y-%m-%d %H:%M:%S"`,$tab_name $yesterday hive to mysql start ... >> /home/hadoop/sdk-dataproc/sdk-ctl.log
/usr/lib/hive-current/bin/hive -e "select * from $tab_name where static_date='$yesterday';" > /home/hadoop/sdk-dataproc/hive2mysql/sdk-data/$tab_name.dat
/usr/bin/mysql -N -hMysqlhost  -P3306 -uhadoop  -pMysqlpassword -e "use funnyai_data;delete from static_$tab_name where static_date='$yesterday';load data local infile \"/home/hadoop/sdk-dataproc/hive2mysql/sdk-data/$tab_name.dat\" into table static_$tab_name; "
echo `date "+%Y-%m-%d %H:%M:%S"`,$tab_name $yesterday hive to mysql finished ! >> /home/hadoop/sdk-dataproc/sdk-ctl.log
done

5. Notes
This pipeline takes data stored in HDFS, computes aggregates with Hive, and then dumps the results into a MySQL database for reporting and other presentation uses. In practice there is still plenty of room for improvement, for example in how sensitive database credentials are handled and in making the directory structure clearer and more consistent.
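On the credentials point: passing `-pMysqlpassword` on the command line exposes the password in `ps` output and shell history. One common mitigation is a mode-600 MySQL option file read via `--defaults-extra-file` (a sketch; the file path is illustrative, and `Mysqlhost`/`Mysqlpassword` mirror the placeholders in the script above):

```shell
#!/bin/bash
# Store connection settings in a private option file instead of on the command line.
cnf="${TMPDIR:-/tmp}/sdk-mysql.cnf"
cat > "$cnf" <<'EOF'
[client]
host=Mysqlhost
port=3306
user=hadoop
password=Mysqlpassword
EOF
chmod 600 "$cnf"
# sdk-hive2mysql.sh would then connect with (note: this option must come first):
#   /usr/bin/mysql --defaults-extra-file="$cnf" -N -e "use funnyai_data; ..."
```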
