We have a file of per-merchant daily transaction summaries, merch_trade_day_stat.txt, shown below (three comma-separated columns: merchant ID, transaction date, daily transaction amount):
[root@node1 ~]$ more merch_trade_day_stat.txt
1,2017-07-01,100
1,2017-07-02,200
1,2017-07-03,300
1,2017-07-04,400
1,2017-07-05,500
1,2017-07-06,600
1,2017-07-07,40
1,2017-07-08,800
1,2017-07-09,900
1,2017-07-10,50
1,2017-07-11,100
1,2017-07-12,80
1,2017-07-13,300
1,2017-07-14,400
1,2017-07-15,500
2,2017-07-01,100
2,2017-07-02,200
2,2017-07-03,300
2,2017-07-04,60
2,2017-07-05,500
2,2017-07-06,600
2,2017-07-07,40
2,2017-07-08,800
2,2017-07-09,900
2,2017-07-10,50
2,2017-07-11,100
2,2017-07-12,80
2,2017-07-13,300
2,2017-07-14,400
2,2017-07-15,500
The task: for each merchant, compute the maximum number of consecutive days on which the daily transaction amount is no less than 100.
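The query below uses the classic "gaps and islands" trick: within each merchant, number the qualifying days with row_number() ordered by date, then subtract that row number (as a day count) from the date itself. Along any unbroken run of consecutive dates the difference stays constant, so grouping by that derived date and counting rows gives each run's length, and the per-merchant max of those counts is the answer.

First, create a table matching the file layout and load the data: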
hive> CREATE TABLE merch_trade_day_stat(
> merch_id string COMMENT 'merchant ID',
> date_key string COMMENT 'transaction date',
> tx_amt int COMMENT 'daily transaction amount'
> ) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',';
OK
Time taken: 0.375 seconds
hive> load data local inpath 'merch_trade_day_stat.txt' into table merch_trade_day_stat;
Loading data to table default.merch_trade_day_stat
Table default.merch_trade_day_stat stats: [numFiles=1, totalSize=443]
OK
Time taken: 0.45 seconds
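Note that LOAD DATA LOCAL INPATH copies the file from the client's local filesystem into the table's directory under the Hive warehouse. With INTO TABLE (no OVERWRITE), running the load a second time would append duplicate rows; for a re-runnable script you would likely want the OVERWRITE INTO TABLE variant instead. A quick select * confirms the rows landed intact: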
hive> select * from merch_trade_day_stat;
OK
1 2017-07-01 100
1 2017-07-02 200
1 2017-07-03 300
1 2017-07-04 400
1 2017-07-05 500
1 2017-07-06 600
1 2017-07-07 40
1 2017-07-08 800
1 2017-07-09 900
1 2017-07-10 50
1 2017-07-11 100
1 2017-07-12 80
1 2017-07-13 300
1 2017-07-14 400
1 2017-07-15 500
2 2017-07-01 100
2 2017-07-02 200
2 2017-07-03 300
2 2017-07-04 60
2 2017-07-05 500
2 2017-07-06 600
2 2017-07-07 40
2 2017-07-08 800
2 2017-07-09 900
2 2017-07-10 50
2 2017-07-11 100
2 2017-07-12 80
2 2017-07-13 300
2 2017-07-14 400
2 2017-07-15 500
Time taken: 0.069 seconds, Fetched: 30 row(s)
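Before running the full query, it helps to look at the innermost subquery on its own. A minimal sketch, assuming the table above (the values in the comments are worked out by hand from the sample data, not captured output):

-- Anchor each qualifying day to the start of its run: consecutive dates minus
-- their consecutive row numbers collapse to the same tmp_date.
select merch_id,
       date_key,
       date_sub(date_key, row_number() over (partition by merch_id order by date_key)) as tmp_date
from merch_trade_day_stat
where tx_amt >= 100;
-- For merch_id = 1, the dates 2017-07-01..2017-07-06 get row numbers 1..6 and all
-- map to tmp_date = 2017-06-30; 2017-07-08 and 2017-07-09 (row numbers 7 and 8)
-- both map to 2017-07-01, and so on. Each distinct (merch_id, tmp_date) pair thus
-- marks one consecutive run, and counting rows per pair gives that run's length.

Grouping on (merch_id, tmp_date), counting, and taking the max per merchant completes the query: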
hive> select a.merch_id,
> max(a.continue_days) as max_continue_days
> from(select a.merch_id,
> count(a.date_key) as continue_days
> from(select merch_id,
> date_key,
> date_sub(date_key, row_number() over(partition by merch_id order by date_key)) as tmp_date
> from merch_trade_day_stat
> where tx_amt >= 100
> ) a
> group by a.merch_id, a.tmp_date
> ) a
> group by a.merch_id;
Query ID = bd_20170810104913_29569c95-1110-4ed4-906e-b09ba6712ac7
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499860627544_0066, Tracking URL = http://ali-bj01-tst-cluster-004.xiweiai.cn:8088/proxy/application_1499860627544_0066/
Kill Command = /mnt/bd/software/hadoop/hadoop-2.6.2/bin/hadoop job -kill job_1499860627544_0066
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-10 10:49:18,583 Stage-1 map = 0%, reduce = 0%
2017-08-10 10:49:24,792 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
2017-08-10 10:49:29,929 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1499860627544_0066
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1499860627544_0067, Tracking URL = http://ali-bj01-tst-cluster-004.xiweiai.cn:8088/proxy/application_1499860627544_0067/
Kill Command = /mnt/bd/software/hadoop/hadoop-2.6.2/bin/hadoop job -kill job_1499860627544_0067
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-08-10 10:49:40,581 Stage-2 map = 0%, reduce = 0%
2017-08-10 10:49:44,691 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.74 sec
2017-08-10 10:49:49,826 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 1.93 sec
MapReduce Total cumulative CPU time: 1 seconds 930 msec
Ended Job = job_1499860627544_0067
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.85 sec HDFS Read: 9593 HDFS Write: 321 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 1.93 sec HDFS Read: 6039 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 780 msec
OK
1 6
2 3
Time taken: 37.05 seconds, Fetched: 2 row(s)
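The result matches a manual check of the data: merchant 1's longest qualifying streak is 2017-07-01 through 2017-07-06 (6 days, cut off by the 40 on 07-07), while merchant 2 never exceeds 3 consecutive days (2017-07-01..07-03, and again 2017-07-13..07-15).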