Hive carries the title of a big-data data warehouse, so data analysis is its most common use case, and within data analysis aggregate functions and window functions are the two most important building blocks. Aggregate functions were already covered in the post Hive从入门到放弃——Hive 用户内置函数简介(十一); window functions, because of how flexibly they can be applied, get a separate introduction here.
Some newcomers confuse aggregate functions with window functions. An aggregate function groups rows by some chosen dimensions and computes the statistical measures the analysis needs, one result per group. A window function treats the distinct values of the chosen dimensions as partitions and computes statistics of the measures within each partition while keeping every detail row. Take a table that records the Chinese-language scores of every class in a school: aggregate functions are basically used to compute the school-wide average, total, maximum and minimum score, or the same figures per class. Window functions can produce all of those as well, but on top of that they can rank the scores within each class, rank them across the whole school, accumulate head counts starting from class 1, and so on, which makes it much easier to report statistics for groups along different dimensions. With that distinction in place, let's get to the window functions themselves.
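To make the contrast concrete, here is a minimal sketch against a hypothetical scores table (the table name class_score and its columns are invented purely for illustration). The first query collapses the rows with GROUP BY and returns one row per class; the second keeps every student row and attaches per-class and school-wide statistics and rankings to it:
-- Aggregate: one row per class
SELECT class_id
      ,avg(score) AS class_avg
      ,max(score) AS class_max
FROM class_score
GROUP BY class_id;
-- Window: every row is kept, statistics and rankings are attached to it
SELECT student_id
      ,class_id
      ,score
      ,avg(score) OVER(PARTITION BY class_id) AS class_avg
      ,rank() OVER(PARTITION BY class_id ORDER BY score DESC) AS rank_in_class
      ,rank() OVER(ORDER BY score DESC) AS rank_in_school
FROM class_score;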
Here we create a table named ods_sale_order_producttion_amount (daily total sales amount of products at the ODS layer). The DDL, the data load, and a preview of the data are shown below.
-- DDL: create the table
CREATE TABLE `ods_sale_order_producttion_amount`(
`month_key` string COMMENT '月份',
`date_key` string COMMENT '日期',
`production_amount` decimal(18,2) COMMENT '产品总值')
COMMENT 'ods产品每天销售总额'
PARTITIONED BY (
`event_year` int,
`event_week` int,
`event_day` string,
`event_hour` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://dw-test-cluster/hive/warehouse/ods/sale/ods_sale_order_producttion_amount'
TBLPROPERTIES ('parquet.compression'='snappy');
-- Load the sample data
insert into ods_sale_order_producttion_amount partition(event_year=2020,event_week=30,event_day='20200731',event_hour='00')
select '202005','20200501',199000.00
union all
select '202005','20200502',185000.00
union all
select '202005','20200503',199000.00
union all
select '202005','20200504',138500.00
union all
select '202005','20200505',196540.00
union all
select '202005','20200506',138500.00
union all
select '202005','20200507',159840.00
union all
select '202005','20200508',189462.00
union all
select '202005','20200509',200000.00
union all
select '202005','20200510',198540.00
union all
select '202006','20200601',189000.00
union all
select '202006','20200602',185000.00
union all
select '202006','20200603',189000.00
union all
select '202006','20200604',158500.00
union all
select '202006','20200605',200140.00
union all
select '202006','20200606',158500.00
union all
select '202006','20200607',198420.00
union all
select '202006','20200608',158500.00
union all
select '202006','20200609',200100.00
union all
select '202006','20200610',135480.00;
-- Data preview
month_key date_key production_amount event_year event_week event_day event_hour
202005 20200501 199000.00 2020 30 20200731 00
202005 20200502 185000.00 2020 30 20200731 00
202005 20200503 199000.00 2020 30 20200731 00
202005 20200504 138500.00 2020 30 20200731 00
202005 20200505 196540.00 2020 30 20200731 00
202005 20200506 138500.00 2020 30 20200731 00
202005 20200507 159840.00 2020 30 20200731 00
202005 20200508 189462.00 2020 30 20200731 00
202005 20200509 200000.00 2020 30 20200731 00
202005 20200510 198540.00 2020 30 20200731 00
202006 20200601 189000.00 2020 30 20200731 00
202006 20200602 185000.00 2020 30 20200731 00
202006 20200603 189000.00 2020 30 20200731 00
202006 20200604 158500.00 2020 30 20200731 00
202006 20200605 200140.00 2020 30 20200731 00
202006 20200606 158500.00 2020 30 20200731 00
202006 20200607 198420.00 2020 30 20200731 00
202006 20200608 158500.00 2020 30 20200731 00
202006 20200609 200100.00 2020 30 20200731 00
202006 20200610 135480.00 2020 30 20200731 00
Time taken: 0.233 seconds, Fetched: 20 row(s)
ROW_NUMBER
ROW_NUMBER() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
assigns a consecutive sequence number within each partition, with no ties and no gaps, so the largest value in a partition equals the total number of rows being ranked in that partition;
RANK
RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
gives tied rows the same rank and then skips as many positions as there were tied rows, so gaps can appear in the sequence; the rank values can still run up to the total number of rows being ranked in the partition;
DENSE_RANK
DENSE_RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
gives tied rows the same rank but does not skip any positions, so the final rank in a partition is less than or equal to the total number of rows being ranked in that partition;
CUME_DIST
CUME_DIST() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
returns the cumulative distribution, in the range (0,1] (greater than 0, less than or equal to 1); rows with the same ordering value get the same percentage. It is computed as the number of rows in the partition that sort at or before the current row divided by the total number of rows in the partition, so the last value in every partition is always exactly 1.0;
PERCENT_RANK
PERCENT_RANK() OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
returns a relative rank, in the range [0,1] (greater than or equal to 0, less than or equal to 1); it always starts at 0, and rows with the same ordering value get the same percentage. It is computed as (RANK of the current row - 1) / (total rows in the partition - 1); for example, a row with RANK 3 in a 10-row partition gets (3 - 1) / (10 - 1) ≈ 0.2222. The last value in a partition is therefore not always 1.0: when the largest RANK value in the partition is itself shared by tied rows it stays below 1.0, otherwise it ends at exactly 1.0;
NTILE
NTILE(num) OVER([PARTITION BY col1,col2,…] ORDER BY col1,col2,… ASC|DESC)
distributes the rows of each partition into num buckets as evenly as possible and returns the bucket number of each row, so the returned value is always less than or equal to num;
hive> set hive.cli.print.header=true;
hive> select
>
> ROW_NUMBER() OVER(PARTITION BY month_key order by production_amount DESC) row_number_result
> ,RANK() OVER(PARTITION BY month_key order by production_amount DESC) rank_result
> ,DENSE_RANK() OVER(PARTITION BY month_key order by production_amount DESC) dense_rank_result
> ,month_key
> ,date_key
> ,production_amount
> from ods_sale_order_producttion_amount where event_day='20200731';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200721163446_108a897b-ddf1-46ae-9bd8-a81a2afec8c9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0123, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0123/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0123
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-21 16:35:03,990 Stage-1 map = 0%, reduce = 0%
2020-07-21 16:35:12,456 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.76 sec
2020-07-21 16:35:17,720 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.95 sec
MapReduce Total cumulative CPU time: 6 seconds 950 msec
Ended Job = job_1592876386879_0123
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.95 sec HDFS Read: 14952 HDFS Write: 970 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 950 msec
OK
Time taken: 32.507 seconds, Fetched: 20 row(s)
To make the result easier to read, I have pulled it into a properly formatted table; the detailed result is shown in Table 1.
row_number_result | rank_result | dense_rank_result | cume_dist_result | percent_rank_result | ntile2_result | ntile3_result | ntile4_result | ntile7_result | month_key | date_key | production_amount |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.1 | 0.0 | 1 | 1 | 1 | 1 | 202006 | 20200605 | 200140 |
2 | 2 | 2 | 0.2 | 0.1111111111111111 | 1 | 1 | 1 | 1 | 202006 | 20200609 | 200100 |
3 | 3 | 3 | 0.3 | 0.2222222222222222 | 1 | 1 | 1 | 2 | 202006 | 20200607 | 198420 |
4 | 4 | 4 | 0.5 | 0.3333333333333333 | 1 | 1 | 2 | 2 | 202006 | 20200601 | 189000 |
5 | 4 | 4 | 0.5 | 0.3333333333333333 | 1 | 2 | 2 | 3 | 202006 | 20200603 | 189000 |
6 | 6 | 5 | 0.6 | 0.5555555555555556 | 2 | 2 | 2 | 3 | 202006 | 20200602 | 185000 |
7 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 2 | 3 | 4 | 202006 | 20200604 | 158500 |
8 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 3 | 3 | 5 | 202006 | 20200606 | 158500 |
9 | 7 | 6 | 0.9 | 0.6666666666666666 | 2 | 3 | 4 | 6 | 202006 | 20200608 | 158500 |
10 | 10 | 7 | 1.0 | 1.0 | 2 | 3 | 4 | 7 | 202006 | 20200610 | 135480 |
1 | 1 | 1 | 0.1 | 0.0 | 1 | 1 | 1 | 1 | 202005 | 20200509 | 200000 |
2 | 2 | 2 | 0.3 | 0.1111111111111111 | 1 | 1 | 1 | 1 | 202005 | 20200501 | 199000 |
3 | 2 | 2 | 0.3 | 0.1111111111111111 | 1 | 1 | 1 | 2 | 202005 | 20200503 | 199000 |
4 | 4 | 3 | 0.4 | 0.3333333333333333 | 1 | 1 | 2 | 2 | 202005 | 20200510 | 198540 |
5 | 5 | 4 | 0.5 | 0.4444444444444444 | 1 | 2 | 2 | 3 | 202005 | 20200505 | 196540 |
6 | 6 | 5 | 0.6 | 0.5555555555555556 | 2 | 2 | 2 | 3 | 202005 | 20200508 | 189462 |
7 | 7 | 6 | 0.7 | 0.6666666666666666 | 2 | 2 | 3 | 4 | 202005 | 20200502 | 185000 |
8 | 8 | 7 | 0.8 | 0.7777777777777778 | 2 | 3 | 3 | 5 | 202005 | 20200507 | 159840 |
9 | 9 | 8 | 1.0 | 0.8888888888888888 | 2 | 3 | 4 | 6 | 202005 | 20200504 | 138500 |
10 | 9 | 8 | 1.0 | 0.8888888888888888 | 2 | 3 | 4 | 7 | 202005 | 20200506 | 138500 |
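The transcript above only shows the ROW_NUMBER, RANK and DENSE_RANK columns, while Table 1 also contains CUME_DIST, PERCENT_RANK and NTILE results. A sketch of the kind of query that would produce every column of Table 1, assuming the same month_key partition and the same descending production_amount ordering as above, looks like this:
select
 ROW_NUMBER() OVER(PARTITION BY month_key order by production_amount DESC) row_number_result
,RANK() OVER(PARTITION BY month_key order by production_amount DESC) rank_result
,DENSE_RANK() OVER(PARTITION BY month_key order by production_amount DESC) dense_rank_result
,CUME_DIST() OVER(PARTITION BY month_key order by production_amount DESC) cume_dist_result
,PERCENT_RANK() OVER(PARTITION BY month_key order by production_amount DESC) percent_rank_result
,NTILE(2) OVER(PARTITION BY month_key order by production_amount DESC) ntile2_result
,NTILE(3) OVER(PARTITION BY month_key order by production_amount DESC) ntile3_result
,NTILE(4) OVER(PARTITION BY month_key order by production_amount DESC) ntile4_result
,NTILE(7) OVER(PARTITION BY month_key order by production_amount DESC) ntile7_result
,month_key
,date_key
,production_amount
from ods_sale_order_producttion_amount where event_day='20200731';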
The ordinary aggregate functions can also be used as window functions; with only a PARTITION BY clause the statistic is computed over the whole partition and attached to every row:
COUNT
COUNT(*) OVER([PARTITION BY col1,col2,…]): number of rows in the partition;
SUM
SUM(coln) OVER([PARTITION BY col1,col2,…]): sum of coln over the partition;
MIN
MIN(coln) OVER([PARTITION BY col1,col2,…]): minimum of coln over the partition;
MAX
MAX(coln) OVER([PARTITION BY col1,col2,…]): maximum of coln over the partition;
AVG
AVG(coln) OVER([PARTITION BY col1,col2,…]): average of coln over the partition;
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,count(3) over(partition by month_key) as `每月份数`
> ,sum(production_amount) OVER(partition by month_key) as `每月总和`
> ,min(production_amount) OVER(partition by month_key) as `每月最小收入`
> ,max(production_amount) OVER(partition by month_key) as `每月最大收入`
> ,avg(production_amount) OVER(partition by month_key) as `每月平均收入`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729105357_e57ed4d0-f210-431c-8889-5a138cbb0032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0139, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0139/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0139
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 10:54:10,089 Stage-1 map = 0%, reduce = 0%
2020-07-29 10:54:19,185 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.93 sec
2020-07-29 10:54:25,548 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.35 sec
MapReduce Total cumulative CPU time: 8 seconds 350 msec
Ended Job = job_1592876386879_0139
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.35 sec HDFS Read: 18912 HDFS Write: 1807 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 350 msec
OK
month_key date_key production_amount 每月份数 每月总和 每月最小收入 每月最大收入 每月平均收入
202005 20200501 199000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200510 198540.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200509 200000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200508 189462.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200507 159840.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200506 138500.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200505 196540.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200504 138500.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200503 199000.00 10 1804382.00 138500.00 200000.00 180438.200000
202005 20200502 185000.00 10 1804382.00 138500.00 200000.00 180438.200000
202006 20200610 135480.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200609 200100.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200608 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200607 198420.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200606 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200605 200140.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200604 158500.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200603 189000.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200602 185000.00 10 1772640.00 135480.00 200140.00 177264.000000
202006 20200601 189000.00 10 1772640.00 135480.00 200140.00 177264.000000
Time taken: 29.364 seconds, Fetched: 20 row(s)
LEAD
LEAD(col,n,DEFAULT)
returns the value of col on the nth row after the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value to use when the value n rows ahead is NULL, for example because that row does not exist (if not specified, NULL is returned);
LAG
LAG(col,n,DEFAULT)
returns the value of col on the nth row before the current row within the window. The first argument is the column name, the second is the offset n (optional, defaults to 1), and the third is the default value to use when the value n rows back is NULL (if not specified, NULL is returned);
FIRST_VALUE
FIRST_VALUE(coln) over (partition by col1,col2 order by col1,col2)
returns the first value of coln in the partition after ordering, up to the current row;
LAST_VALUE
LAST_VALUE(coln) over (partition by col1,col2 order by col1,col2)
returns the last value of coln up to the current row after ordering;
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,FIRST_VALUE(production_amount) OVER(partition by month_key order by date_key) as `月初数据`
> ,LAST_VALUE(production_amount) OVER(partition by month_key) as `总月末数据`
> ,LAST_VALUE(production_amount) OVER(partition by month_key order by date_key) as `截至当前月末数据`
> ,LAG(date_key) OVER(partition by month_key order by date_key) as `LAG默认参数`
> ,LAG(date_key,1) OVER(partition by month_key order by date_key) as `LAG取date_key值向上移动1位`
> ,LAG(date_key,2,'19000101') OVER(partition by month_key order by date_key) as `LAG取date_key值向上移动2位,取到null用19000101代替`
> ,LEAD(date_key) OVER(partition by month_key order by date_key) as `LEAD默认参数`
> ,LEAD(date_key,1) OVER(partition by month_key order by date_key) as `LEAD取date_key值向上移动1位`
> ,LEAD(date_key,2,'19000101') OVER(partition by month_key order by date_key) as `LEAD取date_key值向上移动1位,取到null用19000101代替`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729113052_9e57c0fe-5c46-4589-9406-f394787885bc
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0144, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0144/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0144
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 11:30:59,931 Stage-1 map = 0%, reduce = 0%
2020-07-29 11:31:06,206 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.27 sec
2020-07-29 11:31:12,498 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.62 sec
MapReduce Total cumulative CPU time: 7 seconds 620 msec
Ended Job = job_1592876386879_0144
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0145, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0145/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0145
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 11:31:24,159 Stage-2 map = 0%, reduce = 0%
2020-07-29 11:31:30,442 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.7 sec
2020-07-29 11:31:36,698 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.97 sec
MapReduce Total cumulative CPU time: 4 seconds 970 msec
Ended Job = job_1592876386879_0145
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.62 sec HDFS Read: 20982 HDFS Write: 2084 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.97 sec HDFS Read: 21609 HDFS Write: 2499 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 590 msec
OK
month_key date_key production_amount 月初数据 总月末数据 截至当前月末数据 lag默认参数 lag取date_key值向上移动1位 lag取date_key值向上移动2位,取到null用19000101代替 lead默认参数 lead取date_key值向上移动1位 lead取date_key值向上移动1位,取到null用19000101代替
202005 20200501 199000.00 199000.00 185000.00 199000.00 NULL NULL 19000101 20200502 20200502 20200503
202005 20200510 198540.00 199000.00 185000.00 198540.00 20200509 20200509 20200508 NULL NULL 19000101
202005 20200509 200000.00 199000.00 185000.00 200000.00 20200508 20200508 20200507 20200510 20200510 19000101
202005 20200508 189462.00 199000.00 185000.00 189462.00 20200507 20200507 20200506 20200509 20200509 20200510
202005 20200507 159840.00 199000.00 185000.00 159840.00 20200506 20200506 20200505 20200508 20200508 20200509
202005 20200506 138500.00 199000.00 185000.00 138500.00 20200505 20200505 20200504 20200507 20200507 20200508
202005 20200505 196540.00 199000.00 185000.00 196540.00 20200504 20200504 20200503 20200506 20200506 20200507
202005 20200504 138500.00 199000.00 185000.00 138500.00 20200503 20200503 20200502 20200505 20200505 20200506
202005 20200503 199000.00 199000.00 185000.00 199000.00 20200502 20200502 20200501 20200504 20200504 20200505
202005 20200502 185000.00 199000.00 185000.00 185000.00 20200501 20200501 19000101 20200503 20200503 20200504
202006 20200610 135480.00 189000.00 189000.00 135480.00 20200609 20200609 20200608 NULL NULL 19000101
202006 20200609 200100.00 189000.00 189000.00 200100.00 20200608 20200608 20200607 20200610 20200610 19000101
202006 20200608 158500.00 189000.00 189000.00 158500.00 20200607 20200607 20200606 20200609 20200609 20200610
202006 20200607 198420.00 189000.00 189000.00 198420.00 20200606 20200606 20200605 20200608 20200608 20200609
202006 20200606 158500.00 189000.00 189000.00 158500.00 20200605 20200605 20200604 20200607 20200607 20200608
202006 20200605 200140.00 189000.00 189000.00 200140.00 20200604 20200604 20200603 20200606 20200606 20200607
202006 20200604 158500.00 189000.00 189000.00 158500.00 20200603 20200603 20200602 20200605 20200605 20200606
202006 20200603 189000.00 189000.00 189000.00 189000.00 20200602 20200602 20200601 20200604 20200604 20200605
202006 20200602 185000.00 189000.00 189000.00 185000.00 20200601 20200601 19000101 20200603 20200603 20200604
202006 20200601 189000.00 189000.00 189000.00 189000.00 NULL NULL 19000101 20200602 20200602 20200603
Time taken: 45.744 seconds, Fetched: 20 row(s)
As the output shows, the second argument of LAG and LEAD defaults to 1 and the third defaults to NULL.
With OVER(partition by col1 order by col2), adding a finer-grained ORDER BY column restricts the computation to the rows up to the current one, which the extension below looks at in more detail.
Another typical use is accumulating, within each month, the running total (or running average, running maximum, and so on) of the amounts up to the current row. Taking the running total as the example, this is the MTD (month-to-date) figure we often talk about: order the rows by amount from smallest to largest and sum everything up to the current row, as the example below shows.
This is where RANGE and ROWS come in. RANGE and ROWS do behave slightly differently, even though the difference is not very obvious with this data set: RANGE treats rows that share the same partition and ORDER BY value as peers and gives them the same final result, while ROWS does not (compare the two tied 138500.00 rows in the output below). The Hive CLI code and its result are as follows.
hive> set hive.cli.print.header=true;
hive> select month_key,date_key,production_amount
> ,sum(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月总和`
> ,sum(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月总和`
>
> ,min(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月最小收入`
> ,max(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月最大收入`
> ,avg(production_amount) OVER(partition by month_key order by production_amount RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `RANGE每月平均收入`
> ,min(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月最小收入`
> ,max(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月最大收入`
> ,avg(production_amount) OVER(partition by month_key order by production_amount ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as `ROWS每月平均收入`
> from dw.ods_sale_order_producttion_amount
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729151835_4e818a96-4447-4fad-aae5-8e14371e2ba6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0152, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0152/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0152
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:18:43,001 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:18:53,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.91 sec
2020-07-29 15:19:01,158 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.56 sec
MapReduce Total cumulative CPU time: 8 seconds 560 msec
Ended Job = job_1592876386879_0152
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.56 sec HDFS Read: 22462 HDFS Write: 2646 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 560 msec
OK
month_key date_key production_amount range每月总和 rows每月总和 range每月最小收入 range每月最大收入 range每月平均收入 rows每月最小收入 rows每月最大收入 rows每月平均收入
202005 20200504 138500.00 277000.00 138500.00 138500.00 138500.00 138500.000000 138500.00 138500.00 138500.000000
202005 20200506 138500.00 277000.00 277000.00 138500.00 138500.00 138500.000000 138500.00 138500.00 138500.000000
202005 20200507 159840.00 436840.00 436840.00 138500.00 159840.00 145613.333333 138500.00 159840.00 145613.333333
202005 20200502 185000.00 621840.00 621840.00 138500.00 185000.00 155460.000000 138500.00 185000.00 155460.000000
202005 20200508 189462.00 811302.00 811302.00 138500.00 189462.00 162260.400000 138500.00 189462.00 162260.400000
202005 20200505 196540.00 1007842.00 1007842.00 138500.00 196540.00 167973.666667 138500.00 196540.00 167973.666667
202005 20200510 198540.00 1206382.00 1206382.00 138500.00 198540.00 172340.285714 138500.00 198540.00 172340.285714
202005 20200501 199000.00 1604382.00 1405382.00 138500.00 199000.00 178264.666667 138500.00 199000.00 175672.750000
202005 20200503 199000.00 1604382.00 1604382.00 138500.00 199000.00 178264.666667 138500.00 199000.00 178264.666667
202005 20200509 200000.00 1804382.00 1804382.00 138500.00 200000.00 180438.200000 138500.00 200000.00 180438.200000
202006 20200610 135480.00 135480.00 135480.00 135480.00 135480.00 135480.000000 135480.00 135480.00 135480.000000
202006 20200604 158500.00 610980.00 293980.00 135480.00 158500.00 152745.000000 135480.00 158500.00 146990.000000
202006 20200606 158500.00 610980.00 452480.00 135480.00 158500.00 152745.000000 135480.00 158500.00 150826.666667
202006 20200608 158500.00 610980.00 610980.00 135480.00 158500.00 152745.000000 135480.00 158500.00 152745.000000
202006 20200602 185000.00 795980.00 795980.00 135480.00 185000.00 159196.000000 135480.00 185000.00 159196.000000
202006 20200601 189000.00 1173980.00 984980.00 135480.00 189000.00 167711.428571 135480.00 189000.00 164163.333333
202006 20200603 189000.00 1173980.00 1173980.00 135480.00 189000.00 167711.428571 135480.00 189000.00 167711.428571
202006 20200607 198420.00 1372400.00 1372400.00 135480.00 198420.00 171550.000000 135480.00 198420.00 171550.000000
202006 20200609 200100.00 1572500.00 1572500.00 135480.00 200100.00 174722.222222 135480.00 200100.00 174722.222222
202006 20200605 200140.00 1772640.00 1772640.00 135480.00 200140.00 177264.000000 135480.00 200140.00 177264.000000
Time taken: 26.882 seconds, Fetched: 20 row(s)
The common frame parameters that follow ROWS and RANGE are listed in Table 2:
Frame parameter | Meaning |
---|---|
UNBOUNDED PRECEDING | The window starts at the first row of the partition. |
UNBOUNDED FOLLOWING | The window ends at the last row of the partition. |
CURRENT ROW | The window begins at the current row or ends at the current row. |
n PRECEDING or n FOLLOWING | The window starts or ends n rows before or after the current row. For example, ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING means that the window goes from the first row of the partition to the row that stands (in the ordered set) immediately before the current row. |
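All of the window frames used in this post run from UNBOUNDED PRECEDING to CURRENT ROW. As a further sketch (not run against the test cluster above), n PRECEDING and n FOLLOWING make fixed-size moving windows possible, for example a 3-day moving average and a centered 3-day sum per month:
select month_key,date_key,production_amount
,avg(production_amount) OVER(partition by month_key order by date_key ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as `三天移动平均`
,sum(production_amount) OVER(partition by month_key order by date_key ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as `前后三天总和`
from dw.ods_sale_order_producttion_amount;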
Beyond the window functions themselves, Hive also supports OLAP-style enhanced aggregation, rolling measures up and drilling them down along different dimensions, which is exactly what dimensional analysis needs. The commonly used constructs are the following.
GROUPING SETS and GROUPING__ID
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> grouping sets(month_key,date_key)
> order by GROUPING__ID
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729154417_180a7883-ed66-49e3-a52e-bdebfbf61507
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0156, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0156/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0156
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:44:24,616 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:44:32,981 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.61 sec
2020-07-29 15:44:38,197 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.62 sec
MapReduce Total cumulative CPU time: 7 seconds 620 msec
Ended Job = job_1592876386879_0156
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0157, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0157/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0157
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:44:49,818 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:44:56,078 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.34 sec
2020-07-29 15:45:02,288 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.51 sec
MapReduce Total cumulative CPU time: 4 seconds 510 msec
Ended Job = job_1592876386879_0157
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.62 sec HDFS Read: 11977 HDFS Write: 796 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.51 sec HDFS Read: 7133 HDFS Write: 877 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 130 msec
OK
month_key date_key 生产总值 grouping__id
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL 20200601 189000.00 2
NULL 20200610 135480.00 2
NULL 20200609 200100.00 2
NULL 20200608 158500.00 2
NULL 20200607 198420.00 2
NULL 20200606 158500.00 2
NULL 20200605 200140.00 2
NULL 20200604 158500.00 2
NULL 20200603 189000.00 2
NULL 20200602 185000.00 2
NULL 20200510 198540.00 2
NULL 20200509 200000.00 2
NULL 20200508 189462.00 2
NULL 20200507 159840.00 2
NULL 20200506 138500.00 2
NULL 20200505 196540.00 2
NULL 20200504 138500.00 2
NULL 20200503 199000.00 2
NULL 20200502 185000.00 2
NULL 20200501 199000.00 2
Time taken: 45.797 seconds, Fetched: 22 row(s)
Result analysis: the query reports the measure at different dimension levels. The rows with GROUPING__ID 1 are grouped by month_key and the rows with GROUPING__ID 2 are grouped by date_key, each with its own production total. Note that GROUPING__ID, the column that identifies the grouping, is spelled with two underscores; its values follow the order declared in grouping sets(month_key,date_key), so 1 stands for the month_key grouping and 2 for the date_key grouping.
CUBE
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> WITH CUBE
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729155154_4b3bf262-89b1-4834-a6b9-a8ed0a9eafda
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0159, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0159/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0159
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:52:14,628 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:52:20,895 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.17 sec
2020-07-29 15:52:27,147 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.0 sec
MapReduce Total cumulative CPU time: 7 seconds 0 msec
Ended Job = job_1592876386879_0159
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0161, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0161/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0161
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:53:01,847 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:53:08,105 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.92 sec
2020-07-29 15:53:14,350 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.13 sec
MapReduce Total cumulative CPU time: 5 seconds 130 msec
Ended Job = job_1592876386879_0161
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.0 sec HDFS Read: 12026 HDFS Write: 1599 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 5.13 sec HDFS Read: 7936 HDFS Write: 1708 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 130 msec
OK
month_key date_key 生产总值 grouping__id
202005 20200501 199000.00 0
202006 20200610 135480.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202006 20200609 200100.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL 20200610 135480.00 2
NULL 20200609 200100.00 2
NULL 20200608 158500.00 2
NULL 20200607 198420.00 2
NULL 20200606 158500.00 2
NULL 20200605 200140.00 2
NULL 20200604 158500.00 2
NULL 20200603 189000.00 2
NULL 20200602 185000.00 2
NULL 20200601 189000.00 2
NULL 20200510 198540.00 2
NULL 20200509 200000.00 2
NULL 20200508 189462.00 2
NULL 20200507 159840.00 2
NULL 20200506 138500.00 2
NULL 20200505 196540.00 2
NULL 20200504 138500.00 2
NULL 20200503 199000.00 2
NULL 20200502 185000.00 2
NULL 20200501 199000.00 2
NULL NULL 3577022.00 3
Time taken: 80.686 seconds, Fetched: 43 row(s)
Result analysis: CUBE aggregates over every combination of the GROUP BY dimensions and reports the measure for each of those combinations, including the grand total.
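For the two dimensions used here, WITH CUBE is shorthand for listing all four grouping sets explicitly, including the empty set for the grand total; a sketch of the equivalent query:
select month_key,date_key
,sum(production_amount) `生产总值`
,GROUPING__ID
from dw.ods_sale_order_producttion_amount
group by month_key,date_key
grouping sets((month_key,date_key),(month_key),(date_key),())
order by GROUPING__ID;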
ROLLUP
-- month_key as the leftmost dimension
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by month_key,date_key
> WITH ROLLUP
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729155836_b7c2669b-9763-469f-8337-03cc5b71f68e
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0163, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0163/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0163
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 15:58:56,098 Stage-1 map = 0%, reduce = 0%
2020-07-29 15:59:03,410 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.12 sec
2020-07-29 15:59:08,635 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.25 sec
MapReduce Total cumulative CPU time: 7 seconds 250 msec
Ended Job = job_1592876386879_0163
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0165, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0165/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0165
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 15:59:42,370 Stage-2 map = 0%, reduce = 0%
2020-07-29 15:59:48,660 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.13 sec
2020-07-29 15:59:53,878 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.1 sec
MapReduce Total cumulative CPU time: 4 seconds 100 msec
Ended Job = job_1592876386879_0165
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.25 sec HDFS Read: 12022 HDFS Write: 959 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.1 sec HDFS Read: 7296 HDFS Write: 988 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 350 msec
OK
month_key date_key 生产总值 grouping__id
202006 20200610 135480.00 0
202006 20200609 200100.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 20200501 199000.00 0
202005 NULL 1804382.00 1
202006 NULL 1772640.00 1
NULL NULL 3577022.00 3
Time taken: 78.751 seconds, Fetched: 23 row(s)
-- date_key as the leftmost dimension
hive> set hive.cli.print.header=true;
hive> select month_key,date_key
> ,sum(production_amount) `生产总值`
> ,GROUPING__ID
> from dw.ods_sale_order_producttion_amount
> group by date_key,month_key
> WITH ROLLUP
> ORDER BY GROUPING__ID;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20200729160309_ad0a688d-edd1-4017-ab14-3407c90d2054
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0166, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0166/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0166
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-07-29 16:03:19,204 Stage-1 map = 0%, reduce = 0%
2020-07-29 16:03:25,534 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.87 sec
2020-07-29 16:03:30,752 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.77 sec
MapReduce Total cumulative CPU time: 6 seconds 770 msec
Ended Job = job_1592876386879_0166
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0167, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0167/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0167
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-07-29 16:03:41,661 Stage-2 map = 0%, reduce = 0%
2020-07-29 16:03:47,991 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 3.47 sec
2020-07-29 16:03:53,245 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.59 sec
MapReduce Total cumulative CPU time: 5 seconds 590 msec
Ended Job = job_1592876386879_0167
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 6.77 sec HDFS Read: 11941 HDFS Write: 1539 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 5.59 sec HDFS Read: 7876 HDFS Write: 1638 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 360 msec
OK
month_key date_key 生产总值 grouping__id
202006 20200610 135480.00 0
202006 20200609 200100.00 0
202006 20200608 158500.00 0
202006 20200607 198420.00 0
202006 20200606 158500.00 0
202006 20200605 200140.00 0
202006 20200604 158500.00 0
202006 20200603 189000.00 0
202006 20200602 185000.00 0
202006 20200601 189000.00 0
202005 20200510 198540.00 0
202005 20200509 200000.00 0
202005 20200508 189462.00 0
202005 20200507 159840.00 0
202005 20200506 138500.00 0
202005 20200505 196540.00 0
202005 20200504 138500.00 0
202005 20200503 199000.00 0
202005 20200502 185000.00 0
202005 20200501 199000.00 0
NULL 20200505 196540.00 1
NULL 20200510 198540.00 1
NULL 20200610 135480.00 1
NULL 20200502 185000.00 1
NULL 20200609 200100.00 1
NULL 20200509 200000.00 1
NULL 20200608 158500.00 1
NULL 20200504 138500.00 1
NULL 20200607 198420.00 1
NULL 20200508 189462.00 1
NULL 20200606 158500.00 1
NULL 20200605 200140.00 1
NULL 20200507 159840.00 1
NULL 20200604 158500.00 1
NULL 20200503 199000.00 1
NULL 20200603 189000.00 1
NULL 20200506 138500.00 1
NULL 20200602 185000.00 1
NULL 20200501 199000.00 1
NULL 20200601 189000.00 1
NULL NULL 3577022.00 3
Time taken: 44.948 seconds, Fetched: 41 row(s)
Result analysis: ROLLUP treats the leftmost dimension as the primary one and aggregates hierarchically from it, so apart from the detail rows it only returns the aggregation levels led by the first dimension after GROUP BY (plus the grand total), not every combination of the dimensions.
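Like CUBE, WITH ROLLUP can be read as shorthand for a GROUPING SETS list, only hierarchical instead of exhaustive; a sketch of the equivalent form of the first ROLLUP query above (month_key leftmost):
select month_key,date_key
,sum(production_amount) `生产总值`
,GROUPING__ID
from dw.ods_sale_order_producttion_amount
group by month_key,date_key
grouping sets((month_key,date_key),(month_key),())
order by GROUPING__ID;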
Window functions also support more complex statistical analysis, such as correlation and linear-regression functions and standard-deviation and variance functions. Since they are not used very often, I will not list them one by one here; when you need one, follow the link, look up the function's form and rewrite it as HiveQL. Link: 复杂窗口函数.
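As one small example from that family, and only as a sketch on my part (stddev_pop and var_pop are Hive built-in aggregate functions, and I am assuming here that they accept an OVER clause the same way SUM and AVG do above; verify against your Hive version before relying on it):
select month_key,date_key,production_amount
,stddev_pop(production_amount) OVER(partition by month_key) as `每月标准差`
,var_pop(production_amount) OVER(partition by month_key) as `每月方差`
from dw.ods_sale_order_producttion_amount;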