bluexue0724

Hive 开窗函数汇总

开窗函数

普通的聚合函数聚合的行集是组,开窗函数聚合的行集是窗口。因此,普通的聚合函数每组(Group by)只返回一个值，而开窗函数则可为窗口中的每行都返回一个值。简单理解，就是对查询的结果多出一列，这一列可以是聚合值，也可以是排序值。
开窗函数一般分为两类,聚合开窗函数和排序开窗函数。

-- hive窗口和分析函数
SET mapreduce.job.queuename=qiuyanx
-- 1、不聚合(即没有真正聚合, 表原先有多少行, 结果还是多少行)的情况下, 运行各种聚合函数, 且可以随意指定分组方式(相当于在同一个sql中有多个group by),
SELECT 
 c1, 
 c2,
 c3,
 COUNT(c2) OVER (PARTITION BY c1) AS `count`, -- 相当于group by c1, 然后count
 avg(c2) OVER (PARTITION BY c1, c3) AS `avg`, -- 相当于group by c1, c3, 然后avg
 avg(c2) OVER (PARTITION BY c1, c3 ORDER BY c2 ASC) AS `avg2`, -- 相当于group by c1, c3, 然后avg至当前行  
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ) AS `sum`,  -- 相当于group by c1, c3, 然后依次sum至当前行
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS `sum1`, -- 相当于group by c1, c3, 然后从"无边界"处sum至当前行， 等价于上面一行的sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS `sum2`, -- 相当于group by c1,c3, 然后组内给个行号, 只取自己上边一行和自己这一行的数据进行sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING ) AS `sum3`, -- 同上, 也是先分组，然后组内序号, 从当前行上边1行,加上自己这一行，再加上自己下边的2行, 进行sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS `sum4`, -- 同上, 只是指定这个累加的窗口： 以当前行为基准: 上n行, 下n行, 上无边界, 下无边界
 LEAD(c2) OVER (PARTITION BY c1 ORDER BY c2) `test1`, -- 行错位(往前错位): 将c2列错位使用， 只错1位
 LEAD(c2, 2) OVER (PARTITION BY c1 ORDER BY c2) `test2`, -- 行错位(往前错位): 将c2列错位使用， 只错2位
 LEAD(c2, 3, 999) OVER (PARTITION BY c1 ORDER BY c2) `test2`, -- 行错位(往前错位): 将c2列错位使用， 只错3位, 错位3个, 那么会有两个行找不到错位行, 用99补充
 LAG(c3, 3, 'X') OVER (PARTITION BY c1 ORDER BY c2) `test3`  -- 行错位(往后错位): 将c2列错位使用， 只错3位, 错位3个, 那么会有两个行找不到错位行, 用X补充
FROM ( 
 SELECT 'a' AS c1, 1 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'b' AS c1, 2 AS c2, 'B' AS c3 UNION ALL 
 SELECT 'c' AS c1, 3 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'd' AS c1, 4 AS c2, 'C' AS c3 UNION ALL 
 SELECT 'a' AS c1, 5 AS c2, 'E' AS c3 UNION ALL 
 SELECT 'b' AS c1, 3 AS c2, 'F' AS c3 UNION ALL 
 SELECT 'a' AS c1, 7 AS c2, 'F' AS c3 UNION ALL 
 SELECT 'c' AS c1, 6 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'a' AS c1, 8 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'f' AS c1, 9 AS c2, 'D' AS c3 UNION ALL 
 SELECT 'a' AS c1, 2 AS c2, 'D' AS c3 UNION ALL 
 SELECT 'a' AS c1, 4 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'b' AS c1, 6 AS c2, 'C' AS c3 ) t
-- 综上： 可以不用group by 而做的聚合， 而且聚合时还可以指定顺序，或者指定要计算的窗口大小！ 以后hive中可以再也不用自关联了~~~~~~


-- 2、使用window字句, 简洁select 的字段! 其实下边这个值等价于上面的
SELECT 
 c1, 
 c2,
 c3,
 sum(c2) over w
FROM ( 
 SELECT 'a' AS c1, 1 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'b' AS c1, 2 AS c2, 'B' AS c3 UNION ALL 
 SELECT 'c' AS c1, 3 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'd' AS c1, 4 AS c2, 'C' AS c3 UNION ALL 
 SELECT 'a' AS c1, 5 AS c2, 'E' AS c3 UNION ALL 
 SELECT 'b' AS c1, 3 AS c2, 'F' AS c3 UNION ALL 
 SELECT 'a' AS c1, 7 AS c2, 'F' AS c3 UNION ALL 
 SELECT 'c' AS c1, 6 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'a' AS c1, 8 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'f' AS c1, 9 AS c2, 'D' AS c3 UNION ALL 
 SELECT 'a' AS c1, 2 AS c2, 'D' AS c3 UNION ALL 
 SELECT 'a' AS c1, 4 AS c2, 'A' AS c3 UNION ALL 
 SELECT 'b' AS c1, 6 AS c2, 'C' AS c3 ) t WINDOW w AS (PARTITION BY c1,c3 ORDER BY c2 ROWS UNBOUNDED PRECEDING)

聚合开窗函数

数据准备

-- 建表
create table student_scores(
id int,
studentId int,
language int,
math int,
english int,
classId string,
departmentId string
);
-- 写入数据
insert into table student_scores values 
  (1,111,68,69,90,'class1','department1'),
  (2,112,73,80,96,'class1','department1'),
  (3,113,90,74,75,'class1','department1'),
  (4,114,89,94,93,'class1','department1'),
  (5,115,99,93,89,'class1','department1'),
  (6,121,96,74,79,'class2','department1'),
  (7,122,89,86,85,'class2','department1'),
  (8,123,70,78,61,'class2','department1'),
  (9,124,76,70,76,'class2','department1'),
  (10,211,89,93,60,'class1','department2'),
  (11,212,76,83,75,'class1','department2'),
  (12,213,71,94,90,'class1','department2'),
  (13,214,94,94,66,'class1','department2'),
  (14,215,84,82,73,'class1','department2'),
  (15,216,85,74,93,'class1','department2'),
  (16,221,77,99,61,'class2','department2'),
  (17,222,80,78,96,'class2','department2'),
  (18,223,79,74,96,'class2','department2'),
  (19,224,75,80,78,'class2','department2'),
  (20,225,82,85,63,'class2','department2');

count开窗函数


select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
count(math) over() as count1,
 -- 以按classId分组的所有行作为窗口
count(math) over(partition by classId) as count2,
 -- 以按classId分组、按math排序的所有行作为窗口
count(math) over(partition by classId order by math) as count3,
 -- 以按classId分组、按math排序、按 当前行+往前1行+往后2行的行作为窗口
count(math) over(partition by classId order by math rows between 1 preceding and 2 following) as count4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid count1  count2  count3  count4
111         69      department1     class1  9       5       1       3
113         74      department1     class1  9       5       2       4
112         80      department1     class1  9       5       3       4
115         93      department1     class1  9       5       4       3
114         94      department1     class1  9       5       5       2
124         70      department1     class2  9       4       1       3
121         74      department1     class2  9       4       2       4
123         78      department1     class2  9       4       3       3
122         86      department1     class2  9       4       4       2

结果解释:
studentid=115,count1为所有的行数9,count2为分区class1中的行数5,count3为分区class1中math值<=93的行数4,
count4为分区class1中math值向前+1行向后+2行(实际只有1行)的总行数3。

sum开窗函数

-- sum开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
sum(math) over() as sum1,
-- 以按classId分组的所有行作为窗口
sum(math) over(partition by classId) as sum2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
sum(math) over(partition by classId order by math) as sum3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
sum(math) over(partition by classId order by math rows between 1 preceding and 2 following) as sum4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid sum1    sum2    sum3    sum4
111         69      department1     class1  718     410     69      223
113         74      department1     class1  718     410     143     316
112         80      department1     class1  718     410     223     341
115         93      department1     class1  718     410     316     267
114         94      department1     class1  718     410     410     187
124         70      department1     class2  718     308     70      222
121         74      department1     class2  718     308     144     308
123         78      department1     class2  718     308     222     238
122         86      department1     class2  718     308     308     164

结果解释:
    同count开窗函数

 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS `sum1`, -- 相当于group by c1, c3, 然后从"无边界"处sum至当前行， 等价于上面一行的sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) AS `sum2`, -- 相当于group by c1,c3, 然后组内给个行号, 只取自己上边一行和自己这一行的数据进行sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING ) AS `sum3`, -- 同上, 也是先分组，然后组内序号, 从当前行上边1行,加上自己这一行，再加上自己下边的2行, 进行sum
 sum(c2) OVER ( PARTITION BY c1, c3 ORDER BY c2 ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS `sum4`, -- 同上, 只是指定这个累加的窗口： 以当前行为基准: 上n行, 下n行, 上无边界, 下无边界

min开窗函数

-- min 开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
min(math) over() as min1,
-- 以按classId分组的所有行作为窗口
min(math) over(partition by classId) as min2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
min(math) over(partition by classId order by math) as min3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
min(math) over(partition by classId order by math rows between 1 preceding and 2 following) as min4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid min1    min2    min3    min4
111         69      department1     class1  69      69      69      69
113         74      department1     class1  69      69      69      69
112         80      department1     class1  69      69      69      74
115         93      department1     class1  69      69      69      80
114         94      department1     class1  69      69      69      93
124         70      department1     class2  69      70      70      70
121         74      department1     class2  69      70      70      70
123         78      department1     class2  69      70      70      74
122         86      department1     class2  69      70      70      78

结果解释:
    同count开窗函数

max开窗函数

-- max 开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
max(math) over() as max1,
-- 以按classId分组的所有行作为窗口
max(math) over(partition by classId) as max2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
max(math) over(partition by classId order by math) as max3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
max(math) over(partition by classId order by math rows between 1 preceding and 2 following) as max4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid max1    max2    max3    max4
111         69      department1     class1  94      94      69      80
113         74      department1     class1  94      94      74      93
112         80      department1     class1  94      94      80      94
115         93      department1     class1  94      94      93      94
114         94      department1     class1  94      94      94      94
124         70      department1     class2  94      86      70      78
121         74      department1     class2  94      86      74      86
123         78      department1     class2  94      86      78      86
122         86      department1     class2  94      86      86      86

结果解释:
    同count开窗函数

avg开窗函数

-- avg 开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
avg(math) over() as avg1,
-- 以按classId分组的所有行作为窗口
avg(math) over(partition by classId) as avg2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
avg(math) over(partition by classId order by math) as avg3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
avg(math) over(partition by classId order by math rows between 1 preceding and 2 following) as avg4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid avg1                avg2    avg3                avg4
111         69      department1     class1  79.77777777777777   82.0    69.0                74.33333333333333
113         74      department1     class1  79.77777777777777   82.0    71.5                79.0
112         80      department1     class1  79.77777777777777   82.0    74.33333333333333   85.25
115         93      department1     class1  79.77777777777777   82.0    79.0                89.0
114         94      department1     class1  79.77777777777777   82.0    82.0                93.5
124         70      department1     class2  79.77777777777777   77.0    70.0                74.0
121         74      department1     class2  79.77777777777777   77.0    72.0                77.0
123         78      department1     class2  79.77777777777777   77.0    74.0                79.33333333333333
122         86      department1     class2  79.77777777777777   77.0    77.0                82.0

结果解释:
    同count开窗函数

first_value开窗函数

返回分区中的第一个值。

-- first_value 开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
first_value(math) over() as first_value1,
-- 以按classId分组的所有行作为窗口
first_value(math) over(partition by classId) as first_value2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
first_value(math) over(partition by classId order by math) as first_value3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
first_value(math) over(partition by classId order by math rows between 1 preceding and 2 following) as first_value4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid first_value1    first_value2    first_value3    first_value4
111         69      department1     class1  69              69              69              69
113         74      department1     class1  69              69              69              69
112         80      department1     class1  69              69              69              74
115         93      department1     class1  69              69              69              80
114         94      department1     class1  69              69              69              93
124         70      department1     class2  69              74              70              70
121         74      department1     class2  69              74              70              70
123         78      department1     class2  69              74              70              74
122         86      department1     class2  69              74              70              78

结果解释:
    studentid=124 first_value1:第一个值是69,first_value2:classId=class1分区 math的第一个值是69。

last_value开窗函数

返回分区中的最后一个值。

-- last_value 开窗函数

select studentId,math,departmentId,classId,
-- 以符合条件的所有行作为窗口
last_value(math) over() as last_value1,
-- 以按classId分组的所有行作为窗口
last_value(math) over(partition by classId) as last_value2,
 -- 以按classId分组、按math排序后、按到当前行(含当前行)的所有行作为窗口
last_value(math) over(partition by classId order by math) as last_value3,
 -- 以按classId分组、按math排序后、按当前行+往前1行+往后2行的行作为窗口
last_value(math) over(partition by classId order by math rows between 1 preceding and 2 following) as last_value4
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid last_value1 last_value2 last_value3 last_value4
111         69      department1     class1  70          93          69          80
113         74      department1     class1  70          93          74          93
112         80      department1     class1  70          93          80          94
115         93      department1     class1  70          93          93          94
114         94      department1     class1  70          93          94          94
124         70      department1     class2  70          70          70          78
121         74      department1     class2  70          70          74          86
123         78      department1     class2  70          70          78          86
122         86      department1     class2  70          70          86          86

lag开窗函数

lag(col,n,default) 用于统计窗口内往上第n个值。
col:列名
n:往上第n行
default:往上第n行为NULL时候，取默认值,不指定则取NULL

-- lag 开窗函数

select studentId,math,departmentId,classId,
 --窗口内 往上取第二个 取不到时赋默认值60
lag(math,2,60) over(partition by classId order by math) as lag1,
 --窗口内 往上取第二个 取不到时赋默认值NULL
lag(math,2) over(partition by classId order by math) as lag2
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid lag1    lag2
111         69      department1     class1  60      NULL
113         74      department1     class1  60      NULL
112         80      department1     class1  69      69
115         93      department1     class1  74      74
114         94      department1     class1  80      80
124         70      department1     class2  60      NULL
121         74      department1     class2  60      NULL
123         78      department1     class2  70      70
122         86      department1     class2  74      74

结果解释:
    第3行 lag1:窗口内(69 74 80) 当前行80 向上取第二个值为69
    倒数第3行 lag2:窗口内(70 74) 当前行74 向上取第二个值为NULL

lead开窗函数

lead(col,n,default) 用于统计窗口内往下第n个值。
col:列名
n:往下第n行
default:往下第n行为NULL时候，取默认值,不指定则取NULL

-- lead开窗函数

select studentId,math,departmentId,classId,
 --窗口内 往下取第二个 取不到时赋默认值60
lead(math,2,60) over(partition by classId order by math) as lead1,
 --窗口内 往下取第二个 取不到时赋默认值NULL
lead(math,2) over(partition by classId order by math) as lead2
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid lead1   lead2
111         69      department1     class1  80      80
113         74      department1     class1  93      93
112         80      department1     class1  94      94
115         93      department1     class1  60      NULL
114         94      department1     class1  60      NULL
124         70      department1     class2  78      78
121         74      department1     class2  86      86
123         78      department1     class2  60      NULL
122         86      department1     class2  60      NULL

结果解释:
    第4行lead1 窗口内向下第二个值为空，赋值60

cume_dist 开窗函数

计算某个窗口或分区中某个值的累积分布。假定升序排序，则使用以下公式确定累积分布：
小于等于当前值x的行数 / 窗口或partition分区内的总行数。其中，x 等于 order by 子句中指定的列的当前行中的值。

-- cume_dist 开窗函数

select studentId,math,departmentId,classId,
-- 统计小于等于当前分数的人数占总人数的比例
cume_dist() over(order by math) as cume_dist1,
-- 统计大于等于当前分数的人数占总人数的比例
cume_dist() over(order by math desc) as cume_dist2,
-- 统计分区内小于等于当前分数的人数占总人数的比例
cume_dist() over(partition by classId order by math) as cume_dist3
from student_scores where departmentId='department1';

结果
studentid   math    departmentid    classid cume_dist1              cume_dist2          cume_dist3
111         69      department1     class1  0.1111111111111111      1.0                 0.2
113         74      department1     class1  0.4444444444444444      0.7777777777777778  0.4
112         80      department1     class1  0.6666666666666666      0.4444444444444444  0.6
115         93      department1     class1  0.8888888888888888      0.2222222222222222  0.8
114         94      department1     class1  1.0                     0.1111111111111111  1.0
124         70      department1     class2  0.2222222222222222      0.8888888888888888  0.25
121         74      department1     class2  0.4444444444444444      0.7777777777777778  0.5
123         78      department1     class2  0.5555555555555556      0.5555555555555556  0.75
122         86      department1     class2  0.7777777777777778      0.3333333333333333  1.0

结果解释:
    第三行:
        cume_dist1=小于等于80的人数为6/总人数9=0.6666666666666666
        cume_dist2=大于等于80的人数为4/总人数9=0.4444444444444444
        cume_dist3=分区内小于等于80的人数为3/分区内总人数5=0.6

排序开窗函数

rank开窗函数

rank 开窗函数基于 over 子句中的 order by 确定一组值中一个值的排名。如果存在partition by ,则为每个分区组中的每个值排名。排名可能不是连续的。例如，如果两个行的排名为 1，则下一个排名为 3。

-- rank 开窗函数

select *,
-- 对全部学生按数学分数排序 
rank() over(order by math) as rank1,
-- 对院系 按数学分数排序
rank() over(partition by departmentId order by math) as rank2,
-- 对每个院系每个班级 按数学分数排序
rank() over(partition by departmentId,classId order by math) as rank3
from student_scores;

结果

id  studentid   language    math    english     classid departmentid    rank1   rank2   rank3
1   111         68          69      90          class1  department1     1       1       1
3   113         90          74      75          class1  department1     3       3       2
2   112         73          80      96          class1  department1     9       6       3
5   115         99          93      89          class1  department1     15      8       4
4   114         89          94      93          class1  department1     17      9       5
9   124         76          70      76          class2  department1     2       2       1
6   121         96          74      79          class2  department1     3       3       2
8   123         70          78      61          class2  department1     7       5       3
7   122         89          86      85          class2  department1     14      7       4
15  216         85          74      93          class1  department2     3       1       1
14  215         84          82      73          class1  department2     11      5       2
11  212         76          83      75          class1  department2     12      6       3
10  211         89          93      60          class1  department2     15      8       4
12  213         71          94      90          class1  department2     17      9       5
13  214         94          94      66          class1  department2     17      9       5
18  223         79          74      96          class2  department2     3       1       1
17  222         80          78      96          class2  department2     7       3       2
19  224         75          80      78          class2  department2     9       4       3
20  225         82          85      63          class2  department2     13      7       4
16  221         77          99      61          class2  department2     20      11      5

dense_rank开窗函数

dense_rank与rank有一点不同,当排名一样的时候,接下来的行是连续的。如两个行的排名为 1，则下一个排名为 2。

-- dense_rank 开窗函数

select *,
-- 对全部学生按数学分数排序
dense_rank() over(order by math) as dense_rank1,
-- 对院系 按数学分数排序
dense_rank() over(partition by departmentId order by math) as dense_rank2,
-- 对每个院系每个班级 按数学分数排序
dense_rank() over(partition by departmentId,classId order by math) as dense_rank3
from student_scores;

结果:
id  studentid   language    math    english classid departmentid    dense_rank1 dense_rank2 dense_rank3
1   111         68          69      90      class1  department1     1           1           1
3   113         90          74      75      class1  department1     3           3           2
2   112         73          80      96      class1  department1     5           5           3
5   115         99          93      89      class1  department1     10          7           4
4   114         89          94      93      class1  department1     11          8           5
9   124         76          70      76      class2  department1     2           2           1
6   121         96          74      79      class2  department1     3           3           2
8   123         70          78      61      class2  department1     4           4           3
7   122         89          86      85      class2  department1     9           6           4
15  216         85          74      93      class1  department2     3           1           1
14  215         84          82      73      class1  department2     6           4           2
11  212         76          83      75      class1  department2     7           5           3
10  211         89          93      60      class1  department2     10          7           4
12  213         71          94      90      class1  department2     11          8           5
13  214         94          94      66      class1  department2     11          8           5
18  223         79          74      96      class2  department2     3           1           1
17  222         80          78      96      class2  department2     4           2           2
19  224         75          80      78      class2  department2     5           3           3
20  225         82          85      63      class2  department2     8           6           4
16  221         77          99      61      class2  department2     12          9           5

ntile开窗函数

将分区中已排序的行划分为大小尽可能相等的指定数量的排名的组，并返回给定行所在的组的排名。

-- ntile 开窗函数

select *,
-- 对分区内的数据分成两组
ntile(2) over(partition by departmentid order by math) as ntile1,
-- 对分区内的数据分成三组
ntile(3) over(partition by departmentid order by math) as ntile2
from student_scores;

结果
id  studentid   language    math    english classid departmentid    ntile1  ntile2
1   111         68          69      90      class1  department1     1       1
9   124         76          70      76      class2  department1     1       1
6   121         96          74      79      class2  department1     1       1
3   113         90          74      75      class1  department1     1       2
8   123         70          78      61      class2  department1     1       2
2   112         73          80      96      class1  department1     2       2
7   122         89          86      85      class2  department1     2       3
5   115         99          93      89      class1  department1     2       3
4   114         89          94      93      class1  department1     2       3
18  223         79          74      96      class2  department2     1       1
15  216         85          74      93      class1  department2     1       1
17  222         80          78      96      class2  department2     1       1
19  224         75          80      78      class2  department2     1       1
14  215         84          82      73      class1  department2     1       2
11  212         76          83      75      class1  department2     1       2
20  225         82          85      63      class2  department2     2       2
10  211         89          93      60      class1  department2     2       2
12  213         71          94      90      class1  department2     2       3
13  214         94          94      66      class1  department2     2       3
16  221         77          99      61      class2  department2     2       3

结果解释:
    第8行
        ntile1:对分区的数据均匀分成2组后，当前行的组排名为2
        ntile2:对分区的数据均匀分成3组后，当前行的组排名为3

row_number开窗函数

从1开始对分区内的数据排序。

-- row_number 开窗函数

select studentid,departmentid,classid,math,
-- 对分区departmentid,classid内的数据按math排序
row_number() over(partition by departmentid,classid order by math) as row_number
from student_scores;

结果
studentid   departmentid    classid math    row_number
111         department1     class1  69      1
113         department1     class1  74      2
112         department1     class1  80      3
115         department1     class1  93      4
114         department1     class1  94      5
124         department1     class2  70      1
121         department1     class2  74      2
123         department1     class2  78      3
122         department1     class2  86      4
216         department2     class1  74      1
215         department2     class1  82      2
212         department2     class1  83      3
211         department2     class1  93      4
213         department2     class1  94      5
214         department2     class1  94      6
223         department2     class2  74      1
222         department2     class2  78      2
224         department2     class2  80      3
225         department2     class2  85      4
221         department2     class2  99      5

结果解释:
    同一分区,相同值，不同序。如studentid=213 studentid=214 值都为94 排序为5,6。

percent_rank开窗函数

计算给定行的百分比排名。可以用来计算超过了百分之多少的人。如360小助手开机速度超过了百分之多少的人。
(当前行的rank值-1)/(分组内的总行数-1)

-- percent_rank 开窗函数

select studentid,departmentid,classid,math,
row_number() over(partition by departmentid,classid order by math) as row_number,
percent_rank() over(partition by departmentid,classid order by math) as percent_rank
from student_scores;

结果
studentid   departmentid    classid math    row_number  percent_rank
111         department1     class1  69      1           0.0
113         department1     class1  74      2           0.25
112         department1     class1  80      3           0.5
115         department1     class1  93      4           0.75
114         department1     class1  94      5           1.0
124         department1     class2  70      1           0.0
121         department1     class2  74      2           0.3333333333333333
123         department1     class2  78      3           0.6666666666666666
122         department1     class2  86      4           1.0
216         department2     class1  74      1           0.0
215         department2     class1  82      2           0.2
212         department2     class1  83      3           0.4
211         department2     class1  93      4           0.6
213         department2     class1  94      5           0.8
214         department2     class1  94      6           0.8
223         department2     class2  74      1           0.0
222         department2     class2  78      2           0.25
224         department2     class2  80      3           0.5
225         department2     class2  85      4           0.75
221         department2     class2  99      5           1.0

结果解释:
    studentid=115,percent_rank=(4-1)/(5-1)=0.75
    studentid=123,percent_rank=(3-1)/(4-1)=0.6666666666666666

你可能感兴趣的:(hive,开窗函数)

Linux(Centos 7.6)命令详解：unzip 豆是浪个 linux centos 运维
1.命令作用unzip用于在一个ZIP存档压缩文件中进行陈列/检测/提取文件(list,testandextractcompressedfilesinaZIParchive)；unzip命令是Linux系统中用于解压缩ZIP格式压缩文件的常用工具。它能够快速、方便地将ZIP文件解压到当前目录或指定目录。2.命令语法Usage:unzip[-Z][-opts[modifiers]]file[.zip
Linux(Centos 7.6)命令详解：zip 豆是浪个 linux 运维服务器
1.命令作用打包和压缩(存档)文件(packageandcompress(archive)files)；该程序用于打包一组文件进行分发；存档文件；通过临时压缩未使用的文件或目录来节省磁盘空间；且压缩文件可以在Linux、Windows和macOS中轻松提取。2.命令语法usage:zip[-options][-bpath][-tmmddyyyy][-nsuffixes][zipfilelist][
大数据实战：Spark + Hive 逐笔计算用户盈亏 WuJiWeb3 区块链链上数据分析从0到1搭建区块链大数据平台 spark hive 大数据 web3 区块链 hadoop
简介本文将通过使用Spark+Hive实现逐笔计算区块链上用户交易数据的盈亏需求。由于我们是进行离线计算，所以我们的数据源是Hive表数据，Sink表也是Hive表，即Spark读取Hive表数据进行批计算之后写回到Hive表并供后续使用。通过本文你将会学到：如何使用SparkSQLAPI读取Hive数据源如何通过读取配置文件进行传参执行SQL如何将SparkSQL转换为JavaRDD进行处理如何
Ubuntu系统下交叉编译szip linux运维
一、交叉编译szip1.下载源码下载Szip：https://docs.hdfgroup.org/archive/support/doc_resource/SZIP/i...下载并解压源码。tar-xvzfszip-2.1.1.tar.gzcdtar-xvzfszip-2.1.1mkdirszipbuild2.设置环境变量设置交叉编译工具链的环境变量：exportPATH=/home/yoyo/3
Hbase在hdfs上的archive目录占用空间过大宝罗Paul 大数据 hbase
hbase版本：1.1.2hadoop版本：2.7.3Hbase在hdfs上的目录/apps/hbase/data/archive占用空间过大，导致不停地发出hdfs空间使用率告警。【问题】告警信息alert:datanode_storageistriggered告警信息表明某个或某些datanode的HDFS存储空间使用率已超过阈值(我们设置的是80%)，需要清理。[hdfs@master-2r
hbase集群archive目录过大问题处理 spring208208 大数据组件线上问题分析 hbase 数据库大数据
1.问题现象现场反馈hbase集群/hbase/archive目录过大，大小约为1.52PB现场集群已经清理掉2个月以前的snapshot文件，当前archive目录文件仍不能释放现场发现1T以上的archive子目录有211个查看集群hbase配置，hmaster堆栈大小20GB，hmaster清理周期5分钟查看hmaster进程分配内存占用6G上下问题分析HMaster内存估算，假如/hbas
在mac上如何配置clion使用googletest进行测试 rd_cheng c++c++clion googletest gtest
1、下载googletest并且编译wgethttps://github.com/google/googletest/archive/release-1.8.0.tar.gz&&\tarzxfrelease-1.8.0.tar.gz&&\rm-frelease-1.8.0.tar.gz&&\cdgoogletest-release-1.8.0&&\cmakeconfigure.&&\make&&\
Hive JDBC 大数据查询场景下的 Socket 读超时问题及实战解决方案窝窝和牛牛大数据 hive hadoop
文章目录HiveJDBC大数据查询场景下的Socket读超时问题及实战解决方案问题背景️解决方案方案一：通过JDBCURL直接配置超时（推荐）方案二：动态设置全局loginTimeout（兼容旧版本）总结与建议HiveJDBC大数据查询场景下的Socket读超时问题及实战解决方案问题背景在使用HiveJDBC执行查询时，偶发SocketTimeoutException异常，堆栈显示在ResultS
ASIHTTPRequest类库简介和使用说明从小爱吃苹果干 iOS ASIHttpRequest ASIHTTPRequest
一、简介原文链接http://www.cnblogs.com/dotey/archive/2011/05/10/2041966.html1.下载源码官方网站：http://allseeing-i.com/ASIHTTPRequest/。可以从上面下载到最新源码，以及获取到相关的资料。2.概况使用iOSSDK中的HTTP网络请求API，相当的复杂，调用很繁琐，ASIHTTPRequest就是一个对C
查看归档日志及rman备份文件大叶梧桐 LINUX 归档日志 man备份
[root@his1/]#ls----查看**/**根目录下的文件archivebootdevhomelib64mediamntoptrmansbinsrvtmpusrbincgroupetcliblost+foundmiscnetprocrootselinuxsysu01var[root@his1/]#cdarchive/------进入archive/[root@his1archive]#ls
Ubuntu系统中下载安装使用Anaconda xxxn1102_ ubuntu ubuntu conda
文章目录一、下载二、安装1、上传安装文件2、阅读安装协议3、确认安装协议4、确认安装位置5、初始化Anaconda6、查看是否安装成功三、base环境四、Anaconda管理虚拟环境一、下载官网下载地址：https://www.anaconda.com/download中科大镜像源官网：https://mirrors.ustc.edu.cn/anaconda/archive/以上两个网址都可下载，
达梦数据库备份 huazhixuthink 数据库 oracle sql
达梦数据库联机在线备份操作指南一、基础条件与准备开启归档模式‌:联机备份必须处于归档模式下，否则无法执行。需通过disql工具执行以下操作：alterdatabasemount;alterdatabaseARCHIVELOG;例子：[dmdba@server~]$cd/opt/dmdbms/bin[dmdba@serverbin]$./disqlSYSDBA/'"Dameng@123"':5236
基于hive的电信离线用户的行为分析系统赵谨言论文经验分享毕业设计
标题:基于hive的电信离线用户的行为分析系统内容:1.摘要随着电信行业的快速发展，用户行为数据呈现出海量、复杂的特点。为了深入了解用户行为模式，提升电信服务质量和精准营销能力，本研究旨在构建基于Hive的电信离线用户行为分析系统。通过收集电信用户的通话记录、上网行为、短信使用等多源数据，利用Hive数据仓库工具进行数据存储和处理，采用数据挖掘和机器学习算法对用户行为进行分析。实验结果表明，该系统
安装httpd m0_74536424 Linux学习笔记 apache linux 网络
安装httpd1.源码编译安装//下载依赖包[root@openEulter-1~]#dnf-yinstallgccgcc-c++makeapr-*pcre-develredhat-rpm-config...安装过程省略...Complete!//下载软件包[root@openEulter-1~]#wgethttps://archive.apache.org/dist/httpd/httpd-2.
数据分析学习目录且行且安~ 数据分析进阶之路 #数据分析目录数据分析
在未来5个月里，将会陪伴大家一起来学习关于数据分析的相关内容，包括从数据思维，数据工具（Excel，Mysql，Hive，Python），数据方法论，数据展示（Tableau,BI），数据挖掘、数据实战项目一整套的内容，同步会将可能用到的以及有用的知识点整理出来。内容会慢慢更新。如下为数据分析的整个目录一、数据分析思维与方法论1.1、从0-1搭建指标体系、用户标签体系1.1.1、指标体系搭建-专项
Python进阶--多线程桔子code Python笔记本多线程 python
原文链接：http://www.juzicode.com/archives/841在《Python进阶教程m9–网络通信–socket通信》中我们实现了一个socket服务端和客户端通信的例子，这个例子中服务端需要等待客户端发送消息后才能返回消息给客户端，在客户端没有发送消息时，服务端一直在data=connet.recv(1024)上被阻塞住，直到等到客户端发来消息才能做下一步的动作。但是在实际
安装mysql 大霞上仙数据库 mysql 数据库
1、安装数据库下载链接https://downloads.mysql.com/archives/community/下载zip安装包，解压到某个路径下，将bin文件夹添加到系统环境变量。然后终端输入指令mysql--version验证2、初始化数据库打开命令提示符（以管理员身份）。导航到你的MySQL目录的bin文件夹（例如cdC:\mysql\bin）。basedir路径下新建my.ini文件中
Databend 产品月报（2025年2月）数据库
很高兴为您带来Databend2025年2月的最新更新、新功能和改进！我们希望这些增强功能对您有所帮助，并期待您的反馈。从MySQL迁移到DatabendDatabend推荐使用db-archiver进行MySQL批量迁移，使用FlinkCDC进行实时变更数据捕获（CDC）迁移。教程已更新：使用db-archiver从MySQL迁移使用FlinkCDC从MySQL迁移设置会话标签现在，您可以为会话
doris: Hive 向阳1218 大数据 hive hadoop 数据仓库 doris
自2.1.3版本开始，ApacheDoris支持对Hive的DDL和DML操作。用户可以直接通过ApacheDoris在Hive中创建库表，并将数据写入到Hive表中。通过该功能，用户可以通过ApacheDoris对Hive进行完整的数据查询和写入操作，进一步帮助用户简化湖仓一体架构。本文介绍在ApacheDoris中支持的Hive操作，语法和使用须知。提示这是一个实验功能。提示使用前，请先设置：
hive-DML语法(超级详细) 研发咨询顾问核心库-大数据 hive hadoop 数据仓库
N.0变量使用setwindow_day=50--定义变量select${hiveconf:window_day}--使用变量N.1单表查询语句N.1.1语法
Hive SQL 优化大数据侠客大数据相关技术文档总结 hive sql 性能优化
标题一、HIVESQL执##标题行顺序了解hivesql的执行顺序，有助于写出更高质量的代码。第一步：确定数据源，进行表的查询和加载from(left/right/inner/outner)joinon第二步：过滤数据，进行条件筛选wheregroupbyhaving第三步：查询数据select第四步：显示数据distinctorderbylimitunion/unionallSql:select
Hive--桶表 XK&RM Hive hive
目录1.为什么要使用桶表？？？2.桶表分桶规则3.桶表的创建3.1DLL3.2数据3.3DML3.4查看桶表里面的数据3.5临时表创建并加载数据3.6把临时表的数据加载到桶表里面4.桶表的查询4.1桶表查询全表的数据4.2桶表查看第一个桶里面的数据4.3查看第二个桶里面的数据4.4查看第三个桶里面的数据4.5查看桶表固定行数据4.6桶表查询语法4.7其他查询5桶表、分区表的区别6两个桶表之间的Jo
HIVE的执行计划实操不爱学习的小枫大数据 hive 大数据
什么是执行计划所谓执行计划，顾名思义，就是对一个查询任务（sql），做出一份怎样去完成任务的详细方案。举个生活中的例子，我从上海要去新疆，我可以选择坐飞机、坐高铁、坐火车，甚至于自驾。具体到线路更是五花八门，现在我准备选择自驾了，具体什么路线怎样去划算（时间&费用），这是一件值得考究的事情。HIVE（我们的自驾工具）提供了EXPLAIN命令来展示一个查询的执行计划（什么路线）,这个执行计划对于我们
hive alter table add columns 是否使用 cascade 的方案 houzhizhen hive hive
结论altertablexxxaddcolumns时加上cascade时，会把所有的分区都加上此字段。如果不加则只有新的分区会加上此字段，旧的分区没有此字段，即便数据文件里有对应的数据，也不能显示内容。如果分区都是insertoverwrite生成的，并且旧分区的数据不再重新生成，可以在addcolumns不用cascade，这样旧的分区对应的列显示null。新的分区正常显示新增的列。如果分区都是
Hive Exception: Too many counters: 2001 max=2000 的解决方法 houzhizhen hive hive hadoop big data
在hive任务的执行过程中，可能出现Toomanycounters的异常。如果执行引擎时tez，则说明当前作业的counters数量超过tez默认的counters限制。Exception:Toomanycounters:2001max=2000atorg.apache.tez.common.counters.Limits.checkCounters(Limits.java:88)atorg.ap
Hive 3.1 在 metastore 运行的 remote threads houzhizhen hive hive hadoop 数据仓库
Remotethreads是仅当Hivemetastore作为单独的服务运行是启动，请求需要开启compactor。有以下几种：1.AcidOpenTxnsCounterService统计当前open的事务数从表TXNS中统计状态为open的事务。此事务数量可以再hivemetrics中。2.AcidHouseKeeperService定期调用txnHandler.performTimeOuts(
通过spark-redshift工具包读取redshift上的表 stark_summer spark spark redshift parquet api 数据
spark数据源API在spark1.2以后，开始提供插件诗的机制，并与各种结构化数据源整合。spark用户可以读取各种各样数据源的数据，比如Hive表、JSON文件、列式的Parquet表、以及其他表。通过spark包可以获取第三方数据源。而这篇文章主要讨论spark新的数据源，通过spark-redshift包，去访问AmazonRedshift服务。spark-redshift包主要由Dat
UBuntu 软件安装 denlee Linux ubuntu deb windows firefox linux 工具
一。先安装中文包，这个我就不多说了，谁都会装。在“语言支持”中选中文就行了二。设置更新源，更新系统。sudoapt-getupdatesudoapt-getdist-upgradesudoapt-getupgrade-y有一个简单办法可以使更新速度更快，把以前更新的时候下载的软件包备份一下，把var/cache/apt/archives/下面的所有deb包放在另外的分区下，建一个文件夹，比如bei
初学者如何用 Python 写第一个爬虫？ ADFVBM 面试学习路线阿里巴巴 python 爬虫开发语言
??欢迎来到我的博客！非常高兴能在这里与您相遇。在这里，您不仅能获得有趣的技术分享，还能感受到轻松愉快的氛围。无论您是编程新手，还是资深开发者，都能在这里找到属于您的知识宝藏，学习和成长。??博客内容包括：Java核心技术与微服务：涵盖Java基础、JVM、并发编程、Redis、Kafka、Spring等，帮助您全面掌握企业级开发技术。大数据技术：涵盖Hadoop（HDFS）、Hive、Spark
Hive之正则表达式三生暮雨渡瀟瀟 hive hive 正则表达式
Hive版本：hive-3.1.2目录一、Hive的正则表达式概述1.1字符集合1.2边界集合1.3量词（重复次数）集合1.4转义操作符1.5运算符优先级二、Hive正则表达式案例2.1like2.2rlike2.3regexp2.4regexp_replace正则替换2.5regexp_extract正则提取2.6、hive实现Oracle中的REGEXP_SUBSTR三、完整代码示例场景：清洗
辗转相处求最大公约数沐刃青蛟 C++漏洞
无言面对”江东父老“了，接触编程一年了，今天发现还不会辗转相除法求最大公约数。惭愧惭愧！为此，总结一下以方便日后忘了好查找。 1.输入要比较的两个数a,b 忽略：2.比较大小（因为后面要的是大的数对小的数做%操作） 3.辗转相除（用循环不停的取余，如a%b,直至b=0） 4.最后的a为两数的最大公约数 &
F5负载均衡会话保持技术及原理技术白皮书 bijian1013 F5 负载均衡
一.什么是会话保持？在大多数电子商务的应用系统或者需要进行用户身份认证的在线系统中，一个客户与服务器经常经过好几次的交互过程才能完成一笔交易或者是一个请求的完成。由于这几次交互过程是密切相关的，服务器在进行这些交互过程的某一个交互步骤时，往往需要了解上一次交互过程的处理结果，或者上几步的交互过程结果，服务器进行下
Object.equals方法：重载还是覆盖 Cwind java generics override overload
本文译自StackOverflow上对此问题的讨论。原问题链接在阅读Joshua Bloch的《Effective Java（第二版）》第8条“覆盖equals时请遵守通用约定”时对如下论述有疑问： “不要将equals声明中的Object对象替换为其他的类型。程序员编写出下面这样的equals方法并不鲜见，这会使程序员花上数个小时都搞不清它为什么不能正常工作：” pu
初始线程 15700786134
暑假学习的第一课是讲线程，任务是是界面上的一条线运动起来。既然是在界面上，那必定得先有一个界面，所以第一步就是，自己的类继承JAVA中的JFrame，在新建的类中写一个界面，代码如下： public class ShapeFr
Linux的tcpdump 被触发 tcpdump
用简单的话来定义tcpdump，就是：dump the traffic on a network，根据使用者的定义对网络上的数据包进行截获的包分析工具。 tcpdump可以将网络中传送的数据包的“头”完全截获下来提供分析。它支持针对网络层、协议、主机、网络或端口的过滤，并提供and、or、not等逻辑语句来帮助你去掉无用的信息。实用命令实例默认启动 tcpdump 普通情况下，直
安卓程序listview优化后还是卡顿肆无忌惮_ ListView
最近用eclipse开发一个安卓app，listview使用baseadapter，里面有一个ImageView和两个TextView。使用了Holder内部类进行优化了还是很卡顿。后来发现是图片资源的问题。把一张分辨率高的图片放在了drawable-mdpi文件夹下，当我在每个item中显示，他都要进行缩放，导致很卡顿。解决办法是把这个高分辨率图片放到drawable-xxhdpi下。 &nb
扩展easyUI tab控件，添加加载遮罩效果知了ing jquery
(function () { $.extend($.fn.tabs.methods, { //显示遮罩 loading: function (jq, msg) { return jq.each(function () { var panel = $(this).tabs(&
gradle上传jar到nexus 矮蛋蛋 gradle
原文地址： https://docs.gradle.org/current/userguide/maven_plugin.html configurations { deployerJars } dependencies { deployerJars "org.apache.maven.wagon
千万条数据外网导入数据库的解决方案。 alleni123 sql mysql
从某网上爬了数千万的数据，存在文本中。然后要导入mysql数据库。悲剧的是数据库和我存数据的服务器不在一个内网里面。。 ping了一下， 19ms的延迟。于是下面的代码是没用的。 ps = con.prepareStatement(sql); ps.setString(1, info.getYear())............; ps.exec
JAVA IO InputStreamReader和OutputStreamReader 百合不是茶 JAVA.io操作字符流
这是第三篇关于java.io的文章了，从开始对io的不了解-->熟悉--->模糊，是这几天来对文件操作中最大的感受，本来自己认为的熟悉了的，刚刚在回想起前面学的好像又不是很清晰了，模糊对我现在或许是最好的鼓励我会更加的去学加油！： JAVA的API提供了另外一种数据保存途径，使用字符流来保存的，字符流只能保存字符形式的流字节流和字符的难点：a,怎么将读到的数据
MO、MT解读 bijian1013 GSM
MO= Mobile originate，上行，即用户上发给SP的信息。MT= Mobile Terminate，下行，即SP端下发给用户的信息；上行:mo提交短信到短信中心下行:mt短信中心向特定的用户转发短信，你的短信是这样的，你所提交的短信，投递的地址是短信中心。短信中心收到你的短信后，存储转发，转发的时候就会根据你填写的接收方号码寻找路由，下发。在彩信领域是一样的道理。下行业务：由SP
五个JavaScript基础问题 bijian1013 JavaScript call apply this Hoisting
下面是五个关于前端相关的基础问题，但却很能体现JavaScript的基本功底。问题1：Scope作用范围考虑下面的代码： (function() { var a = b = 5; })(); console.log(b); 什么会被打印在控制台上？回答：上面的代码会打印 5。 &nbs
【Thrift二】Thrift Hello World bit1129 Hello world
本篇，不考虑细节问题和为什么，先照葫芦画瓢写一个Thrift版本的Hello World，了解Thrift RPC服务开发的基本流程 1. 在Intellij中创建一个Maven模块，加入对Thrift的依赖，同时还要加上slf4j依赖，如果不加slf4j依赖，在后面启动Thrift Server时会报错 <dependency>
【Avro一】Avro入门 bit1129 入门
本文的目的主要是总结下基于Avro Schema代码生成，然后进行序列化和反序列化开发的基本流程。需要指出的是，Avro并不要求一定得根据Schema文件生成代码，这对于动态类型语言很有用。 1. 添加Maven依赖 <?xml version="1.0" encoding="UTF-8"?> <proj
安装nginx+ngx_lua支持WAF防护功能 ronin47
需要的软件:LuaJIT-2.0.0.tar.gz nginx-1.4.4.tar.gz &nb
java-5.查找最小的K个元素-使用最大堆 bylijinnan java
import java.util.Arrays; import java.util.Random; public class MinKElement { /** * 5.最小的K个元素 * I would like to use MaxHeap. * using QuickSort is also OK */ public static void
TCP的TIME-WAIT bylijinnan socket
原文连接： http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html 以下为对原文的阅读笔记说明：主动关闭的一方称为local end，被动关闭的一方称为remote end 本地IP、本地端口、远端IP、远端端口这一“四元组”称为quadruplet，也称为socket 1、TIME_WA
jquery ajax 序列化表单 coder_xpf Jquery ajax 序列化
checkbox 如果不设定值，默认选中值为on；设定值之后，选中则为设定的值 <input type="checkbox" name="favor" id="favor" checked="checked"/> $("#favor&quo
Apache集群乱码和最高并发控制 cuisuqiang apache tomcat 并发集群乱码
都知道如果使用Http访问，那么在Connector中增加URIEncoding即可，其实使用AJP时也一样，增加useBodyEncodingForURI和URIEncoding即可。最大连接数也是一样的，增加maxThreads属性即可，如下，配置如下： <Connector maxThreads="300" port="8019" prot
websocket dalan_123 websocket
一、低延迟的客户端-服务器和服务器-客户端的连接很多时候所谓的http的请求、响应的模式，都是客户端加载一个网页，直到用户在进行下一次点击的时候，什么都不会发生。并且所有的http的通信都是客户端控制的，这时候就需要用户的互动或定期轮训的，以便从服务器端加载新的数据。通常采用的技术比如推送和comet（使用http长连接、无需安装浏览器安装插件的两种方式：基于ajax的长
菜鸟分析网络执法官 dcj3sjt126com 网络
最近在论坛上看到很多贴子在讨论网络执法官的问题。菜鸟我正好知道这回事情.人道"人之患好为人师" 手里忍不住,就写点东西吧. 我也很忙.又没有MM,又没有MONEY....晕倒有点跑题. OK,闲话少说,切如正题. 要了解网络执法官的原理. 就要先了解局域网的通信的原理. 前面我们看到了.在以太网上传输的都是具有以太网头的数据包.
Android相对布局属性全集 dcj3sjt126com android
RelativeLayout布局android:layout_marginTop="25dip" //顶部距离android:gravity="left" //空间布局位置android:layout_marginLeft="15dip //距离左边距 // 相对于给定ID控件android:layout_above 将该控件的底部置于给定ID的
Tomcat内存设置详解 eksliang jvm tomcat tomcat内存设置
Java内存溢出详解一、常见的Java内存溢出有以下三种： 1. java.lang.OutOfMemoryError: Java heap space ----JVM Heap（堆）溢出JVM在启动的时候会自动设置JVM Heap的值，其初始空间(即-Xms)是物理内存的1/64，最大空间(-Xmx)不可超过物理内存。可以利用JVM提
Java6 JVM参数选项 greatwqs java HotSpot jvm jvm参数 JVM Options
Java 6 JVM参数选项大全（中文版）作者：Ken Wu Email: ken.wug@gmail.com 转载本文档请注明原文链接 http://kenwublog.com/docs/java6-jvm-options-chinese-edition.htm！本文是基于最新的SUN官方文档Java SE 6 Hotspot VM Opt
weblogic创建JMC i5land weblogic jms
进入 weblogic控制太 1.创建持久化存储 --Services--Persistant Stores--new--Create FileStores--name随便起--target默认--Directory写入在本机建立的文件夹的路径--ok 2.创建JMS服务器 --Services--Messaging--JMS Servers--new--name随便起--Pers
基于 DHT 网络的磁力链接和BT种子的搜索引擎架构 justjavac DHT
上周开发了一个磁力链接和 BT 种子的搜索引擎 {Magnet & Torrent}，本文简单介绍一下主要的系统功能和用到的技术。系统包括几个独立的部分：使用 Python 的 Scrapy 框架开发的网络爬虫，用来爬取磁力链接和种子；使用 PHP CI 框架开发的简易网站；搜索引擎目前直接使用的 MySQL，将来可以考虑使
sql添加、删除表中的列 macroli sql
添加没有默认值：alter table Test add BazaarType char(1) 有默认值的添加列：alter table Test add BazaarType char(1) default(0) 删除没有默认值的列：alter table Test drop COLUMN BazaarType 删除有默认值的列：先删除约束（默认值）alter table Test DRO
PHP中二维数组的排序方法 abc123456789cba 排序二维数组 PHP
<?php/*** @package BugFree* @version $Id: FunctionsMain.inc.php,v 1.32 2005/09/24 11:38:37 wwccss Exp $*** Sort an two-dimension array by some level
hive优化之------控制hive任务中的map数和reduce数 superlxw1234 hive hive优化
一、控制hive任务中的map数: 1. 通常情况下，作业会通过input的目录产生一个或者多个map任务。主要的决定因素有： input的文件总个数，input的文件大小，集群设置的文件块大小(目前为128M, 可在hive中通过set dfs.block.size;命令查看到，该参数不能自定义修改)；2.
Spring Boot 1.2.4 发布 wiselyman spring boot
Spring Boot 1.2.4已于6.4日发布，repo.spring.io and Maven Central可以下载(推荐使用maven或者gradle构建下载)。这是一个维护版本，包含了一些修复small number of fixes,建议所有的用户升级。 Spring Boot 1.3的第一个里程碑版本将在几天后发布，包含许多

Hive 开窗函数 汇总

开窗函数

聚合开窗函数

count开窗函数

sum开窗函数

min开窗函数

max开窗函数

avg开窗函数

first_value开窗函数

last_value开窗函数

lag开窗函数

lead开窗函数

cume_dist 开窗函数

排序开窗函数

rank开窗函数

dense_rank开窗函数

ntile开窗函数

row_number开窗函数

percent_rank开窗函数

你可能感兴趣的:(hive,开窗函数)

Hive 开窗函数汇总