fengfengchen95

Complete List of Hive Functions

Overview of Commonly Used Hive Functions

1 Relational operators
1.1 Equality comparison: =
1.2 Inequality comparison: <>
1.3 Less-than comparison: <
1.4 Less-than-or-equal comparison: <=
1.5 Greater-than comparison: >
1.6 Greater-than-or-equal comparison: >=
1.7 Null check: IS NULL
1.8 Non-null check: IS NOT NULL
1.9 LIKE comparison: LIKE
1.10 Java regular-expression LIKE: RLIKE
1.11 REGEXP operator: REGEXP
2 Arithmetic operators
2.1 Addition: +
2.2 Subtraction: -
2.3 Multiplication: *
2.4 Division: /
2.5 Remainder: %
2.6 Bitwise AND: &
2.7 Bitwise OR: |
2.8 Bitwise XOR: ^
2.9 Bitwise NOT: ~
3 Logical operators
3.1 Logical AND: AND
3.2 Logical OR: OR
3.3 Logical NOT: NOT
4 Numeric functions
4.1 Rounding function: round
4.2 Rounding to a given precision: round
4.3 Round-down function: floor
4.4 Round-up function: ceil
4.5 Round-up function: ceiling
4.6 Random number function: rand
4.7 Natural exponential function: exp
4.8 Base-10 logarithm function: log10
4.9 Base-2 logarithm function: log2
4.10 Logarithm function: log
4.11 Power function: pow
4.12 Power function: power
4.13 Square root function: sqrt
4.14 Binary function: bin
4.15 Hexadecimal function: hex
4.16 Reverse hexadecimal function: unhex
4.17 Base conversion function: conv
4.18 Absolute value function: abs
4.19 Positive remainder function: pmod
4.20 Sine function: sin
4.21 Arcsine function: asin
4.22 Cosine function: cos
4.23 Arccosine function: acos
4.24 positive function: positive
4.25 negative function: negative
5 Date functions
5.1 UNIX timestamp to date: from_unixtime
5.2 Current UNIX timestamp: unix_timestamp
5.3 Date to UNIX timestamp: unix_timestamp
5.4 Date in a given format to UNIX timestamp: unix_timestamp
5.5 Datetime to date: to_date
5.6 Date to year: year
5.7 Date to month: month
5.8 Date to day: day
5.9 Date to hour: hour
5.10 Date to minute: minute
5.11 Date to second: second
5.12 Date to week of year: weekofyear
5.13 Date difference: datediff
5.14 Date addition: date_add
5.15 Date subtraction: date_sub
6 Conditional functions
6.1 If function: if
6.2 First non-null value: COALESCE
6.3 Conditional expression: CASE
6.4 Conditional expression: CASE
7 String functions
7.1 String length: length
7.2 String reversal: reverse
7.3 String concatenation: concat
7.4 String concatenation with separator: concat_ws
7.5 Substring: substr, substring
7.6 Substring: substr, substring
7.7 Uppercase conversion: upper, ucase
7.8 Lowercase conversion: lower, lcase
7.9 Whitespace trimming: trim
7.10 Left trimming: ltrim
7.11 Right trimming: rtrim
7.12 Regular expression replacement: regexp_replace
7.13 Regular expression extraction: regexp_extract
7.14 URL parsing: parse_url
7.15 JSON parsing: get_json_object
7.16 String of spaces: space
7.17 String repetition: repeat
7.18 ASCII code of first character: ascii
7.19 Left padding: lpad
7.20 Right padding: rpad
7.21 String splitting: split
7.22 Set lookup: find_in_set
8 Aggregate functions
8.1 Count: count
8.2 Sum: sum
8.3 Average: avg
8.4 Minimum: min
8.5 Maximum: max
8.6 Population variance of non-null values: var_pop
8.7 Sample variance of non-null values: var_samp
8.8 Population standard deviation: stddev_pop
8.9 Sample standard deviation: stddev_samp
8.10 Percentile (median): percentile
8.11 Percentile (median): percentile
8.12 Approximate percentile (median): percentile_approx
8.13 Approximate percentile (median): percentile_approx
8.14 Histogram: histogram_numeric
9 Complex type constructors
9.1 Map constructor: map
9.2 Struct constructor: struct
9.3 Array constructor: array
10 Complex type access
10.1 Array access: A[n]
10.2 Map access: M[key]
10.3 Struct access: S.x
11 Complex type size functions
11.1 Map size: size(Map)
11.2 Array size: size(Array)
11.3 Type conversion function
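Relational operators
1. Equality comparison: =
Syntax: A = B
Operand types: all primitive types
Description: TRUE if expression A equals expression B; otherwise FALSE.
hive> select 1 from iteblog where 1=1;
1
2. Inequality comparison: <>
Syntax: A <> B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is not equal to B; otherwise FALSE.
hive> select 1 from iteblog where 1 <> 2;
1
3. Less-than comparison: <
Syntax: A < B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than B; otherwise FALSE.
hive> select 1 from iteblog where 1 < 2;
1
4. Less-than-or-equal comparison: <=
Syntax: A <= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than or equal to B; otherwise FALSE.
hive> select 1 from iteblog where 1 <= 1;
1
5. Greater-than comparison: >
Syntax: A > B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than B; otherwise FALSE.
hive> select 1 from iteblog where 2 > 1;
1
6. Greater-than-or-equal comparison: >=
Syntax: A >= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than or equal to B; otherwise FALSE.
hive> select 1 from iteblog where 1 >= 1;
1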
Note: take care when comparing strings. For the common case of comparing times, apply to_date first and then compare.
hive> select * from iteblog;
OK
2011111209 00:00:00 2011111209

hive> select a, b, a<b, a>b, a=b from iteblog;
2011111209 00:00:00 2011111209 false true false
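The pitfall above comes from strings being compared character by character. As a minimal sketch, assuming the values use the default yyyy-MM-dd HH:mm:ss layout (the literals below are illustrative and not taken from the table above), the date part can be extracted with to_date before comparing:

```sql
-- Raw string comparison: the trailing time part makes the two strings differ.
select '2011-11-12 09:00:00' = '2011-11-12' from iteblog;

-- Reduce the datetime to its date part first, then compare.
select to_date('2011-11-12 09:00:00') = '2011-11-12' from iteblog;
```

The first query should yield false and the second true. Values in another layout, such as the compact form in the table above, would first need to be parsed with an explicit pattern.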
7. Null check: IS NULL
Syntax: A IS NULL
Operand types: all types
Description: TRUE if the value of expression A is NULL; otherwise FALSE.
hive> select 1 from iteblog where null is null;
1
8. Non-null check: IS NOT NULL
Syntax: A IS NOT NULL
Operand types: all types
Description: FALSE if the value of expression A is NULL; otherwise TRUE.
hive> select 1 from iteblog where 1 is not null;
1
9. LIKE comparison: LIKE
Syntax: A LIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the simple SQL pattern B (not a regular expression); otherwise FALSE. In B, the character "_" matches any single character and the character "%" matches any number of characters.
hive> select 1 from iteblog where 'football' like 'foot%';
1
hive> select 1 from iteblog where 'football' like 'foot____';
1
Note: for a negated comparison, use NOT A LIKE B.
hive> select 1 from iteblog where NOT 'football' like 'fff%';
1
10. Java regular-expression LIKE: RLIKE
Syntax: A RLIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the Java regular expression B; otherwise FALSE.
hive> select 1 from iteblog where 'footbar' rlike '^f.*r$';
1
Note: to check whether a string consists entirely of digits:
hive> select 1 from iteblog where '123456' rlike '^\\d+$';
1
hive> select 1 from iteblog where '123456aa' rlike '^\\d+$';
11. REGEXP operator: REGEXP
Syntax: A REGEXP B
Operand types: strings
Description: Same as RLIKE.
hive> select 1 from iteblog where 'footbar' REGEXP '^f.*r$';
1
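Arithmetic operators
1. Addition: +
Syntax: A + B
Operand types: all numeric types
Description: Returns the result of adding A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types). For example, int + int generally yields int, while int + double generally yields double.
hive> select 1 + 9 from iteblog;
10
hive> create table iteblog as select 1 + 1.2 from iteblog;
hive> describe iteblog;
_c0 double
2. Subtraction: -
Syntax: A - B
Operand types: all numeric types
Description: Returns the result of subtracting B from A. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types). For example, int - int generally yields int, while int - double generally yields double.
hive> select 10 - 5 from iteblog;
5
hive> create table iteblog as select 5.6 - 4 from iteblog;
hive> describe iteblog;
_c0 double
3. Multiplication: *
Syntax: A * B
Operand types: all numeric types
Description: Returns the result of multiplying A by B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types). Note that if the product exceeds the range of the default result type, the operands must be converted with cast to a numeric type with a larger range.
hive> select 40 * 5 from iteblog;
200
4. Division: /
Syntax: A / B
Operand types: all numeric types
Description: Returns the result of dividing A by B. The result type is double.
hive> select 40 / 5 from iteblog;
8.0
Note: the highest-precision floating-point type in Hive is double, which is accurate to only about 16 significant digits, so take particular care with division.
hive> select ceil(28.0/6.999999999999999999999) from iteblog limit 1;
Result: 4
hive> select ceil(28.0/6.99999999999999) from iteblog limit 1;
Result: 5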
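5. Remainder: %
Syntax: A % B
Operand types: all numeric types
Description: Returns the remainder of A divided by B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types).
hive> select 41 % 5 from iteblog;
1
hive> select 8.4 % 4 from iteblog;
0.40000000000000036
Note: floating-point precision is a significant issue in Hive; for operations like this it is best to fix the precision with round.
hive> select round(8.4 % 4 , 2) from iteblog;
0.4
6. Bitwise AND: &
Syntax: A & B
Operand types: all numeric types
Description: Returns the bitwise AND of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types).
hive> select 4 & 8 from iteblog;
0
hive> select 6 & 4 from iteblog;
4
7. Bitwise OR: |
Syntax: A | B
Operand types: all numeric types
Description: Returns the bitwise OR of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types).
hive> select 4 | 8 from iteblog;
12
hive> select 6 | 8 from iteblog;
14
8. Bitwise XOR: ^
Syntax: A ^ B
Operand types: all numeric types
Description: Returns the bitwise XOR of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the data types).
hive> select 4 ^ 8 from iteblog;
12
hive> select 6 ^ 4 from iteblog;
2
9. Bitwise NOT: ~
Syntax: ~A
Operand types: all numeric types
Description: Returns the bitwise NOT of A. The result type is the type of A.
hive> select ~6 from iteblog;
-7
hive> select ~4 from iteblog;
-5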
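Logical operators
1. Logical AND: AND
Syntax: A AND B
Operand types: boolean
Description: TRUE if both A and B are TRUE; FALSE if either is FALSE; otherwise NULL, that is, when one operand is NULL and the other is not FALSE.
hive> select 1 from iteblog where 1=1 and 2=2;
1
2. Logical OR: OR
Syntax: A OR B
Operand types: boolean
Description: TRUE if A is TRUE, or B is TRUE, or both are TRUE; FALSE OR NULL is NULL; otherwise FALSE.
hive> select 1 from iteblog where 1=2 or 2=2;
1
3. Logical NOT: NOT
Syntax: NOT A
Operand types: boolean
Description: TRUE if A is FALSE; NULL if A is NULL; otherwise FALSE.
hive> select 1 from iteblog where not 1=2;
1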
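The examples above use only TRUE and FALSE operands. A small sketch of how NULL operands behave under SQL three-valued logic may help; the results in the comments are what that logic prescribes, and they have not been checked against a specific Hive release.

```sql
-- cast(null as boolean) produces a typed NULL that the operators accept.
select
  (1 = 1) and cast(null as boolean),  -- NULL: the unknown operand decides the result
  (1 = 2) and cast(null as boolean),  -- FALSE: one FALSE operand is enough
  (1 = 1) or  cast(null as boolean),  -- TRUE: one TRUE operand is enough
  not cast(null as boolean)           -- NULL
from iteblog;
```

In a WHERE clause a NULL condition filters the row out just as FALSE does, so the difference mostly matters when the boolean is selected or negated.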
Numeric functions
1. Rounding function: round
Syntax: round(double a)
Return type: BIGINT
Description: Returns the integer part of a double value, rounded half up.
hive> select round(3.1415926) from iteblog;
3
hive> select round(3.5) from iteblog;
4
hive> create table iteblog as select round(9542.158) from iteblog;
hive> describe iteblog;
_c0 bigint
2. Rounding to a given precision: round
Syntax: round(double a, int d)
Return type: DOUBLE
Description: Returns a as a double rounded to d decimal places.
hive> select round(3.1415926,4) from iteblog;
3.1416
3. Round-down function: floor
Syntax: floor(double a)
Return type: BIGINT
Description: Returns the largest integer that is less than or equal to the double value.
hive> select floor(3.1415926) from iteblog;
3
hive> select floor(25) from iteblog;
25
4. Round-up function: ceil
Syntax: ceil(double a)
Return type: BIGINT
Description: Returns the smallest integer that is greater than or equal to the double value.
hive> select ceil(3.1415926) from iteblog;
4
hive> select ceil(46) from iteblog;
46
5. Round-up function: ceiling
Syntax: ceiling(double a)
Return type: BIGINT
Description: Same as ceil.
hive> select ceiling(3.1415926) from iteblog;
4
hive> select ceiling(46) from iteblog;
46
6. Random number function: rand
Syntax: rand(), rand(int seed)
Return type: double
Description: Returns a random number in the range 0 to 1. If a seed is specified, a stable (repeatable) sequence of random numbers is obtained.
hive> select rand() from iteblog;
0.5577432776034763
hive> select rand() from iteblog;
0.6638336467363424
hive> select rand(100) from iteblog;
0.7220096548596434
hive> select rand(100) from iteblog;
0.7220096548596434
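A common use of rand is row sampling. The following is only a sketch; the 10% threshold is an arbitrary illustration, and iteblog stands in for whatever table is being sampled.

```sql
-- Keep roughly 10% of the rows: each row is kept independently with probability 0.1.
select * from iteblog where rand() < 0.1;

-- With a seed, the random sequence is repeatable, so the same rows tend to be
-- chosen again as long as the input and its row order do not change.
select * from iteblog where rand(100) < 0.1;
```

Because each row is tested independently, the sample size is only approximately 10% of the table.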
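7. Natural exponential function: exp
Syntax: exp(double a)
Return type: double
Description: Returns e, the base of the natural logarithm, raised to the power a.
hive> select exp(2) from iteblog;
7.38905609893065
Natural logarithm function: ln
Syntax: ln(double a)
Return type: double
Description: Returns the natural logarithm of a.
hive> select ln(7.38905609893065) from iteblog;
2.0
8. Base-10 logarithm function: log10
Syntax: log10(double a)
Return type: double
Description: Returns the base-10 logarithm of a.
hive> select log10(100) from iteblog;
2.0
9. Base-2 logarithm function: log2
Syntax: log2(double a)
Return type: double
Description: Returns the base-2 logarithm of a.
hive> select log2(8) from iteblog;
3.0
10. Logarithm function: log
Syntax: log(double base, double a)
Return type: double
Description: Returns the logarithm of a to the given base.
hive> select log(4,256) from iteblog;
4.0
11. Power function: pow
Syntax: pow(double a, double p)
Return type: double
Description: Returns a raised to the power p.
hive> select pow(2,4) from iteblog;
16.0
12. Power function: power
Syntax: power(double a, double p)
Return type: double
Description: Returns a raised to the power p; same as pow.
hive> select power(2,4) from iteblog;
16.0
13. Square root function: sqrt
Syntax: sqrt(double a)
Return type: double
Description: Returns the square root of a.
hive> select sqrt(16) from iteblog;
4.0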
14. Binary function: bin
Syntax: bin(BIGINT a)
Return type: string
Description: Returns the binary representation of a.
hive> select bin(7) from iteblog;
111
15. Hexadecimal function: hex
Syntax: hex(BIGINT a)
Return type: string
Description: If the argument is an int, returns its hexadecimal representation; if the argument is a string, returns the hexadecimal representation of that string.
hive> select hex(17) from iteblog;
11
hive> select hex('abc') from iteblog;
616263
16. Reverse hexadecimal function: unhex
Syntax: unhex(string a)
Return type: string
Description: Returns the string that the hexadecimal string represents.
hive> select unhex('616263') from iteblog;
abc
hive> select unhex('11') from iteblog;
-
hive> select unhex(616263) from iteblog;
abc
17. Base conversion function: conv
Syntax: conv(BIGINT num, int from_base, int to_base)
Return type: string
Description: Converts the number num from base from_base to base to_base.
hive> select conv(17,10,16) from iteblog;
11
hive> select conv(17,10,2) from iteblog;
10001
18. Absolute value function: abs
Syntax: abs(double a), abs(int a)
Return type: double or int
Description: Returns the absolute value of a.
hive> select abs(-3.9) from iteblog;
3.9
hive> select abs(10.9) from iteblog;
10.9
19. Positive remainder function: pmod
Syntax: pmod(int a, int b), pmod(double a, double b)
Return type: int or double
Description: Returns the positive remainder of a divided by b.
hive> select pmod(9,4) from iteblog;
1
hive> select pmod(-9,4) from iteblog;
3
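The difference between pmod and the % operator shows up with negative inputs, which matters when a value is mapped to a bucket number. The sketch below assumes Hive's built-in hash function; the key 'some_key' and the bucket count 10 are made up for illustration.

```sql
-- The same negative input through both remainder operations:
-- % keeps the sign of the dividend, pmod does not.
select -9 % 4, pmod(-9, 4) from iteblog;

-- hash() may return a negative int; pmod keeps the bucket number within 0..9.
select pmod(hash('some_key'), 10) from iteblog;
```

The first query should return -1 and 3, which is why pmod is the safer choice whenever the result is used as an index or bucket id.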
20. Sine function: sin
Syntax: sin(double a)
Return type: double
Description: Returns the sine of a.
hive> select sin(0.8) from iteblog;
0.7173560908995228
21. Arcsine function: asin
Syntax: asin(double a)
Return type: double
Description: Returns the arcsine of a.
hive> select asin(0.7173560908995228) from iteblog;
0.8
22. Cosine function: cos
Syntax: cos(double a)
Return type: double
Description: Returns the cosine of a.
hive> select cos(0.9) from iteblog;
0.6216099682706644
23. Arccosine function: acos
Syntax: acos(double a)
Return type: double
Description: Returns the arccosine of a.
hive> select acos(0.6216099682706644) from iteblog;
0.9
24. positive function: positive
Syntax: positive(int a), positive(double a)
Return type: int or double
Description: Returns a.
hive> select positive(-10) from iteblog;
-10
hive> select positive(12) from iteblog;
12
25. negative function: negative
Syntax: negative(int a), negative(double a)
Return type: int or double
Description: Returns -a.
hive> select negative(-5) from iteblog;
5
hive> select negative(8) from iteblog;
-8
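Date functions
1. UNIX timestamp to date: from_unixtime
Syntax: from_unixtime(bigint unixtime[, string format])
Return type: string
Description: Converts a UNIX timestamp (the number of seconds from 1970-01-01 00:00:00 UTC to the given time) to a time string in the current time zone, using the given format.
hive> select from_unixtime(1323308943,'yyyyMMdd') from iteblog;
20111208
2. Current UNIX timestamp: unix_timestamp
Syntax: unix_timestamp()
Return type: bigint
Description: Returns the current UNIX timestamp, in seconds.
hive> select unix_timestamp() from iteblog;
1323309615
3. Date to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date)
Return type: bigint
Description: Converts a date in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.
hive> select unix_timestamp('2011-12-07 13:01:03') from iteblog;
1323234063
4. Date in a given format to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date, string pattern)
Return type: bigint
Description: Converts a date in the format given by pattern to a UNIX timestamp. Returns 0 if the conversion fails.
hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from iteblog;
1323234063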
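The two functions above are often chained to change the layout of a date string. A minimal sketch, with an illustrative literal:

```sql
-- Parse a compact yyyyMMdd string into a timestamp, then print it back as yyyy-MM-dd.
select from_unixtime(unix_timestamp('20111207', 'yyyyMMdd'), 'yyyy-MM-dd') from iteblog;
```

Both steps use the same session time zone, so the round trip should give 2011-12-07 regardless of where the query runs.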
5. Datetime to date: to_date
Syntax: to_date(string timestamp)
Return type: string
Description: Returns the date part of a datetime value.
hive> select to_date('2011-12-08 10:03:01') from iteblog;
2011-12-08
6. Date to year: year
Syntax: year(string date)
Return type: int
Description: Returns the year of the date.
hive> select year('2011-12-08 10:03:01') from iteblog;
2011
hive> select year('2012-12-08') from iteblog;
2012
7. Date to month: month
Syntax: month(string date)
Return type: int
Description: Returns the month of the date.
hive> select month('2011-12-08 10:03:01') from iteblog;
12
hive> select month('2011-08-08') from iteblog;
8
8. Date to day: day
Syntax: day(string date)
Return type: int
Description: Returns the day of the month of the date.
hive> select day('2011-12-08 10:03:01') from iteblog;
8
hive> select day('2011-12-24') from iteblog;
24
9. Date to hour: hour
Syntax: hour(string date)
Return type: int
Description: Returns the hour of the datetime.
hive> select hour('2011-12-08 10:03:01') from iteblog;
10
10. Date to minute: minute
Syntax: minute(string date)
Return type: int
Description: Returns the minute of the datetime.
hive> select minute('2011-12-08 10:03:01') from iteblog;
3
11. Date to second: second
Syntax: second(string date)
Return type: int
Description: Returns the second of the datetime.
hive> select second('2011-12-08 10:03:01') from iteblog;
1
12. Date to week of year: weekofyear
Syntax: weekofyear(string date)
Return type: int
Description: Returns the week number of the date within its year.
hive> select weekofyear('2011-12-08 10:03:01') from iteblog;
49
13. Date difference: datediff
Syntax: datediff(string enddate, string startdate)
Return type: int
Description: Returns the number of days obtained by subtracting the start date from the end date.
hive> select datediff('2012-12-08','2012-05-09') from iteblog;
213
14. Date addition: date_add
Syntax: date_add(string startdate, int days)
Return type: string
Description: Returns the date that is days days after startdate.
hive> select date_add('2012-12-08',10) from iteblog;
2012-12-18
15. Date subtraction: date_sub
Syntax: date_sub(string startdate, int days)
Return type: string
Description: Returns the date that is days days before startdate.
hive> select date_sub('2012-12-08',10) from iteblog;
2012-11-28
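These date functions are typically combined to build time windows. The sketch below uses made-up literals; in a real query the literals would be replaced by columns or a run-date parameter.

```sql
-- Lower bound of a one-week window ending on a given day.
select date_sub('2012-12-08', 7) from iteblog;

-- Whole days between two datetimes; the comparison is made on the date parts.
select datediff('2012-12-08 10:03:01', '2012-12-01 23:59:59') from iteblog;
```

The first query should return 2012-12-01. A filter such as "dt >= date_sub(some_day, 7)" relies on dt being stored in the same yyyy-MM-dd layout that these functions return.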
Conditional functions
1. If function: if
Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return type: T
Description: Returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.
hive> select if(1=2,100,200) from iteblog;
200
hive> select if(1=1,100,200) from iteblog;
100
2. First non-null value: COALESCE
Syntax: COALESCE(T v1, T v2, …)
Return type: T
Description: Returns the first non-NULL value among the arguments; returns NULL if all values are NULL.
hive> select COALESCE(null,'100','50') from iteblog;
100
3. Conditional expression: CASE
Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return type: T
Description: If a equals b, returns c; if a equals d, returns e; otherwise returns f.
hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from iteblog;
mary
hive> select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end from iteblog;
tim
4. Conditional expression: CASE
Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return type: T
Description: If a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.
hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from iteblog;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end from iteblog;
tom
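In practice the conditional functions are often placed inside aggregates to compute several filtered totals in one pass. The sketch below assumes a hypothetical table orders(status string, amount double), which does not appear elsewhere in this article.

```sql
-- Hypothetical table: orders(status string, amount double).
select
  sum(case when status = 'paid' then amount else 0 end) as paid_amount,    -- total of paid orders
  sum(if(status = 'refunded', 1, 0))                     as refunded_count, -- number of refunded orders
  sum(coalesce(amount, 0))                               as total_amount    -- NULL amounts count as 0
from orders;
```

Each expression evaluates per row and feeds a plain sum, so the table is scanned once instead of once per condition.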
String functions
1. String length: length
Syntax: length(string A)
Return type: int
Description: Returns the length of string A.
hive> select length('abcedfg') from iteblog;
7
2. String reversal: reverse
Syntax: reverse(string A)
Return type: string
Description: Returns string A reversed.
hive> select reverse('abcedfg') from iteblog;
gfdecba
3. String concatenation: concat
Syntax: concat(string A, string B…)
Return type: string
Description: Returns the concatenation of the input strings; any number of input strings is supported.
hive> select concat('abc','def','gh') from iteblog;
abcdefgh
4. String concatenation with separator: concat_ws
Syntax: concat_ws(string SEP, string A, string B…)
Return type: string
Description: Returns the concatenation of the input strings, with SEP as the separator between them.
hive> select concat_ws(',','abc','def','gh') from iteblog;
abc,def,gh
5. Substring: substr, substring
Syntax: substr(string A, int start), substring(string A, int start)
Return type: string
Description: Returns the part of string A from position start to the end.
hive> select substr('abcde',3) from iteblog;
cde
hive> select substring('abcde',3) from iteblog;
cde
hive> select substr('abcde',-1) from iteblog; (same as Oracle)
e
6. Substring: substr, substring
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Description: Returns the part of string A that starts at position start and has length len.
hive> select substr('abcde',3,2) from iteblog;
cd
hive> select substring('abcde',3,2) from iteblog;
cd
hive> select substring('abcde',-2,2) from iteblog;
de
7. Uppercase conversion: upper, ucase
Syntax: upper(string A), ucase(string A)
Return type: string
Description: Returns string A in upper case.
hive> select upper('abSEd') from iteblog;
ABSED
hive> select ucase('abSEd') from iteblog;
ABSED
8. Lowercase conversion: lower, lcase
Syntax: lower(string A), lcase(string A)
Return type: string
Description: Returns string A in lower case.
hive> select lower('abSEd') from iteblog;
absed
hive> select lcase('abSEd') from iteblog;
absed
9. Whitespace trimming: trim
Syntax: trim(string A)
Return type: string
Description: Removes the spaces on both sides of the string.
hive> select trim(' abc ') from iteblog;
abc
10. Left trimming: ltrim
Syntax: ltrim(string A)
Return type: string
Description: Removes the spaces on the left side of the string.
hive> select ltrim(' abc ') from iteblog;
abc
11. Right trimming: rtrim
Syntax: rtrim(string A)
Return type: string
Description: Removes the spaces on the right side of the string.
hive> select rtrim(' abc ') from iteblog;
abc
12. Regular expression replacement: regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Return type: string
Description: Replaces the parts of string A that match the Java regular expression B with C. Note that escape characters are needed in some cases. This is similar to the regexp_replace function in Oracle.
hive> select regexp_replace('foobar', 'oo|ar', '') from iteblog;
fb
13. Regular expression extraction: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Description: Matches the string subject against the regular expression pattern and returns the capture group selected by index; index 0 returns the whole match.
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 1) from iteblog;
the
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 2) from iteblog;
bar
hive> select regexp_extract('foothebar', 'foo(.*?)(bar)', 0) from iteblog;
foothebar
Note: escape characters are needed in some cases. The equals signs below are escaped with a double backslash, following the rules of Java regular expressions.
select data_field,
regexp_extract(data_field,'.*?bgStart\\=([^&]+)',1) as aaa,
regexp_extract(data_field,'.*?contentLoaded_headStart\\=([^&]+)',1) as bbb,
regexp_extract(data_field,'.*?AppLoad2Req\\=([^&]+)',1) as ccc
from pt_nginx_loginlog_st
where pt = '2012-03-26' limit 2;
14. URL parsing: parse_url
Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return type: string
Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
hive> select parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from iteblog;
www.iteblog.com
hive> select parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') from iteblog;
v1
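The remaining partToExtract values work the same way. As a sketch on the same URL, with the results that follow from the usual decomposition of a URL shown as comments:

```sql
select
  parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PROTOCOL'),  -- https
  parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PATH'),      -- /path1/p.php
  parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY'),     -- k1=v1&k2=v2
  parse_url('https://www.iteblog.com/path1/p.php?k1=v1&k2=v2#Ref1', 'REF')        -- Ref1
from iteblog;
```

The three-argument form with a key applies only to QUERY, where it picks a single parameter out of the query string.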
15. JSON parsing: get_json_object
Syntax: get_json_object(string json_string, string path)
Return type: string
Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.
hive> select get_json_object('{"store":
> {"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
> "bicycle":{"price":19.95,"color":"red"}
> },
> "email":"amy@only_for_json_udf_test.net",
> "owner":"amy"
> }
> ','$.owner') from iteblog;
amy
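The path argument can also reach into nested objects and arrays. The sketch below reuses a shortened copy of the JSON document above; the two paths are illustrative choices.

```sql
-- Nested object field: follows store -> bicycle -> color.
select get_json_object(
  '{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},"owner":"amy"}',
  '$.store.bicycle.color') from iteblog;

-- Array element: [0] selects the first entry of the fruit array.
select get_json_object(
  '{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},"owner":"amy"}',
  '$.store.fruit[0].type') from iteblog;
```

These should return red and apple respectively. A path that does not exist in the document yields NULL rather than an error.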
16. String of spaces: space
Syntax: space(int n)
Return type: string
Description: Returns a string of n spaces.
hive> select space(10) from iteblog;
hive> select length(space(10)) from iteblog;
10
17. String repetition: repeat
Syntax: repeat(string str, int n)
Return type: string
Description: Returns the string str repeated n times.
hive> select repeat('abc',5) from iteblog;
abcabcabcabcabc
18. ASCII code of first character: ascii
Syntax: ascii(string str)
Return type: int
Description: Returns the ASCII code of the first character of str.
hive> select ascii('abcde') from iteblog;
97
19. Left padding: lpad
Syntax: lpad(string str, int len, string pad)
Return type: string
Description: Pads str on the left with pad to a length of len.
hive> select lpad('abc',10,'td') from iteblog;
tdtdtdtabc
Note: unlike Greenplum and Oracle, pad has no default value and must be supplied.
20. Right padding: rpad
Syntax: rpad(string str, int len, string pad)
Return type: string
Description: Pads str on the right with pad to a length of len.
hive> select rpad('abc',10,'td') from iteblog;
abctdtdtdt
21. String splitting: split
Syntax: split(string str, string pat)
Return type: array
Description: Splits str around pat and returns the resulting array of strings.
hive> select split('abtcdtef','t') from iteblog;
["ab","cd","ef"]
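One detail worth spelling out is that pat is treated as a regular expression, so characters such as the dot need escaping. A short sketch with an illustrative literal:

```sql
-- A literal dot must be escaped, otherwise "." would match every character.
select split('www.iteblog.com', '\\.') from iteblog;

-- The result is an array, so a single piece can be picked by subscript (starting at 0).
select split('www.iteblog.com', '\\.')[1] from iteblog;
```

The first query should return ["www","iteblog","com"] and the second iteblog.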
22. Set lookup: find_in_set
Syntax: find_in_set(string str, string strList)
Return type: int
Description: Returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found.
hive> select find_in_set('ab','ef,ab,de') from iteblog;
2
hive> select find_in_set('at','ef,ab,de') from iteblog;
0
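Aggregate functions
1. Count: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return type: bigint
Description: count(*) returns the number of rows retrieved, including rows that contain NULL values; count(expr) returns the number of non-NULL values of the given expression; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values of the given expressions.
hive> select count(*) from iteblog;
20
hive> select count(distinct t) from iteblog;
10
2. Sum: sum
Syntax: sum(col), sum(DISTINCT col)
Return type: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.
hive> select sum(t) from iteblog;
100
hive> select sum(distinct t) from iteblog;
70
3. Average: avg
Syntax: avg(col), avg(DISTINCT col)
Return type: double
Description: avg(col) returns the average of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col.
hive> select avg(t) from iteblog;
50
hive> select avg (distinct t) from iteblog;
30
4. Minimum: min
Syntax: min(col)
Return type: double
Description: Returns the minimum value of col in the result set.
hive> select min(t) from iteblog;
20
5. Maximum: max
Syntax: max(col)
Return type: double
Description: Returns the maximum value of col in the result set.
hive> select max(t) from iteblog;
120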
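6. Population variance of non-null values: var_pop
Syntax: var_pop(col)
Return type: double
Description: Returns the population variance of the non-NULL values of col in the result set (NULLs are ignored).
7. Sample variance of non-null values: var_samp
Syntax: var_samp(col)
Return type: double
Description: Returns the sample variance of the non-NULL values of col in the result set (NULLs are ignored).
8. Population standard deviation: stddev_pop
Syntax: stddev_pop(col)
Return type: double
Description: Returns the population standard deviation, that is, the square root of the population variance; its value equals the square root of the VAR_POP result.
9. Sample standard deviation: stddev_samp
Syntax: stddev_samp(col)
Return type: double
Description: Returns the sample standard deviation.
10. Percentile (median): percentile
Syntax: percentile(BIGINT col, p)
Return type: double
Description: Returns the exact p-th percentile; p must be between 0 and 1. The col column currently supports only integer types, not floating-point types.
11. Percentile (median): percentile
Syntax: percentile(BIGINT col, array(p1 [, p2]…))
Return type: array
Description: Same as above, except that several percentiles can be supplied; the return type is an array holding the corresponding percentiles.
select percentile(score, array(0.2,0.4)) from iteblog; (returns the values at the 0.2 and 0.4 percentiles)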
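The median is simply the 0.5 percentile. The sketch below assumes a hypothetical table scores(score int), which is not part of this article.

```sql
-- Hypothetical table: scores(score int).
select
  percentile(score, 0.5)                as median_score,  -- exact median
  percentile(score, array(0.25, 0.75))  as quartiles      -- lower and upper quartile as an array
from scores;
```

Since percentile is exact and accepts only integer columns, a floating-point column would need percentile_approx instead.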
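12. Approximate percentile (median): percentile_approx
Syntax: percentile_approx(DOUBLE col, p [, B])
Return type: double
Description: Returns an approximate p-th percentile; p must be between 0 and 1 and the return type is double. Unlike percentile, col may be a floating-point column. The parameter B controls the approximation accuracy at the cost of memory: the larger B is, the more accurate the result. The default is 10,000. When the number of distinct values in col is smaller than B, the result is an exact percentile.
13. Approximate percentile (median): percentile_approx
Syntax: percentile_approx(DOUBLE col, array(p1 [, p2]…) [, B])
Return type: array
Description: Same as above, except that several percentiles can be supplied; the return type is an array holding the corresponding percentiles.
14. Histogram: histogram_numeric
Syntax: histogram_numeric(col, b)
Return type: array
Description: Computes a histogram of col using b bins.
hive> select histogram_numeric(100,5) from iteblog;
[{"x":100.0,"y":1.0}]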
Complex type constructors
1. Map constructor: map
Syntax: map(key1, value1, key2, value2, …)
Description: Builds a map from the given key and value pairs.
hive> create table iteblog as select map('100','tom','200','mary') as t from iteblog;
hive> describe iteblog;
t map<string,string>
hive> select t from iteblog;
{"100":"tom","200":"mary"}
2. Struct constructor: struct
Syntax: struct(val1, val2, val3, …)
Description: Builds a struct from the given arguments.
hive> create table iteblog as select struct('tom','mary','tim') as t from iteblog;
hive> describe iteblog;
t struct<col1:string,col2:string,col3:string>
hive> select t from iteblog;
{"col1":"tom","col2":"mary","col3":"tim"}
3. Array constructor: array
Syntax: array(val1, val2, …)
Description: Builds an array from the given arguments.
hive> create table iteblog as select array("tom","mary","tim") as t from iteblog;
hive> describe iteblog;
t array<string>
hive> select t from iteblog;
["tom","mary","tim"]
Complex type access
1. Array access: A[n]
Syntax: A[n]
Operand types: A is an array, n is an int
Description: Returns the element at index n of array A. Array subscripts start at 0. For example, if A is an array with the value ['foo', 'bar'], then A[0] returns 'foo' and A[1] returns 'bar'.
hive> create table iteblog as select array("tom","mary","tim") as t from iteblog;
hive> select t[0],t[1],t[2] from iteblog;
tom mary tim
2. Map access: M[key]
Syntax: M[key]
Operand types: M is a map, key is a key of the map
Description: Returns the value stored in map M under the given key. For example, if M is a map with the value {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, then M['all'] returns 'foobar'.
hive> create table iteblog as select map('100','tom','200','mary') as t from iteblog;
hive> select t['200'],t['100'] from iteblog;
mary tom
3. Struct access: S.x
Syntax: S.x
Operand types: S is a struct
Description: Returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field of the struct.
hive> create table iteblog as select struct('tom','mary','tim') as t from iteblog;
hive> describe iteblog;
t struct<col1:string,col2:string,col3:string>
hive> select t.col1,t.col3 from iteblog;
tom tim
Complex type size functions
1. Map size: size(Map)
Syntax: size(Map)
Return type: int
Description: Returns the number of entries in the map.
hive> select size(map('100','tom','101','mary')) from iteblog;
2
2. Array size: size(Array)
Syntax: size(Array)
Return type: int
Description: Returns the number of elements in the array.
hive> select size(array('100','101','102','103')) from iteblog;
4
3. Type conversion function
Type conversion function: cast
Syntax: cast(expr as <type>)
Return type: <type>
Description: Returns expr converted to the given data type.

hive> select cast(1 as bigint) from iteblog;
1
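A typical reason to reach for cast is that numbers stored as strings compare lexicographically. A short sketch with illustrative literals:

```sql
-- String comparison goes character by character, so '10' sorts before '9'.
select '10' < '9' from iteblog;

-- After casting, the numeric values are compared instead.
select cast('10' as int) < cast('9' as int) from iteblog;

-- A string that cannot be converted yields NULL rather than an error.
select cast('abc' as int) from iteblog;
```

The first query should return true and the second false, which is why numeric columns stored as strings are best cast before sorting or filtering.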

你可能感兴趣的:(hive)

nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
浅谈MapReduce Android路上的人 Hadoop 分布式计算 mapreduce 分布式框架 hadoop
从今天开始，本人将会开始对另一项技术的学习，就是当下炙手可热的Hadoop分布式就算技术。目前国内外的诸多公司因为业务发展的需要，都纷纷用了此平台。国内的比如BAT啦，国外的在这方面走的更加的前面，就不一一列举了。但是Hadoop作为Apache的一个开源项目，在下面有非常多的子项目，比如HDFS，HBase,Hive，Pig,等等，要先彻底学习整个Hadoop，仅仅凭借一个的力量，是远远不够的。
Presto【基础 01】简介+架构+数据源+数据模型 2401_84254343 程序员架构
一个Catalog包含Schema和Connector。例如，配置JMX的Catalog，通过JXMConnector访问JXM信息。当执行一条SQL语句时，可以同时运行在多个Catalog。Presto处理table时，是通过表的完全限定（fully-qualified）名来找到Catalog。例如，一个表的权限定名是hive.test_data.test，则test是表名，test_data是
大数据毕业设计hadoop+spark+hive知识图谱租房数据分析可视化大屏租房推荐系统 58同城租房爬虫房源推荐系统房价预测系统计算机毕业设计机器学习深度学习人工智能 2401_84572577 程序员大数据 hadoop 人工智能
做了那么多年开发，自学了很多门编程语言，我很明白学习资源对于学一门新语言的重要性，这些年也收藏了不少的Python干货，对我来说这些东西确实已经用不到了，但对于准备自学Python的人来说，或许它就是一个宝藏，可以给你省去很多的时间和精力。别在网上瞎学了，我最近也做了一些资源的更新，只要你是我的粉丝，这期福利你都可拿走。我先来介绍一下这些东西怎么用，文末抱走。（1）Python所有方向的学习路线（
大数据之flink与hive 星辰_mya 大数据 flink hive
其实吧我不太想写flink，因为线上经验确实不多，这也是我需要补的地方，没有条件创造条件，先来一篇吧flink：高性能低延迟流批一体的分布式计算框架基于事件时间对实时数据精准处理快速响应支持批处理，高效离线分析和数据挖掘数据仓库的引擎丰富数据源/接收器，集成多种数据存储格式和源，比较常见就是咱们今天的主题hive了checkpoint恢复机制，故障恢复快速恢复计算任务分布式弹性扩展，据业务灵活增加
hive血缘关系之输入表与目标表的解析 zxfBdd hive 大数据治理大数据
接了一个新需求：需要做数据仓库的血缘关系。正所谓兵来将挡水来土掩，那咱就动手吧。血缘关系是数据治理的一块，其实有专门的第三方数据治理框架，但考虑到目前的线上环境已经趋于稳定，引入新的框架无疑是劳民伤财，伤筋动骨，所以就想以最小的代价把这个事情给做了。目前我们考虑做的血缘关系呢只是做输入表和输出表，最后会形成一张表与表之间的链路图。这个东西的好处就是有助于仓库人员梳理业务，后面可能还会做字段之间的血
初级练习[3]:Hive SQL子查询应用大数据深度洞察 Hive hive sql hadoop 数据仓库大数据数据库
目录环境准备看如下链接子查询查询所有课程成绩均小于60分的学生的学号、姓名查询没有学全所有课的学生的学号、姓名解释：没有学全所有课，也就是该学生选修的课程数<总的课程数。查询出只选修了三门课程的全部学生的学号和姓名环境准备看如下链接环境准备https://blog.csdn.net/qq_45115959/article/details/142057624?spm=1001.2014.3001.5
Linux下载压缩包：tar.gz、zip、tar.bz2格式全攻略 promise524 Linux linux 运维服务器后端 bash shell
在Linux中，下载各种格式的压缩包（如.tar.gz、.zip、.tar.bz2等）通常使用命令行工具如wget和curl。1.使用wget下载压缩包wget是Linux中最常用的文件下载工具，支持HTTP、HTTPS、FTP等协议，可以直接从命令行下载文件。基本命令：wget[URL]下载.tar.gz文件wgethttps://test.com/archive.tar.gz此命令将从指定的U
Anaconda版本和Python版本对应关系纬领网络 python anaconda3
官网下载地址：https://repo.anaconda.com/archive/下载地址：https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/anaconda3版本基础python版本Anaconda3-2024.06-1Python3.12.4Anaconda3-2024.02-1Python3.11.7Anaconda3-2023.09
R语言包AMORE安装报错问题以及RStudio与Rtools环境配置卡卡_R-Python R语言数据分析与可视化 r语言开发语言
在使用R语言进行AMORE安装时会遇到报错，这时候需要采用解决办法：'''AMORE包安装，需要离线官网下载安装包：Indexof/src/contrib/Archive/AMORE(r-project.org)https://cran.r-project.org/src/contrib/Archive/AMORE/一、出现的问题最近开始学习R语言，安装了最新版的R4.4.1和RStudio，但安
中级练习[3]：Hive SQL用户行为与商品销售数据分析大数据深度洞察 Hive hive 数据仓库大数据 sql
目录1.用户累计消费金额及VIP等级查询1.1题目需求1.2代码实现2.首次下单后第二天连续下单的用户比率查询2.1题目需求2.2代码实现3.每个商品销售首年的年份、销售数量和销售金额统计3.1题目需求3.2代码实现1.用户累计消费金额及VIP等级查询1.1题目需求从订单信息表(order_info)中统计每个用户截止其每个下单日期的累积消费金额，以及每个用户在其每个下单日期的VIP等级。VIP等
Python基础知识进阶之正则表达式_头歌python正则表达式进阶前端陈萨龙程序员 python 学习面试
最后硬核资料：关注即可领取PPT模板、简历模板、行业经典书籍PDF。技术互助：技术群大佬指点迷津，你的问题可能不是问题，求资源在群里喊一声。面试题库：由技术群里的小伙伴们共同投稿，热乎的大厂面试真题，持续更新中。知识体系：含编程语言、算法、大数据生态圈组件（Mysql、Hive、Spark、Flink）、数据仓库、Python、前端等等。网上学习资料一大堆，但如果学到的知识不成体系，遇到问题时只是
编程常用命令总结 Yellow0523 Linux BigData 大数据
编程命令大全1.软件环境变量的配置JavaScalaSparkHadoopHive2.大数据软件常用命令Spark基本命令Spark-SQL命令Hive命令HDFS命令YARN命令Zookeeper命令kafka命令Hibench命令MySQL命令3.Linux常用命令Git命令conda命令pip命令查看Linux系统的详细信息查看Linux系统架构(X86还是ARM，两种方法都可)端口号命令L
博客园怎么了？ YYH1992
新年好，给大家拜个早年！今年来到安徽过年，无聊中，不知不觉中又来到博客园了（忠实粉丝哦），却发现一件奇怪的事情，请看截图难道博客园被挂马了？抑或其它问题？如果真有问题，还请dudu抓紧时间修正，免得影响我们园子的声誉！我要下线了，出去买回家的车票了，只能年后回家了。。。转载于:https://www.cnblogs.com/HollisYao/archive/2008/02/06/1065351.
linux下文件的复制、移动与删除搬砖中年人
一、文件复制命令cp命令格式：cp[-adfilprsu]源文件(source)目标文件(destination)cp[option]source1source2source3...directory参数说明：-a:是指archive的意思，也说是指复制所有的目录-d:若源文件为连接文件(linkfile)，则复制连接文件属性而非文件本身-f:强制(force)，若有重复或其它疑问时，不会询问用户
2024年最全使用Python求解方程_python解方程(1)，字节面试官迟到 2401_84569545 程序员 python 学习面试
最后硬核资料：关注即可领取PPT模板、简历模板、行业经典书籍PDF。技术互助：技术群大佬指点迷津，你的问题可能不是问题，求资源在群里喊一声。面试题库：由技术群里的小伙伴们共同投稿，热乎的大厂面试真题，持续更新中。知识体系：含编程语言、算法、大数据生态圈组件（Mysql、Hive、Spark、Flink）、数据仓库、Python、前端等等。网上学习资料一大堆，但如果学到的知识不成体系，遇到问题时只是
兼容 Trino Connector，扩展 Apache Doris 数据源接入能力｜Lakehouse 使用手册 vvvae1234 apache
ApacheDoris内置支持包括Hive、Iceberg、Hudi、Paimon、LakeSoul、JDBC在内的多种Catalog，并为其提供原生高性能且稳定的访问能力，以满足与数据湖的集成需求。而随着ApacheDoris用户的增加，新的数据源连接需求也随之增加。因此，从3.0版本开始，ApacheDoris引入了TrinoConnector兼容框架。Trino/Presto作为业界较早应用
SAP HANA makaitai BW sap 数据库工具报表 layer 服务器
原文地址：http://LiuAlex.com/archives/1776也是刚刚开始学习HANA的一些知识，一边看书一遍做笔记，说到底无非是用自己的语言来理解标准帮组文档所讲解的意思，肯定有理解失误的地方，毕竟没有参加过标准培训，即使有培训，从老师那边来的知识也不可能是完整的传授过来，中间多少的知识遗漏是正常的，所以多看看HELP的文档，应该可以原汁原味的理解作者的意思。这张图片是从SAPHAN
Hive SQL查询汇总分析大数据深度洞察 Hive hive sql hadoop 数据仓库数据库大数据
目录SQL查询汇总分析成绩查询查询编号为“02”的课程的总成绩查询参加考试的学生个数分组查询查询各科成绩最高和最低的分查询每门课程有多少学生参加了考试（有考试成绩）查询男生、女生人数分组结果的条件查询平均成绩大于60分的学生的学号和平均成绩查询至少选修四门课程的学生学号查询同姓（假设每个学生姓名的第一个字为姓）的学生名单并统计同姓人数大于2的姓查询每门课程的平均成绩，结果按平均成绩升序排序，平均成
RMAN-08137 rman delete archivelog force jnrjian 数据库 oracle
deleteforcearchiveloguntiltime'trunc(sysdate-4)'backedup1timestodevicetypedisk;SymptomsDatabaseAClonedtoDatabaseBonCloneserver.GoldenGateisConfiguredonSourcedatbaseA.DatabaseBwhichisclonedfromSourcedo
hive表格统计信息不准确 weixin_41956627 hive hive hadoop 数据仓库
问题描述有个hive分区表，orc存储格式，有个分区，查询selectcount(1)fromtablewheredt='yyyyMMdd'结果是0，但查询select*fromtablewheredt='yyyyMMdd'又能查到数据，去hdfs对应目录下查看，也能看到有数据文件解决执行如下sqlANALYZETABLEdb.table1PARTITION(dt='20240908')COMPU
Conda创建环境失败：000和404错误柚柚柚柚柚 conda
一、首先下载Anaconda1.打开网址Indexof/anaconda/archive/|清华大学开源软件镜像站|TsinghuaOpenSourceMirror，滑到最底部，下载Anaconda3-5.3.1-Linux-x86_64.sh。2.使用winscp拖动本地的Anaconda3-5.3.1-Linux-x86_64.sh到服务器的个人工作目录下。二、安装Anaconda软件，创建虚
C#中两个问号的含义 weixin_30363981 测试
stringstrParam=Request.Params["param"]??"";取??左边的值,如果??左边的值为null则取右边的值转载于:https://www.cnblogs.com/shadowtale/archive/2012/10/19/2731152.html
如何下载各个版本的tomcat-比如tomcat9 耳边轻语999 tomcat java
1，找到tomcat官网https://tomcat.apache.org/ApacheTomcat®-Welcome!找到tomcat9，或者archives1.1，找到对应版本1.2，找到小版本1.3，找到bin2，Indexof/dist/tomcat/tomcat-9/v9.0.39/bin2.1，下载对应的解压版本或者安装版本
Percona-toolkit工具详解小一_d28d
1.pt工具安装[root@master~]#yuminstall-ypercona-toolkit-3.1.0-2.el7.x86_64.rpm2.常用工具使用介绍2.1pt-archiver归档表#重要参数--limit100每次取100行数据用pt-archive处理--txn-size100设置100行为一个事务提交一次，--where'id>/root/db/checksum.logpt
Ubuntu更换apt-get的下载源愤愤的有痣青年
将以下内容替换/etc/apt/sources.list中的内容deb-srchttp://archive.ubuntu.com/ubuntuxenialmainrestricted#Addedbysoftware-propertiesdebhttp://mirrors.aliyun.com/ubuntu/xenialmainrestricteddeb-srchttp://mirrors.aliy
apt 下载指定架构的包及离线安装的方法错误重复学习记录 linux
#设置系统架构sudodpkg--add-architectureamd64#安装apt-rdependssudoaptinstallapt-rdepends#创建单独的目录mkdir-p/home/apt/postgresql-client-common#仅下载安装包sudoapt-getinstall--download-onlysudomv/var/cache/apt/archives/*/
游戏运营环节的一些关键转化率 turtle081025 数据分析游戏网络游戏运营
转载于http://www.gamedatas.com/archives/134转化率这个指标在各行各业的数据分析中运用的非常之广泛，例如：电商中就会存在，点击到订单生成的一系列转化率，传统的销售行业也会在做广告的时候考虑该广告能够转化多少订单，而在游戏行业，转化率同样是一个不容忽视的指标。一般来说，游戏运营的过程中主要会关注到这些转化率：1.下载-安装（激活）转化率；2.安装（激活）-注册转化率
Python API操作RocketMQ 京城小筑 #Python编程 python
背景：开发背景:公司相关报表需求需要将订单业务数据同步至RocketMQ中，由于需要保证开发的一致性(多个部门协同开发)，所以采用读取Hive离线数据的方式通过PythonAPI写入RocketMQ中，便于其他开发同事调用~开发环境:本地调试系统MacPython3.7.5rocketmq0.4.4(Python模块)rocketmq-client-python2.0.0(Python模块)服务器
hive搭建 -----内嵌模式和本地模式 lzhlizihang hive hadoop
文章目录一、内嵌模式（使用较少）1、上传、解压、重命名2、配置环境变量3、配置conf下的hive-env.sh4、修改conf下的hive-site.xml5、启动hadoop集群6、给hdfs创建文件夹7、修改hive-site.xml中的非法字符8、初始化元数据9、测试是否成功10、内嵌模式的缺点二、本地模式（最常用）1、检查mysql是否正常2、上传、解压、重命名3、配置环境变量4、修改c