As the data warehouse of a Hadoop big-data cluster, Hive naturally gets used for data processing and analysis, which means you will constantly reach for Hive functions. Hive functions fall into two groups: built-in functions and user-defined functions (UDFs).

- Built-in functions: as in other common databases, these ship with the engine and cover the vast majority of general-purpose data processing and analysis;
- User-defined functions: when a piece of business logic is needed frequently but Hive provides no function for it, users create their own following Hive's UDF development guide (covered in detail in the next post, skipped here).

The most authoritative and classic reference on Hive functions is, of course, the official Hive wiki: Hive Operators and User-Defined Functions (UDFs), laid out as shown in Figure 1. This post walks through the built-in functions from that official page;
Besides splitting Hive functions into built-in and user-defined, they can also be grouped by functional area: date functions, string functions, mathematical functions, aggregate functions, window functions, and others. Listed here are the built-in functions used most frequently in development; for the more obscure ones, just look them up when you need them. See Table 1 for an overview;
Category | Functions |
---|---|
Date functions | year, month, day, date_add, date_sub, datediff, from_unixtime, unix_timestamp, to_date, etc. |
String functions | substr, substring, concat, concat_ws, split, regexp_replace, replace, regexp_extract, get_json_object, trim, instr, length, etc. |
Mathematical functions | abs, ceil, floor, round, rand, pow, pmod, etc. |
Aggregate functions | count, max, min, avg, count distinct, sum, group_concat, collect_set, collect_list, etc. |
Window functions | row_number, lead, lag, rank, dense_rank, max, min, count, etc. (unlike their aggregate namesakes, the max, min, count here compute within the current window) |
Other functions | coalesce, cast, decode, row-column transforms such as lateral view and explode, etc. |
## ==Date functions==
As the name suggests, these handle date-related processing. Note that date functions generally expect arguments of type `date` or `timestamp`, formatted as `yyyy-MM-dd` or `yyyy-MM-dd hh:mm:ss`. A value in `yyyyMMdd` form is likely to be treated by Hive as a plain string, and applying date functions to it directly may return `NULL`, as shown below;
- `year`, `month`, `day`, `hour`, `minute`, `second`
**Usage**: `year(date)`, `month(date)`, `day(date)`, `hour(timestamp)`, `minute(timestamp)`, `second(timestamp)`
**Parameters**: a date in `timestamp` or `date` format; other argument types may trigger an exception or yield NULL;
**Returns**: the year, month, day (or hour, minute, second) of the given `date`/`timestamp`;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> year('20200706') as `字符串年无效`
> ,month('20200706') as `字符串月无效`
> ,day('20200706') as `字符串日无效`
> ,year(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))) as `字符串年`
> ,month(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))) as `字符串月`
> ,day(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))) as `字符串日`
> ,year('2020-07-06') as `年`
> ,month('2020-07-06') as `月`
> ,day('2020-07-06') as `日`
> ,hour('2020-07-06 12:30:49') as `时`
> ,minute('2020-07-06 12:30:49') as `分`
> ,second('2020-07-06 12:30:49') as `秒`
> ;
OK
字符串年无效 字符串月无效 字符串日无效 字符串年 字符串月 字符串日 年 月 日 时 分 秒
NULL NULL NULL 2020 7 6 2020 7 6 12 30 49
Time taken: 0.177 seconds, Fetched: 1 row(s)
- `date_add`
**Usage**: `date_add(timestamp/date time, int days)`
**Parameter 1**: a date in `timestamp` or `date` format;
**Parameter 2**: an `int` number of days: a positive value adds days to the given date, a negative value subtracts them;
**Returns**: a date;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> date_add('20200706',1) as `参数异常`,
> date_add('2020-07-06',1) as `明天`,
> date_add('2020-07-06',-1) as `昨天`;
OK
参数异常 明天 昨天
NULL 2020-07-07 2020-07-05
Time taken: 0.088 seconds, Fetched: 1 row(s)
- `date_sub`
**Usage**: `date_sub(timestamp/date time, int days)`
**Parameter 1**: a date in `timestamp` or `date` format;
**Parameter 2**: an `int` number of days, with the signs meaning the opposite of `date_add`: a positive value subtracts days from the given date, a negative value adds them;
**Returns**: a date;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> date_sub('20200706',1) as `参数异常`,
> date_sub('2020-07-06',1) as `昨天`,
> date_sub('2020-07-06',-1) as `明天`;
OK
参数异常 昨天 明天
NULL 2020-07-05 2020-07-07
Time taken: 0.609 seconds, Fetched: 1 row(s)
- `datediff`
**Usage**: `datediff(timestamp/date enddate, timestamp/date startdate)`
**Parameter 1**: an end date in `timestamp` or `date` format;
**Parameter 2**: a start date in `timestamp` or `date` format;
**Returns**: the number of days from startdate to enddate, as an `int`;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> datediff('2020-07-06','2020-06-06') as `相差天数`,
> datediff('20200606','20200706') as `参数异常相差天数`;
OK
相差天数 参数异常相差天数
30 NULL
Time taken: 0.084 seconds, Fetched: 1 row(s)
- `unix_timestamp`
**Usage**: `unix_timestamp(timestamp/date, format)`
**Parameter 1**: a date in `timestamp` or `date` format;
**Parameter 2**: a format string describing parameter 1, such as `'yyyyMMdd'` or `'yyyy-MM-dd HH:mm:ss'`;
**Returns**: a Unix timestamp;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> unix_timestamp('20200706','yyyyMMdd') as `日期时间戳`,
> unix_timestamp('20200706 18:34:56','yyyyMMdd hh:mm:ss') as `时间日期时间戳`
> ;
OK
日期时间戳 时间日期时间戳
1593964800 1594031696
Time taken: 0.061 seconds, Fetched: 1 row(s)
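One caveat on the format argument: it follows Java's SimpleDateFormat conventions, where pattern letters are case-sensitive — `MM` is the month while `mm` is the minute, and `HH` is the 24-hour clock while `hh` is the 12-hour one. A minimal sketch of the distinction, using an illustrative literal:

```sql
-- MM = month, mm = minute; HH = 24-hour clock, hh = 12-hour clock.
-- For an evening time such as 18:34:56, HH is the safe pattern to use:
select unix_timestamp('20200706 18:34:56', 'yyyyMMdd HH:mm:ss') as ts_24h_clock;
```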
- `from_unixtime`
**Usage**: `from_unixtime(bigint unixtime[, string format])`, often nested as `from_unixtime(unix_timestamp(timestamp/date, format))` to normalize a string date;
**Parameter 1**: a Unix timestamp;
**Returns**: the corresponding date-time string;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> from_unixtime(1593964800) as `日期1`
> ,from_unixtime(1594031696) as `日期2`
> ,from_unixtime(unix_timestamp('20200706','yyyyMMdd')) as `字符串转日期`
> ;
OK
日期1 日期2 字符串转日期
2020-07-06 00:00:00 2020-07-06 18:34:56 2020-07-06 00:00:00
Time taken: 0.07 seconds, Fetched: 1 row(s)
- `to_date`
**Usage**: `to_date(timestamp/date)`
**Parameter 1**: a date in `timestamp` or `date` format;
**Returns**: the date part, in `yyyy-MM-dd` form;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> to_date('2020-07-06 12:30:30') as `日期`
> ,to_date(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))) as `字符串转日期`
> ;
OK
日期 字符串转日期
2020-07-06 2020-07-06
Time taken: 0.544 seconds, Fetched: 1 row(s)
Putting these together: implement "the Saturday of last week for a given date" in Hive SQL. Assume the date is dynamic and formatted as `yyyyMMdd`; today it happens to be `20200706`, and tomorrow it becomes `20200707`. The Hive SQL looks like this;
hive> set hive.cli.print.header=true;
hive> select
>
> unix_timestamp('20200706','yyyyMMdd') as `时间戳`,
> -- get the Unix timestamp; the date literal and the yyyyMMdd pattern must match exactly,
> -- otherwise you get an error or NULL (and note it is MM for month: mm would mean minutes)
>
> from_unixtime(unix_timestamp('20200706','yyyyMMdd')) as `日期`,
> -- from_unixtime converts the timestamp back into a normal date
>
> datediff(to_date(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))),'1900-01-01') as `相差几天`,
> -- days between 20200706 and 1900-01-01; 1900-01-01 is taken as the earliest reference Monday
>
>
> pmod(datediff(to_date(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))),'1900-01-01'),7)+1 as `周几`,
> -- the remainder of the day difference divided by 7, plus 1, gives the day of the week
>
> date_sub(to_date(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))),pmod(datediff(to_date(from_unixtime(unix_timestamp('20200706','yyyyMMdd'))),'1900-01-08'),7)+2) as `上周六`
> -- get the Saturday of last week
> ;
OK
时间戳 日期 相差几天 周几 上周六
1593964800 2020-07-06 00:00:00 44016 1 2020-07-04
Time taken: 0.725 seconds, Fetched: 1 row(s)
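The same logic can be wrapped up with a Hive variable so the date never has to be edited by hand; a sketch, assuming the variable `ds` is supplied at launch (e.g. `hive --hivevar ds=20200706`):

```sql
-- previous Saturday for an arbitrary yyyyMMdd date passed in as ${hivevar:ds}
select date_sub(
         to_date(from_unixtime(unix_timestamp('${hivevar:ds}', 'yyyyMMdd'))),
         pmod(datediff(to_date(from_unixtime(unix_timestamp('${hivevar:ds}', 'yyyyMMdd'))), '1900-01-08'), 7) + 2
       ) as last_saturday;
```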
For more built-in date functions, see the official Hive wiki on date functions; questions are welcome in the comments below so we can learn together;
## ==String functions==
Examples of commonly used functions for string handling;
- `substring`
**Usage**: `substring(string A, int start[, int len])`
**Parameter 1**: a string of type `string`, `varchar`, or `char`;
**Parameter 2**: the 1-based start position;
**Parameter 3**: the number of characters to take (a length, not an end index); if omitted, the rest of the string;
**Returns**: the substring;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select substring('abcderf',2,3) as `substring截取`
> ,substr('abcderf',2,3) as `substr截取`
> ,substring('abcderf',0,100) as `substring溢出位数`
> ,substring('abcderf',0,-1) as `substring溢出位数`
> ;
OK
substring截取 substr截取 substring溢出位数 substring溢出位数
bcd bcd abcderf
Time taken: 0.233 seconds, Fetched: 1 row(s)
- `substr`
**Usage**: `substr(string A, int start[, int len])`
**Parameters**: same as `substring`;
**Returns**: same as `substring`;
**Hive Cli example**: same as `substring`;
- `concat`
**Usage**: `concat(args1, args2, args3, ..., argsn)`
**Parameters**: multiple fields, possibly of different types;
**Returns**: the concatenation of all the fields;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select concat('adc',1,22.3) as `字符连接`
> ;
OK
字符连接
adc122.3
Time taken: 0.066 seconds, Fetched: 1 row(s)
- `concat_ws`
**Usage**: `concat_ws(string SEP, string1, string2, ..., stringn)`
**Parameter 1**: a `string` separator SEP (a literal, not a regex);
**Parameters 2...n**: the `string` fields to join;
**Returns**: the fields joined by the separator;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select concat_ws(',','adc','1','22.3') as `字符连接`
> ,concat_ws('/','跑步','游戏','看书') as `爱好`
> ;
OK
字符连接 爱好
adc,1,22.3 跑步/游戏/看书
Time taken: 0.231 seconds, Fetched: 1 row(s)
- `split`
**Usage**: `split(str, regex)`
**Parameter 1**: a `string` to split;
**Parameter 2**: a `string` delimiter, interpreted as a regular expression;
**Returns**: an array of the split parts;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select split('张三,李四,王五',',') as `合伙人`
> ;
OK
合伙人
["张三","李四","王五"]
Time taken: 0.103 seconds, Fetched: 1 row(s)
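Since `split` returns an array, it pairs naturally with `explode` (covered in the complex-type section later in this post) to turn a delimited string into rows; a minimal sketch:

```sql
-- one row per delimited element: 张三 / 李四 / 王五
select explode(split('张三,李四,王五', ',')) as partner;
```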
- `regexp_replace`
**Usage**: `regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)`
**Parameter 1**: the source string;
**Parameter 2**: the pattern to replace, which supports regular expressions;
**Parameter 3**: the replacement string;
**Returns**: the source string with every match of the pattern replaced; note below how both `regexp_replace` and `replace` clean the newline out of the result;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> regexp_replace("foobar", "oo|ar", "") `regexp_replace替换`,
> replace("foobar", "oo|ar", "") as `replace替换`,
> 'foobar\n' as `源字符`,
> regexp_replace('foobar\n', '\n', '') `regexp_replace替换特殊字符`,
> replace('foobar\n', '\n', '') as `replace替换替换特殊字符`
> ;
OK
regexp_replace替换 replace替换 源字符 regexp_replace替换特殊字符 replace替换替换特殊字符
fb foobar foobar
foobar foobar
Time taken: 0.203 seconds, Fetched: 1 row(s)
- `replace`
**Usage**: `replace(string, string1, string2)`
**Parameters**: like `regexp_replace`, except that the match string does not support regular expressions, only literal matching (special characters such as `\n` are still supported);
**Returns**: the source string with every literal match replaced;
**Hive Cli example**: see the `regexp_replace` demo above; the same query exercises both functions, and it shows `replace("foobar", "oo|ar", "")` returning the input unchanged because `oo|ar` is taken literally;
- `regexp_extract`
**Usage**: `regexp_extract(string subject, string pattern, int index)`
**Parameter 1**: the source `string` field;
**Parameter 2**: a `string` pattern, which supports regular expressions;
**Parameter 3**: the index of the capture group to return, with 0 meaning the whole match;
**Returns**: the text matched by that group;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> regexp_extract('foothebar', 'foo(.*?)(bar)', 2) as `匹配第2组数据`,
> regexp_extract('foothebar', 'foo(.*?)(bar)', 1) as `匹配第1组数据`,
> regexp_extract('foothebar', 'foo(.*?)(bar)', 0) as `匹配第整组数据`
> ;
OK
匹配第2组数据 匹配第1组数据 匹配第整组数据
bar the foothebar
Time taken: 0.265 seconds, Fetched: 1 row(s)
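A common practical use is pulling a structured token out of messy text; a small sketch with an illustrative input string (note the doubled backslash Hive needs for regex character classes):

```sql
-- extract the digit run from an order code; group 1 is the (\d+) capture
select regexp_extract('order-12345-done', 'order-(\\d+)-', 1) as order_id;  -- returns 12345
```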
- `get_json_object`
**Usage**: `get_json_object(string json_string, string path)`
**Parameter 1**: a `string` holding JSON;
**Parameter 2**: a JSON path rooted at `$`;
**Returns**: the value found at that path, or NULL when the JSON or the path is invalid;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> get_json_object('{"name":"Jack","sex":"man","age":12}','$.name') as `姓名`
> ,get_json_object('{"name":"Jack","sex":"man","age":12}','$.sex') as `性别`
> ,get_json_object('{"name":"Jack","sex":"man","age":12}','$.age') as `年龄`
> ;
OK
姓名 性别 年龄
Jack man 12
Time taken: 0.47 seconds, Fetched: 1 row(s)
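`get_json_object` also handles nested objects and arrays through the path argument; a minimal sketch with an illustrative JSON document:

```sql
select get_json_object('{"user":{"name":"Jack","tags":["a","b"]}}', '$.user.name')    as name,      -- Jack
       get_json_object('{"user":{"name":"Jack","tags":["a","b"]}}', '$.user.tags[0]') as first_tag; -- a
```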
- `trim`
**Usage**: `trim(string A)`
**Parameter 1**: a `string`;
**Returns**: the string with spaces stripped from both ends; `ltrim` and `rtrim` strip only the left or right side;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
>
> trim(" hello world ") as `去掉两端的空格`
> ,rtrim(" hello world ") as `只去掉右边空格`
> ,ltrim(" hello world ") as `只去掉左边空格`
> ;
OK
去掉两端的空格 只去掉右边空格 只去掉左边空格
hello world hello world hello world
Time taken: 0.042 seconds, Fetched: 1 row(s)
- `instr`
**Usage**: `instr(string str, string substr)`
**Parameters**: the string to search, and the substring to look for;
**Returns**: the position of the first occurrence of substr, counting from 1; returns 0 when substr is not found;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> instr("阿兹卡班的囚徒阿", "囚徒") as `位置下标1`,
> instr("阿兹卡班的囚徒", "mei") as `位置下标2`,
> instr("阿兹卡班的囚徒阿", "阿") as `位置下标3`
> ;
OK
位置下标1 位置下标2 位置下标3
6 0 1
Time taken: 0.048 seconds, Fetched: 1 row(s)
- `length`
**Usage**: `length(string A)`
**Returns**: the length of the string in characters;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> length("阿兹卡班的囚徒") as `长度`
> ;
OK
长度
7
There are more string functions than can be listed here one by one; see Table 2 below, or the official Hive wiki on string functions;
Return type | Function | Description |
---|---|---|
int | ascii(string str) | Returns the numeric value of the first character of str |
string | base64(binary bin) | Converts the binary argument to a base64 string |
string | concat(string\|binary A, string\|binary B...) | Concatenates the arguments, in order, into a single string |
array<struct<string,double>> | context_ngrams(array<array<string>>, array<string>, int K, int pf) | Returns the top-k contextual N-grams from a set of tokenized sentences |
string | concat_ws(string SEP, string A, string B...) | Like concat(), but with the custom separator SEP |
string | concat_ws(string SEP, array<string>) | Like concat_ws() above, but taking an array of strings |
string | decode(binary bin, string charset) | Decodes the first argument to a string using the given charset; returns null if either argument is null. Possible charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16' |
binary | encode(string src, string charset) | Encodes the first argument to binary using the given charset; returns null if either argument is null |
int | find_in_set(string str, string strList) | Returns the position of the first occurrence of str in strList, a comma-separated list; returns 0 if str contains a comma, null if any argument is null. E.g. find_in_set('ab', 'abc,b,ab,c,def') returns 3 |
string | format_number(number x, int d) | Formats x like '#,###,###.##', rounded to d decimal places, and returns the result as a string; with d=0 the result has no decimal point or fractional part |
string | get_json_object(string json_string, string path) | Extracts a JSON object from a JSON string based on the JSON path and returns it as a JSON string; returns null if the input JSON is invalid. The path may contain only digits, letters, and underscores |
boolean | in_file(string str, string filename) | Returns true if str appears as a whole line in filename |
int | instr(string str, string substr) | Returns the position of the first occurrence of substr in str; null if any argument is null, 0 if substr is not found. The first character of str is at position 1 |
int | length(string A) | Returns the length of A |
int | locate(string substr, string str[, int pos]) | Returns the first occurrence of substr in str after position pos |
string | lower(string A) / lcase(string A) | Returns the lowercase form of the string |
string | lpad(string str, int len, string pad) | Left-pads str with pad to length len |
string | ltrim(string A) | Removes spaces from the left side of A, e.g. ltrim(' foobar ') returns 'foobar ' |
array<struct<string,double>> | ngrams(array<array<string>>, int N, int K, int pf) | Returns the top-K N-grams from a set of tokenized sentences |
string | parse_url(string urlString, string partToExtract [, string keyToExtract]) | Returns the given part of a URL; valid values of partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. E.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. When the second argument is QUERY, the third argument extracts the value of a specific key, e.g. parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1' |
string | printf(String format, Obj... args) | Formats the input arguments according to the format string |
string | regexp_extract(string subject, string pattern, int index) | Extracts a string using the pattern, e.g. regexp_extract('foothebar', 'foo(.*?)(bar)', 2) returns 'bar'. Take care with predefined character classes: '\s' as the pattern matches the letter s, while '\\s' matches whitespace. The index argument follows the group() indexing of the Java regex Matcher |
string | regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) | Replaces the substrings of INITIAL_STRING matching PATTERN with REPLACEMENT, e.g. regexp_replace("foobar", "oo\|ar", "") returns 'fb' |
string | repeat(string str, int n) | Repeats str n times |
string | reverse(string A) | Reverses A |
string | rpad(string str, int len, string pad) | Right-pads str with pad to length len |
string | rtrim(string A) | Removes spaces from the right side of A, e.g. rtrim(' foobar ') returns ' foobar' |
array<array<string>> | sentences(string str, string lang, string locale) | Tokenizes natural-language text into words and sentences, splitting each sentence at the appropriate boundary, and returns arrays of words; lang and locale are optional. E.g. sentences('Hello there! How are you?') returns ( ("Hello", "there"), ("How", "are", "you") ) |
string | space(int n) | Returns a string of n spaces |
array | split(string str, string pat) | Splits str around pat, a regular expression |
map<string,string> | str_to_map(text[, delimiter1, delimiter2]) | Splits text into key-value pairs using two delimiters: the first splits the text into K-V pairs, the second splits each K-V pair. Defaults are ',' for the first and '=' for the second |
string | substr(string\|binary A, int start) / substring(string\|binary A, int start) | Returns the substring or slice of A from position start to the end |
string | substr(string\|binary A, int start, int len) / substring(string\|binary A, int start, int len) | Returns the substring or slice of A of length len, starting at position start |
string | translate(string input, string from, string to) | Replaces the characters of input that appear in from with the corresponding characters in to; null if any argument is null |
string | trim(string A) | Removes spaces from both ends of A |
binary | unbase64(string str) | Converts a base64 string to binary |
string | upper(string A) / ucase(string A) | Returns the uppercase form of A |
## ==Mathematical functions==
Functions for mathematical computation; examples of the common ones follow;
- `abs`
**Usage**: `abs(double a)`
**Returns**: the absolute value of a;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> abs(-12.89) as `绝对值1`
> ,abs(-28) as `绝对值2`
> ,abs(28) as `绝对值3`
> ;
OK
绝对值1 绝对值2 绝对值3
12.89 28 28
Time taken: 0.333 seconds, Fetched: 1 row(s)
- `ceil`
**Usage**: `ceil(double a)`
**Returns**: the smallest integer not less than a (rounds up);
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> ceil(-12.89) as `向上取整1`,
> ceil(11.239) as `向上取整2`
> ;
OK
向上取整1 向上取整2
-12 12
Time taken: 0.068 seconds, Fetched: 1 row(s)
- `floor`
**Usage**: `floor(double a)`
**Returns**: the largest integer not greater than a (rounds down);
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> floor(-12.89) as `向下取整1`,
> floor(11.239) as `向下取整2`
> ;
OK
向下取整1 向下取整2
-13 11
Time taken: 0.063 seconds, Fetched: 1 row(s)
- `round`
**Usage**: `round(double d)` / `round(double d, int n)`
**Returns**: d rounded to the nearest integer, or to n decimal places;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> round(12.89) as `四舍五入取整`,
> round(11.239123,2) as `四舍五入保留2为小数`
> ;
OK
四舍五入取整 四舍五入保留2为小数
13 11.24
- `rand`
**Usage**: `rand()` / `rand(int seed)`
**Returns**: a random number uniformly distributed in [0, 1); with a fixed seed the result is reproducible, as the repeated columns below show;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> rand() as `随机数`,
> rand() as `随机数`,
> rand(3) as `固定随机因子随机数`,
> rand(3) as `固定随机因子随机数`,
> floor(rand()*10)%6+1 as `摇骰子一次`,
> floor(rand()*10)%6+1 as `摇骰子二次`
> ;
OK
随机数 随机数 固定随机因子随机数 固定随机因子随机数 摇骰子一次 摇骰子二次
0.4495149771126544 0.9207548698949779 0.731057369148862 0.731057369148862 6 2
Time taken: 0.647 seconds, Fetched: 1 row(s)
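A handy application of `rand()` is simple random sampling; a sketch, with an illustrative table name:

```sql
-- take 100 random rows; beware that order by rand() forces a full sort,
-- so it is only suitable for small-to-medium inputs
select * from some_table order by rand() limit 100;
```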
- `pow`
**Usage**: `pow(double a, double p)` / `power(double a, double p)`
**Returns**: a raised to the power p, as a double;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> pow(2,2) `2的2次方`,
> pow(2,3) `2的3次方`
> ;
OK
2的2次方 2的3次方
4.0 8.0
- `pmod`
**Usage**: `pmod(int a, int b)` / `pmod(double a, double b)`
**Returns**: the positive remainder of a divided by b;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select
> pmod(5,2) `取模`
> ;
OK
取模
1
Time taken: 0.055 seconds, Fetched: 1 row(s)
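The point of `pmod` over the plain `%` operator is the sign of the result: for a positive divisor, `pmod` always returns a non-negative value, while `%` keeps the sign of the dividend. A minimal sketch:

```sql
select pmod(-5, 3) as pmod_result,  -- 1: ((-5 % 3) + 3) % 3
       -5 % 3      as mod_result;   -- -2: % keeps the dividend's sign
```

This is exactly why `pmod`, rather than `%`, is used in the day-of-week calculation earlier in this post.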
For more mathematical functions, see the official Hive wiki on mathematical functions;
## ==Aggregate functions==
Aggregate functions compute the measure values for the dimensions listed after the `group by` keyword; they are an everyday staple of data analysis. Some commonly used ones are illustrated below.
To make the examples concrete, here is a student table prepared in advance, with the following data;
hive> set hive.cli.print.header=true;
hive> select * FROM ods_rs_basic_tbd_student where event_day='20200618';
OK
sno sname ssex sage classid event_week event_day event_hour
1 小明 男 15 6 25 20200618 00
2 小红 女 13 5 25 20200618 00
3 小丽 女 14 7 25 20200618 00
4 小华 男 17 1 25 20200618 00
5 小蓝 男 15 2 25 20200618 00
6 大林 男 14 3 25 20200618 00
7 大姝 女 13 5 25 20200618 00
8 大瑶 女 14 7 25 20200618 00
9 大发 男 17 1 25 20200618 00
10 大佬 男 15 4 25 20200618 00
10 NULL 男 14 1 25 20200618 00
11 大稳 男 14 NULL 25 20200618 00
Time taken: 2.708 seconds, Fetched: 12 row(s)
- `count`
**Usage**: `count(*)` / `count(expr)` / `count(distinct expr[, expr...])`
**Returns**: `count(*)` counts all rows, including those containing `NULL`; `count(expr)` counts the rows of a given column, with `NULL` values excluded; `count(distinct expr[, expr...])` counts the deduplicated values of the given columns, again excluding `NULL`;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> count(*) as `所有行数`
> ,count(sname) as `学生数`
> ,count(distinct classid) as `班号数`
> from ods_rs_basic_tbd_student where event_day='20200618';
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = liuxiaowei_20200714143913_0c4645f8-03b1-431d-a8e1-f8e6bc0b96ac
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0071, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0071/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0071
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2020-07-14 14:39:22,464 Stage-1 map = 0%, reduce = 0%
2020-07-14 14:39:33,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.9 sec
2020-07-14 14:39:40,392 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.26 sec
MapReduce Total cumulative CPU time: 12 seconds 260 msec
Ended Job = job_1592876386879_0071
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 12.26 sec HDFS Read: 19855 HDFS Write: 107 SUCCESS
Total MapReduce CPU Time Spent: 12 seconds 260 msec
OK
所有行数 学生数 班号数
12 11 7
Time taken: 27.714 seconds, Fetched: 1 row(s)
- `max` / `min`
**Usage**: `max(col)` / `min(col)`
**Returns**: the maximum / minimum value of the column within each group (NULLs are ignored);
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> ssex as `学生性别`,
> max(sage) as `最大年龄`,
> min(sage) as `最小年龄`
> from ods_rs_basic_tbd_student
> where event_day='20200618'
> group by ssex
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = liuxiaowei_20200714145235_8c25f6ed-fc0a-4a85-8c7f-2cda501af316
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0072, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0072/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0072
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2020-07-14 14:52:43,145 Stage-1 map = 0%, reduce = 0%
2020-07-14 14:52:51,934 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.32 sec
2020-07-14 14:52:52,978 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 12.43 sec
2020-07-14 14:52:58,190 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.62 sec
MapReduce Total cumulative CPU time: 14 seconds 620 msec
Ended Job = job_1592876386879_0072
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 14.62 sec HDFS Read: 19691 HDFS Write: 137 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 620 msec
OK
学生性别 最大年龄 最小年龄
女 14 13
男 17 14
Time taken: 23.729 seconds, Fetched: 2 row(s)
- `avg` / `sum`
**Usage**: `avg(col)` / `sum(col)`
**Returns**: the average / sum of the column within each group;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> ssex as `学生性别`,
> sum(sage) as `年龄总和`,
> avg(sage) as `平均年龄`
> from ods_rs_basic_tbd_student
> where event_day='20200618'
> group by ssex
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = liuxiaowei_20200714150201_b4e790f7-340f-4ea9-b4dc-fe637cd4bec1
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0074, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0074/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0074
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2020-07-14 15:02:28,797 Stage-1 map = 0%, reduce = 0%
2020-07-14 15:02:36,135 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.58 sec
2020-07-14 15:02:41,370 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 11.75 sec
MapReduce Total cumulative CPU time: 11 seconds 750 msec
Ended Job = job_1592876386879_0074
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 11.75 sec HDFS Read: 20505 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 750 msec
OK
学生性别 年龄总和 平均年龄
女 54.0 13.5
男 121.0 15.125
Time taken: 41.839 seconds, Fetched: 2 row(s)
- `collect_set` / `collect_list`
**Usage**: `collect_set(col)` / `collect_list(col)`
**Returns**: an array of the column's values within each group; `collect_set` removes duplicates while `collect_list` keeps them, as the two output columns below show;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> ssex as `学生性别`,
> collect_set(classid) as `性别班级分布`,
> collect_list(classid) as `性别班级分布`
> from ods_rs_basic_tbd_student
> where event_day='20200618'
> group by ssex
> ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = liuxiaowei_20200714153712_cd429f62-2507-470c-a3b7-2ecd1f45ce75
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1592876386879_0076, Tracking URL = http://dw-test-cluster-007:8088/proxy/application_1592876386879_0076/
Kill Command = /usr/local/tools/hadoop/current//bin/hadoop job -kill job_1592876386879_0076
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2020-07-14 15:37:20,134 Stage-1 map = 0%, reduce = 0%
2020-07-14 15:37:32,699 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 11.63 sec
2020-07-14 15:37:38,990 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 13.72 sec
MapReduce Total cumulative CPU time: 13 seconds 720 msec
Ended Job = job_1592876386879_0076
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 13.72 sec HDFS Read: 20372 HDFS Write: 161 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 720 msec
OK
学生性别 性别班级分布 性别班级分布
女 ["5","7"] ["5","7","5","7"]
男 ["6","1","2","3","4"] ["6","1","2","1","3","1","4"]
Time taken: 27.955 seconds, Fetched: 2 row(s)
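Hive has no built-in `group_concat` like MySQL's, but combining `collect_set`/`collect_list` with `concat_ws` achieves the same effect; a sketch against the student table above (assuming `classid` is a string column, as the outputs above suggest):

```sql
select ssex,
       concat_ws(',', collect_set(classid)) as class_list  -- e.g. "5,7"
from ods_rs_basic_tbd_student
where event_day = '20200618'
group by ssex;
```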
For more aggregate functions, see the official Hive wiki on aggregate functions;
## ==Window functions==
Window functions are great fun and one of the cores of data analysis in Hive, so they get a dedicated post of their own; stay tuned for Hive从入门到放弃——玩一玩Hive的数据分析开窗函数(十三);
## ==Complex type functions==
Since Hive supports complex data types such as `array`, `map`, and `struct`, it also provides a family of functions for operating on them;
First, let's prepare a table with complex types; the DDL is as follows;
CREATE EXTERNAL TABLE `rowyet.employees`(
`name` string,
`salary` float,
`subordinates` array<string>,
`deductions` map<string,float>,
`address` struct<street:string,city:string,state:string,zip:int>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'colelction.delim'=',',
'field.delim'='|',
'line.delim'='\n',
'mapkey.delim'='\;',
'serialization.format'='|')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/rowyet/employees'
TBLPROPERTIES (
'transient_lastDdlTime'='1590204797')
A preview of the data follows (by the way, `colelction.delim` in the DDL above is not a typo in this post: the property name is genuinely misspelled inside Hive's own serde code, so that spelling is the one that works);
hive> select * from rowyet.employees;
OK
name salary subordinates deductions address
John Doe 100000.0 ["Mary","SmithTodd","JonesFederal"] {"Taxes":2.0,"State":15.2} {"street":"Insurance.11","city":"Michigan","state":"Ave.ChicagoIL","zip":60600}
Mary Smith 80000.0 ["Bill","KingFedera"] {"Taxes":0.5,"State":10.2} {"street":"Insurance.1100","city":"Ontario","state":"St.ChicagoIL","zip":60601}
Todd Jones 70000.0 ["Federal"] {"Taxes":15.0} {"street":"Insurance.1200","city":"Chicago","state":"Ave.OakParkIL","zip":60700}
Bill King 60000.0 ["Federal"] {"Taxes":15.0} {"street":"Insurance.1300","city":"Obscure","state":"Dr.ObscuriaIL","zip":60100}
Boss Man 200000.0 ["John","DoeFred","FinanceFederal"] {"Taxes":13.0,"late":200.0} {"street":"Insurance.051","city":"Pretentious","state":"Drive.ChicagoIL","zip":60500}
Fred Finance 150000.0 ["Stacy","AccountantFederal"] {"Taxes":30.23,"others":102.9} {"street":"Insurance.052","city":"Pretentious","state":"Drive.ChicagoIL","zip":60500}
Stacy Accountant 60000.0 ["Federal"] {"Taxes":15.0} {"street":"Insurance.1300","city":"Main","state":"St.NapervilleIL","zip":60563}
Time taken: 0.169 seconds, Fetched: 7 row(s)
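Given the SERDEPROPERTIES above (`|` between fields, `,` between collection items, `;` between map keys and values), one line of the underlying data file would plausibly look like the following sketch (reconstructed for illustration, not copied from the actual file):

```
John Doe|100000.0|Mary,SmithTodd,JonesFederal|Taxes;2.0,State;15.2|Insurance.11,Michigan,Ave.ChicagoIL,60600
```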
- `size`
**Usage**: `size(Map<K.V>)` / `size(Array<T>)`
**Returns**: the number of elements in the map or array;
- `map_keys`
**Usage**: `map_keys(Map<K.V>)`
**Returns**: an array containing the keys of the map;
- `map_values`
**Usage**: `map_values(Map<K.V>)`
**Returns**: an array containing the values of the map;
- `array_contains`
**Usage**: `array_contains(Array<T>, value)`
**Returns**: true if the array contains the value, otherwise false;
- `sort_array`
**Usage**: `sort_array(Array<T>)`
**Returns**: the array sorted in ascending order;
**Hive Cli example for all of the above**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> size(subordinates) as `array元素个数`,
> size(deductions) as `map元素个数`,
> map_keys(deductions) as map_keys,
> map_values(deductions) as map_values,
> array_contains(subordinates,'Mary') as `array是否包含元素`,
> sort_array(subordinates) as `array元素排序`
> from rowyet.employees
> ;
OK
array元素个数 map元素个数 map_keys map_values array是否包含元素 array元素排序
3 2 ["Taxes","State"] [2.0,15.2] true ["JonesFederal","Mary","SmithTodd"]
2 2 ["Taxes","State"] [0.5,10.2] false ["Bill","KingFedera"]
1 1 ["Taxes"] [15.0] false ["Federal"]
1 1 ["Taxes"] [15.0] false ["Federal"]
3 2 ["Taxes","late"] [13.0,200.0] false ["DoeFred","FinanceFederal","John"]
2 2 ["Taxes","others"] [30.23,102.9] false ["AccountantFederal","Stacy"]
1 1 ["Taxes"] [15.0] false ["Federal"]
Time taken: 0.162 seconds, Fetched: 7 row(s)
- `explode`
**Usage**: `explode(ARRAY<T> a)` / `explode(MAP<K,V> m)`
**Returns**: one row per array element, or one (key, value) row per map entry; as a UDTF, `explode` cannot be mixed with other select columns unless wrapped in a `lateral view` (see below);
**Hive Cli example**: as follows;
-- explode on an array type
hive> set hive.cli.print.header=true;
hive>
> select explode(subordinates) as `array_explode`
> from rowyet.employees;
OK
array_explode
Mary
SmithTodd
JonesFederal
Bill
KingFedera
Federal
Federal
John
DoeFred
FinanceFederal
Stacy
AccountantFederal
Federal
Time taken: 7.779 seconds, Fetched: 13 row(s)
-- explode on a map type
hive> set hive.cli.print.header=true;
> select explode(deductions) as (`array_explode_key`,`array_explode_value`)
> from rowyet.employees;
OK
array_explode_key array_explode_value
Taxes 2.0
State 15.2
Taxes 0.5
State 10.2
Taxes 15.0
Taxes 15.0
Taxes 13.0
late 200.0
Taxes 30.23
others 102.9
Taxes 15.0
Time taken: 0.14 seconds, Fetched: 11 row(s)
- `lateral view`
**Usage**: `lateral view explode(T) tf as colAlias[, ...]`
**Returns**: joins each source row to the rows generated by the UDTF, so the exploded columns can be selected alongside others;
**Hive Cli example**: as follows;
-- row-to-column over a single array column can be rewritten in the lateral view style;
-- lateral view can be read as saving the explode result into a view,
-- then selecting columns from that processed view
-- note: all selected fields come from the lateral view; they carry a re_ prefix to tell them apart
hive> set hive.cli.print.header=true;
hive>
> select tf.re_subordinates
> from rowyet.employees
> lateral view explode(subordinates) tf as re_subordinates;
OK
tf.re_subordinates
Mary
SmithTodd
JonesFederal
Bill
KingFedera
Federal
Federal
John
DoeFred
FinanceFederal
Stacy
AccountantFederal
Federal
Time taken: 0.126 seconds, Fetched: 13 row(s)
-- row-to-column over a single map column, likewise rewritten in the lateral view style;
-- lateral view can be read as saving the explode result into a view,
-- then selecting columns from that processed view
-- note: all selected fields come from the lateral view; they carry a re_ prefix to tell them apart
hive> set hive.cli.print.header=true;
hive>
> select tf.array_explode_key,tf.array_explode_value
> from rowyet.employees
> lateral view explode(deductions) tf as `array_explode_key`,`array_explode_value`;
OK
tf.array_explode_key tf.array_explode_value
Taxes 2.0
State 15.2
Taxes 0.5
State 10.2
Taxes 15.0
Taxes 15.0
Taxes 13.0
late 200.0
Taxes 30.23
others 102.9
Taxes 15.0
Time taken: 0.086 seconds, Fetched: 11 row(s)
-- row-to-column over multiple complex columns must use lateral view;
-- here lateral view can be read as saving each explode into a view and combining them,
-- then selecting columns from those processed views
-- note: all selected fields come from the lateral views; they carry a re_ prefix to tell them apart
hive> set hive.cli.print.header=true;
hive>
> select tf.re_subordinates, tf.array_explode_key,tf.array_explode_value
> from rowyet.employees
> lateral view explode(subordinates) tf as re_subordinates
> lateral view explode(deductions) tf as `array_explode_key`,`array_explode_value`;
OK
tf.re_subordinates tf.array_explode_key tf.array_explode_value
Mary Taxes 2.0
Mary State 15.2
SmithTodd Taxes 2.0
SmithTodd State 15.2
JonesFederal Taxes 2.0
JonesFederal State 15.2
Bill Taxes 0.5
Bill State 10.2
KingFedera Taxes 0.5
KingFedera State 10.2
Federal Taxes 15.0
Federal Taxes 15.0
John Taxes 13.0
John late 200.0
DoeFred Taxes 13.0
DoeFred late 200.0
FinanceFederal Taxes 13.0
FinanceFederal late 200.0
Stacy Taxes 30.23
Stacy others 102.9
AccountantFederal Taxes 30.23
AccountantFederal others 102.9
Federal Taxes 15.0
Time taken: 0.07 seconds, Fetched: 23 row(s)
- `posexplode`
**Usage**: `posexplode(ARRAY<T> a)`
**Returns**: like `explode`, plus a leading column holding each element's 0-based position in the array;
**Hive Cli example**: as follows;
-- posexplode used directly
hive> set hive.cli.print.header=true;
hive>
> select posexplode(subordinates) as (pos,subordinates)
> from rowyet.employees;
OK
pos subordinates
0 Mary
1 SmithTodd
2 JonesFederal
0 Bill
1 KingFedera
0 Federal
0 Federal
0 John
1 DoeFred
2 FinanceFederal
0 Stacy
1 AccountantFederal
0 Federal
Time taken: 0.155 seconds, Fetched: 13 row(s)
-- combined with lateral view
-- note: all selected fields come from the lateral view; they carry a re_ prefix to tell them apart
hive>
> select tf.re_pos,tf.re_subordinates
> from rowyet.employees
> lateral view posexplode(subordinates) tf as re_pos,re_subordinates;
OK
tf.re_pos tf.re_subordinates
0 Mary
1 SmithTodd
2 JonesFederal
0 Bill
1 KingFedera
0 Federal
0 Federal
0 John
1 DoeFred
2 FinanceFederal
0 Stacy
1 AccountantFederal
0 Federal
Time taken: 0.062 seconds, Fetched: 13 row(s)
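A classic use of `posexplode` is zipping two parallel arrays together by position; a minimal sketch with literal arrays:

```sql
select t1.idx, t1.v1, t2.v2
from (select array('a','b','c') as arr1, array('x','y','z') as arr2) t
lateral view posexplode(arr1) t1 as idx, v1
lateral view posexplode(arr2) t2 as idx2, v2
where t1.idx = t2.idx2;  -- keep only position-aligned pairs: (0,a,x), (1,b,y), (2,c,z)
```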
- `inline`
**Usage**: `inline(ARRAY<STRUCT<f1:T1, ..., fn:Tn>> a)`
**Returns**: explodes an array of structs into a table, one column per struct field;
**Hive Cli example**: as follows;
-- inline used directly
hive> set hive.cli.print.header=true;
hive> select inline(array(address)) as (street,city,state,zip)
> from rowyet.employees;
OK
street city state zip
Insurance.11 Michigan Ave.ChicagoIL 60600
Insurance.1100 Ontario St.ChicagoIL 60601
Insurance.1200 Chicago Ave.OakParkIL 60700
Insurance.1300 Obscure Dr.ObscuriaIL 60100
Insurance.051 Pretentious Drive.ChicagoIL 60500
Insurance.052 Pretentious Drive.ChicagoIL 60500
Insurance.1300 Main St.NapervilleIL 60563
Time taken: 0.178 seconds, Fetched: 7 row(s)
-- inline combined with lateral view
hive>
> select tf.re_street,tf.re_city,tf.re_state,tf.re_zip
> from rowyet.employees
> lateral view inline(array(address)) tf as re_street,re_city,re_state,re_zip
> ;
OK
tf.re_street tf.re_city tf.re_state tf.re_zip
Insurance.11 Michigan Ave.ChicagoIL 60600
Insurance.1100 Ontario St.ChicagoIL 60601
Insurance.1200 Chicago Ave.OakParkIL 60700
Insurance.1300 Obscure Dr.ObscuriaIL 60100
Insurance.051 Pretentious Drive.ChicagoIL 60500
Insurance.052 Pretentious Drive.ChicagoIL 60500
Insurance.1300 Main St.NapervilleIL 60563
Time taken: 0.061 seconds, Fetched: 7 row(s)
- `stack`
**Usage**: `stack(int r, T1 V1, ..., Tn/r Vn)`
**Returns**: splits the n values V1 ... Vn into r rows, each with n/r columns;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive>
> select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (name,age,my_date);
OK
name age my_date
A 10 2015-01-01
B 20 2016-01-01
Time taken: 0.151 seconds, Fetched: 2 row(s)
hive> select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as re_name,re_age,my_date;
OK
tf.re_name tf.re_age tf.my_date
A 10 2015-01-01
B 20 2016-01-01
Time taken: 0.044 seconds, Fetched: 2 row(s)
- `json_tuple`
**Usage**: `json_tuple(string jsonStr, string k1, ..., string kn)`
**Returns**: an alternative to `get_json_object`; when several keys are needed, `json_tuple` is the friendlier of the two, returning the values of multiple JSON keys in a single call;
**Hive Cli example**: as follows;
-- first, a refresher on get_json_object
hive> set hive.cli.print.header=true;
> select get_json_object('{"name":"Jack","sex":"man","age":12}','$.name') as `姓名`,
> get_json_object('{"name":"Jack","sex":"man","age":12}','$.sex') as `姓名`,
> get_json_object('{"name":"Jack","sex":"man","age":12}','$.age') as `姓名`
> ;
OK
姓名 姓名 姓名
Jack man 12
Time taken: 0.146 seconds, Fetched: 1 row(s)
-- json_tuple on its own
hive>select
> json_tuple('{"name":"Jack","sex":"man","age":12}','name','sex','age') as (`姓名`,`性别`,`年龄`)
> ;
OK
姓名 性别 年龄
Jack man 12
Time taken: 0.038 seconds, Fetched: 1 row(s)
-- json_tuple combined with lateral view
hive> select tf.*,t.num
> from (select 0 as num) t
> lateral view json_tuple('{"name":"Jack","sex":"man","age":12}','name','sex','age') tf as `re_姓名`,`re_性别`,`re_年龄`
> ;
OK
tf.re_姓名 tf.re_性别 tf.re_年龄 t.num
Jack man 12 0
Time taken: 0.061 seconds, Fetched: 1 row(s)
- `parse_url_tuple`
**Usage**: `parse_url_tuple(string urlStr, string p1, ..., string pn)`
**Returns**: remember `parse_url()` from the string functions? Same relationship: `parse_url_tuple` is the friendlier choice when several parts of a URL are needed at once, fetching them in one call, whereas `parse_url()` has to be written once per part;
**Hive Cli example**: as follows;
-- parse_url fetches one part of the URL at a time
hive> set hive.cli.print.header=true;
hive>
>
> select parse_url('http://facebook.com/path/p1.php?query=1&name=3', 'HOST') as host;
OK
host
facebook.com
Time taken: 0.076 seconds, Fetched: 1 row(s)
hive> select parse_url('http://facebook.com/path/p1.php?query=1&name=3', 'PATH') as path;
OK
path
/path/p1.php
Time taken: 0.043 seconds, Fetched: 1 row(s)
-- parse_url_tuple, plain usage
hive> select parse_url_tuple('http://facebook.com/path/p1.php?query=1&name=3', 'HOST', 'PATH', 'QUERY')as (host,path,query);
OK
host path query
facebook.com /path/p1.php query=1&name=3
Time taken: 0.043 seconds, Fetched: 1 row(s)
-- parse_url_tuple combined with lateral view
hive> SELECT tf.*
> FROM (select 0 as num) src
> lateral view parse_url_tuple('http://facebook.com/path/p1.php?query=1&name=3', 'HOST', 'PATH', 'QUERY') tf as re_host, re_path, re_query;
OK
tf.re_host tf.re_path tf.re_query
facebook.com /path/p1.php query=1&name=3
Time taken: 0.086 seconds, Fetched: 1 row(s)
For more functions over complex data types, see the official Hive wiki on complex type operators;
## ==Type conversion functions==
Used to convert between Hive's data types.
- `cast`
**Usage**: `cast(expr as <type>)`
**Returns**: expr converted to the target type, or NULL when the conversion fails;
**Hive Cli example**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> cast(1.0 as int) as `强类型转换`
> ;
OK
强类型转换
1
Time taken: 0.071 seconds, Fetched: 1 row(s)
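When a conversion is impossible, `cast` returns NULL rather than raising an error, which is worth checking for in ETL code; a minimal sketch:

```sql
select cast('abc' as int)         as bad_cast,      -- NULL: not a number
       cast('2020-07-06' as date) as str_to_date,   -- 2020-07-06
       cast(1 as string)          as int_to_string; -- '1'
```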
For more type-conversion functions, see the official Hive wiki on type conversion;
## ==Conditional functions==
Used for all kinds of conditional tests and filtering in Hive;
- `if`
**Usage**: `if(boolean testCondition, T valueTrue, T valueFalseOrNull)`
**Returns**: valueTrue when the condition holds, otherwise valueFalseOrNull;
- `isnull` / `isnotnull`
**Usage**: `isnull(a)` / `isnotnull(a)`
**Returns**: true or false, depending on whether a is NULL;
- `nvl`
**Usage**: `nvl(T value, T default_value)`
**Returns**: value if it is not NULL, otherwise default_value;
- `coalesce`
**Usage**: `coalesce(T v1, T v2, ...)`
**Returns**: the first non-NULL argument;
- `case`
**Usage**: `CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END`
**Returns**: c when a equals b, e when a equals d, and so on, otherwise f; if there is no `ELSE` branch and a matches none of the `WHEN` values, the result is NULL;
- `nullif`
**Usage**: `nullif(a, b)`
**Returns**: NULL if a equals b, otherwise a;
- `assert_true`
**Usage**: `assert_true(boolean condition)`
**Returns**: NULL when the condition is true; throws an exception when it is false;
**Hive Cli example for all of the above**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> if(true,'yes','no') as `if条件1`,
> if(false,'yes','no') as `if条件2`,
> isnull(null) as `isnull条件1` ,
> isnull('hello') as `isnull条件2`,
> nvl(null, 0) as `nvl条件1`,
> nvl('hello', 0) as `nvl条件2`,
> coalesce(null,null,null,'hello') as `coalese条件`,
> case when 1=1 then 'hello' else 'bye' end as `case条件`,
> nullif('a','b') as `nullif条件1`,
> nullif('a','a') as `nullif条件2`,
> assert_true(true) as `assert_true条件1`
> ;
OK
if条件1 if条件2 isnull条件1 isnull条件2 nvl条件1 nvl条件2 coalese条件 case条件 nullif条件1 nullif条件2 assert_true条件1
yes no true false 0 hello hello hello a NULL NULL
Time taken: 0.055 seconds, Fetched: 1 row(s)
hive>
> select assert_true(2<1) as `assert_true条件2`
> ;
OK
assert_true条件2
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed.
Time taken: 0.05 seconds
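The `CASE` in the example above used the "searched" form (`case when ... then ... end`); the "simple" form listed in the usage compares one expression against each `WHEN` value, and with no `ELSE` an unmatched value yields NULL. A minimal sketch:

```sql
select case 2 when 1 then 'one' when 2 then 'two' else 'other' end as simple_case, -- two
       case 9 when 1 then 'one' end                               as no_match;     -- NULL
```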
For more conditional functions, see the official Hive wiki on conditional functions;
## ==Data masking functions==
Hive ships functions for masking sensitive data such as home addresses and phone numbers. When data is exposed for external preview, it may need desensitizing, and that is where the mask functions come in: the format of the data is preserved while the values are masked, keeping the data safe;
- `mask`
**Usage**: `mask(string str[, string upper[, string lower[, string number]]])`
**Returns**: a masked version of str; by default uppercase letters become `X`, lowercase letters become `x`, and digits become `n`, and the three optional arguments override those replacement characters;
- `mask_first_n`
**Usage**: `mask_first_n(string str[, int n])`
**Returns**: str with its first n characters masked;
- `mask_last_n`
**Usage**: `mask_last_n(string str[, int n])`
**Returns**: str with its last n characters masked;
- `mask_show_first_n`
**Usage**: `mask_show_first_n(string str[, int n])`
**Returns**: str with everything except the first n characters masked;
- `mask_show_last_n`
**Usage**: `mask_show_last_n(string str[, int n])`
**Returns**: str with everything except the last n characters masked;
- `mask_hash`
**Usage**: `mask_hash(string|char|varchar str)`
**Returns**: a hash of str, equivalent to `md5(string str)` as the output below shows; the same input always produces the same hashed result;
**Hive Cli example for all of the above**: as follows;
hive> set hive.cli.print.header=true;
hive> select
> mask('abcd-EFGH-8765-4321') as `默认打码1`
> ,mask('abcd-EFGH-8765-4321','U','l','#') as `自定义打码符号`
> ,mask_first_n('1234-5678-8765-4321', 4) as `前n位打码`
> ,mask_last_n('1234-5678-8765-4321', 4) as `后n位打码`
> ,mask_show_first_n('1234-5678-8765-4321', 4) as `排除前n位后的全部打码`
> ,mask_show_last_n('1234-5678-8765-4321', 4) as `排除后n位后的全部打码`
> ,mask_hash('abcd-EFGH-8765-4321') as `哈希打码`
> ,mask_hash('abcd-EFGH-8765-4321') as `哈希打码`
> ,md5('abcd-EFGH-8765-4321') as `md5处理`
> ;
OK
默认打码1 自定义打码符号 前n位打码 后n位打码 排除前n位后的全部打码 排除后n位后的全部打码 哈希打码 哈希打码 md5处理
xxxx-XXXX-nnnn-nnnn llll-UUUU-####-#### nnnn-5678-8765-4321 1234-5678-8765-nnnn 1234-nnnn-nnnn-nnnn nnnn-nnnn-nnnn-4321 60c713f5ec6912229d2060df1c322776 60c713f5ec6912229d2060df1c322776 60c713f5ec6912229d2060df1c322776
Time taken: 0.076 seconds, Fetched: 1 row(s)
For more masking functions, see the official Hive wiki on data masking functions;
## ==Miscellaneous functions==
Functions that grew out of various niche needs; the only one in common use is `version()`;
- `version()`
**Usage**: `version()`
**Parameters**: none;
**Returns**: the current Hive version;
**Hive Cli example for version()**: as follows;
```sql
hive> set hive.cli.print.header=true;
hive> select
> version() as `Hive版本`
> ;
OK
Hive版本
2.3.5 r76595628ae13b95162e77bba365fe4d2c60b3f29
Time taken: 0.064 seconds, Fetched: 1 row(s)
```
For more miscellaneous functions, see the official Hive wiki on miscellaneous functions;