Summary of Commonly Used Hive Functions

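Contents:
I. Relational operators
II. Arithmetic operators
III. Logical operators
IV. Complex data types: array, map, struct
V. Accessing complex types
VI. Size functions for complex types
VII. Constructors for complex types: map, struct, array
VIII. Type conversion functions
IX. Date functions
X. Numeric functions
XI. Conditional functions
XII. String functions
XIII. Miscellaneous functions
XIV. Aggregate functions (UDAF)
XV. Commonly used functions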

List Hive's built-in functions:

show functions;

Show how a particular function is used:

-- show the usage of the coalesce function
desc function extended coalesce;

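I. Relational operators:

Equality comparison: =
Syntax: A = B
Operand types: all primitive types
Description: TRUE if expression A equals expression B; otherwise FALSE.

Examples:
select * from person where 1=1;
select * from person where 1=2;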

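NULL-safe equality comparison: <=>

Syntax: A <=> B
Operand types: all primitive types
Description: for non-NULL operands, returns the same result as =: TRUE if expression A equals expression B; otherwise FALSE.
Note: unlike =, it never returns NULL. It returns TRUE if both operands are NULL and FALSE if exactly one of them is NULL.

Examples:
select * from person where 1<=>1;
select * from person where 1<=>2;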
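
The difference between = and <=> only shows up when NULL is involved. A minimal sketch using constant expressions; the column aliases are illustrative, and a SELECT without FROM assumes Hive 0.13 or later:

```sql
-- Ordinary equality yields NULL when an operand is NULL;
-- NULL-safe equality always yields TRUE or FALSE.
SELECT 1 = 1                                   AS eq_same,       -- true
       CAST(NULL AS INT) = CAST(NULL AS INT)   AS eq_both_null,  -- NULL
       1 <=> 1                                 AS ns_same,       -- true
       CAST(NULL AS INT) <=> CAST(NULL AS INT) AS ns_both_null,  -- true
       1 <=> CAST(NULL AS INT)                 AS ns_one_null;   -- false
```

This makes <=> a reasonable choice for join or filter conditions on columns that may contain NULL.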

Inequality comparison: <> and !=
Syntax: A <> B, A != B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is not equal to B; otherwise FALSE.

Examples:
select * from person where 1<>2;
select * from person where 1<>1;
select * from person where null<>null; -- returns no rows
select * from person where 1 != 1;
select * from person where 1 != 2;
select * from person where null != null; -- returns no rows

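Less-than comparison: <
Syntax: A < B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than B; otherwise FALSE.

Examples:
select * from person where 1<2; -- returns rows
select * from person where 2<1; -- returns no rows
select * from person where null<null; -- returns no rows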
  
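Less-than-or-equal comparison: <=
Syntax: A <= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is less than or equal to B; otherwise FALSE.

Examples:
select * from person where 1<= 2; -- returns rows
select * from person where 2<= 1; -- returns no rows
select * from person where null<=null; -- returns no rows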
  
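Greater-than comparison: >
Syntax: A > B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than B; otherwise FALSE.

Examples:
select * from person where 1> 2; -- returns no rows
select * from person where 2 >1; -- returns rows
select * from person where null>null; -- returns no rows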
  
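Greater-than-or-equal comparison: >=
Syntax: A >= B
Operand types: all primitive types
Description: NULL if expression A or expression B is NULL; TRUE if A is greater than or equal to B; otherwise FALSE.

Examples:
select * from person where 1>= 2; -- returns no rows
select * from person where 2 >=1; -- returns rows
select * from person where 1>=1; -- returns rows
select * from person where null>= null; -- returns no rows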
  
Range comparison
NULL check: IS NULL
Syntax: A IS NULL
Operand types: all types
Description: TRUE if expression A evaluates to NULL; otherwise FALSE.

Examples:
select * from person where 1 is null; -- returns no rows
select * from person where null is null; -- returns rows
  
Non-NULL check: IS NOT NULL
Syntax: A IS NOT NULL
Operand types: all types
Description: FALSE if expression A evaluates to NULL; otherwise TRUE.

Examples:
select * from person where 1 IS NOT NULL; -- returns rows
select * from person where null IS NOT NULL; -- returns no rows
  
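LIKE comparison: LIKE
Syntax: A LIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the simple SQL pattern B; otherwise FALSE. In B, the character "_" matches any single character and "%" matches any number of characters. LIKE patterns are not regular expressions; use RLIKE for those.

Example:
select 1 from person where 'football' like 'foot%';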
  
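Java regular-expression match: RLIKE
Syntax: A RLIKE B
Operand types: strings
Description: NULL if string A or string B is NULL; TRUE if string A matches the Java regular expression B; otherwise FALSE.

Examples:
select 1 from person where '123456' rlike '^\\d+$'; -- checks whether a string consists only of digits
select 1 from person where '12aa456' rlike '^\\d+$';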
  
REGEXP operator: REGEXP
Syntax: A REGEXP B
Operand types: strings
Description: same as RLIKE.

Example:
select 1 from person where 'footbar' REGEXP '^f.*r$'; -- returns rows
 
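II. Arithmetic operators:

Addition: +
Syntax: A + B
Operand types: all numeric types
Note: returns the sum of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types). For example, int + int generally yields int, while int + double generally yields double.

Example:
select 1+2 from person;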
  
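Subtraction: -
Syntax: A - B
Operand types: all numeric types
Note: returns the result of subtracting B from A. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types). For example, int - int generally yields int, while int - double generally yields double.

Examples:
select 5-3 from person;
select 5.2-3 from person;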
  
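Multiplication: *
Syntax: A * B
Operand types: all numeric types
Note: returns the product of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types). If the product exceeds the value range of the default result type, use cast to convert to a numeric type with a larger range.

Examples:
select 5*3 from person;
select 5.2*3 from person;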
  
Division: /
Syntax: A / B
Operand types: all numeric types
Note: returns the result of dividing A by B. The result type is double.

Examples:
select 5/3 from person;
select 6.0/3 from person;
select 6/3 from person;
  
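Remainder: %
Syntax: A % B
Operand types: all numeric types
Note: returns the remainder of A divided by B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types).

Example:
select 41 % 5 from person; -- 1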
  
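Bitwise AND: &
Syntax: A & B
Operand types: all numeric types
Note: returns the bitwise AND of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types).

Example:
select 4 & 8 from person; -- 0, because 4 (binary 0100) and 8 (binary 1000) have no set bit in common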
  
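Bitwise OR: |
Syntax: A | B
Operand types: all numeric types
Note: returns the bitwise OR of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types).

Example:
select 4 | 8 from person; -- 12 (binary 1100)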
  
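Bitwise XOR: ^
Syntax: A ^ B
Operand types: all numeric types
Note: returns the bitwise exclusive OR of A and B. The result type is the smallest common parent type of the types of A and B (see the type hierarchy of the Hive data types).

Example:
select 4 ^ 8 from person; -- 12 (binary 1100)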
 
Bitwise NOT: ~
Syntax: ~A
Operand types: all numeric types
Note: returns the bitwise complement of A. The result type is the type of A.
Examples:
select ~6; -- -7
select 6;
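
Because the bitwise operators work on the binary representation of integers, it helps to line the bits up. A small sketch with constant expressions; the column aliases are illustrative only:

```sql
-- 4 = binary 0100, 8 = binary 1000, 6 = binary 0110, 3 = binary 0011
SELECT 4 & 8 AS and_disjoint,  -- 0   (no bit is set in both operands)
       6 & 3 AS and_overlap,   -- 2   (0110 AND 0011 = 0010)
       4 | 8 AS or_result,     -- 12  (0100 OR  1000 = 1100)
       6 ^ 3 AS xor_result,    -- 5   (0110 XOR 0011 = 0101)
       ~6    AS not_result;    -- -7  (two's complement: ~A = -A - 1)
```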
 
  
   
     
    
   
   
  
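III. Logical operators:

Logical AND: AND, &&
Syntax: A AND B
Operand types: boolean
Note: TRUE if both A and B are TRUE; FALSE if either is FALSE; otherwise (one operand is NULL and neither is FALSE) NULL.

Examples:
select 1 from person where 1=1 and 2=2;
select 1 from person where 1=1 and 2<2;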
  
Logical OR: OR
Syntax: A OR B
Operand types: boolean
Note: TRUE if A is TRUE, B is TRUE, or both are TRUE; otherwise FALSE.

Examples:
select 1 from person  where 1=2 or 2<1;
select 1 from person  where 1=2 or 2>1;
  
Logical NOT: NOT
Syntax: NOT A
Operand types: boolean
Note: TRUE if A is FALSE; NULL if A is NULL; otherwise FALSE.

Example:
select 1 from person  where not 1=2;
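
The NULL cases of the logical operators are easy to get wrong. The sketch below assumes Hive follows standard SQL three-valued logic, as recent versions do; the aliases are illustrative:

```sql
-- Three-valued logic with a NULL boolean operand
SELECT (CAST(NULL AS BOOLEAN) AND TRUE)  AS null_and_true,   -- NULL
       (CAST(NULL AS BOOLEAN) AND FALSE) AS null_and_false,  -- false
       (CAST(NULL AS BOOLEAN) OR TRUE)   AS null_or_true,    -- true
       (CAST(NULL AS BOOLEAN) OR FALSE)  AS null_or_false,   -- NULL
       (NOT CAST(NULL AS BOOLEAN))       AS negated_null;    -- NULL
```

In a WHERE clause a NULL condition filters the row out just like FALSE, which is why the earlier examples comparing null with null return no rows.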
 
IV. Complex data types: array, map, struct
Besides the common types TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DECIMAL, DATE, VARCHAR and CHAR, Hive also supports several complex data types (array, map, struct, union).

1. Using array
2. Using map
3. Using struct

Reference article: Hive复合数据类型array,map,struct的使用
1. Using array
Array type: a sequence of elements that all have the same data type.
Sample data array.txt: name and work locations
Huangbo beijing,shanghai,tianjin,Hangzhou
Xuzheng tianjin,chengdu,wuhan
Wangbaoqiang    wuhan,shenyang,jilin

Create the table; the location column is of array type
create table person(name string, location array<string>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data into the table
load data local inpath '/home/study/array.txt' into table person;

Some queries
select * from person;

-- array element access: A[n]
-- operand types: A is an array, n is an int
-- note: returns the n-th element of array A; indexes start at 0. For example, if A is ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'
select name, location[0], size(location) from person;

select name from person where array_contains(location, 'beijing');

-- an index past the end of the array returns NULL
select location[3], location[4] from person;
  
2. Using map
MAP: a map holds key->value pairs, and its elements are accessed by key. For example, if "userlist" is a map whose keys are usernames and whose values are passwords, then userlist['username'] returns the password of that user.
Reference article: Hive中复杂数据类型Map常用方法介绍
Sample data map.txt: name and exam scores
huangbo yuwen:80,shuxue:89,yingyu:95
xuzheng yuwen:70,shuxue:65,yingyu:81
wangbaoqiang    yuwen:75,shuxue:100,yingyu:75

Create the table
create table score(name string, scores map<string,int>) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':';

desc formatted score;

Load the data into the table
load data local inpath '/home/study/map.txt' into table score;

Some queries
select * from score;

select name from score;

select scores from score;
-- map element access: M[key]
-- syntax: M[key]
-- operand types: M is a map, key is one of the map's keys

The size(Map) function:
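
As a sketch of map access and size on the score table above: map_keys and map_values are further Hive built-ins that this section does not otherwise cover, and the column aliases are illustrative.

```sql
-- Look up one key, count the entries, and list keys and values separately
SELECT name,
       scores['yuwen']    AS yuwen_score,    -- value stored under the key 'yuwen'
       size(scores)       AS subject_count,  -- number of key/value pairs
       map_keys(scores)   AS subjects,       -- array of the keys
       map_values(scores) AS marks           -- array of the values
FROM score;
```

Looking up a key that is not present in the map returns NULL rather than raising an error.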
 
  
   
     
    
   
   
  
3. Using struct
Sample data structtable.txt: student ID, course and score
1   english,80
2   math,89
3   chinese,95

Create the table
create table structtable(id int, course struct<name:string,score:int>) row format delimited fields terminated by '\t' collection items terminated by ',';

Load the data into the table
load data local inpath '/home/study/structtable.txt' into table structtable;

Some queries
select * from structtable;
select id from structtable;
select course from structtable;
select t.course.name from structtable t;
select t.course.score from structtable t;
  
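V. Accessing complex types
1. Array element access: A[n]
Syntax: A[n]
Operand types: A is an array, n is an int
Note: returns the n-th element of array A; indexes start at 0. For example, if A is ['foo', 'bar'], A[0] returns 'foo' and A[1] returns 'bar'.
Example:
select location[0],location[1],location[2] from person;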
 
2. Map element access: M[key]

Syntax: M[key]

Operand types: M is a map, key is one of the map's keys

Note: returns the value stored in map M under the given key. For example, if M is {'f' -> 'foo', 'b' -> 'bar', 'all' -> 'foobar'}, M['all'] returns 'foobar'.

Example:
select s.scores['shuxue'] from score s;
 
3. Struct field access: S.x

Syntax: S.x

Operand types: S is a struct

Note: returns field x of struct S. For example, for the struct foobar {int foo, int bar}, foobar.foo returns the foo field.

Example:
select t.course.score from structtable t;
  
VI. Size functions for complex types
1. Map size: size(Map)
Syntax: size(Map)
Return type: int
Note: returns the number of entries in the map.
Examples:
select size(map('100','tom','101','mary')); -- 2
select size(scores) from score;
 
2. Array size: size(Array)

Syntax: size(Array)

Return type: int

Note: returns the number of elements in the array.

Examples:
select size(array('100','101','102','103')); -- 4
select size(location) from person;
 
3. size() cannot be applied to a struct.
  
VII. Constructors for complex types: map, struct, array

Map constructor: map
Syntax: map(key1, value1, key2, value2, …)
Note: builds a map from the given key/value pairs.

Examples:
select map('100','tom','200','mary');
select map('yuwen',77,'shuxue',99);
  
Struct constructor: struct
Syntax: struct(val1, val2, val3, …)
Note: builds a struct from the given arguments.

Example:
select struct('tom','mary','tim');
  
Array constructor: array
Syntax: array(val1, val2, …)
Note: builds an array from the given arguments.

Example:
select array("tom","mary","tim");
  
VIII. Type conversion functions
1. Binary conversion: binary
Only string, char, varchar or binary data can be converted to the binary type.
Example:
select binary('3');
  
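2. Explicit conversion between primitive types: cast
CAST explicitly converts an expression of one data type to another. Its argument is an expression consisting of the source value and the target data type, separated by the AS keyword.
Syntax: CAST(expression AS data_type)
Examples:
select cast(123 as string);
select cast(345 AS double);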
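
A few more cast cases are worth seeing side by side, in particular what happens when the conversion cannot succeed. A minimal sketch with constant expressions; the aliases are illustrative:

```sql
SELECT CAST('12' AS INT) + 1 AS str_to_int,  -- 13
       CAST(3.7 AS INT)      AS truncated,   -- 3    (the fraction is dropped, not rounded)
       CAST('abc' AS INT)    AS bad_input,   -- NULL (the string is not a number)
       CAST(123 AS STRING)   AS int_to_str;  -- 123
```

Since a failed cast yields NULL instead of an error, it is often combined with coalesce or nvl to supply a default value.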
 
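IX. Date functions

UNIX timestamp to date string: from_unixtime
Syntax: from_unixtime(bigint unixtime[, string format])
Return type: string
Note: converts a UNIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) into a time string in the current time zone. The timestamp is the total number of seconds elapsed since 00:00:00 on 1 January 1970 Greenwich Mean Time (08:00:00 on 1 January 1970 Beijing time).

Example:
SELECT from_unixtime(1602034999, 'yyyy-MM-dd');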
  
Current UNIX timestamp: unix_timestamp
Syntax: unix_timestamp()
Return type: bigint
Note: returns the current UNIX timestamp in seconds.

Example:
SELECT UNIX_TIMESTAMP();
  
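Date string to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date)
Return type: bigint
Note: converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.

Example:
select unix_timestamp('2015-09-07 02:46:43'); -- converts the given date-time string to a UNIX timestamp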
  
Date string in a given format to UNIX timestamp: unix_timestamp
Syntax: unix_timestamp(string date, string pattern)
Return type: bigint
Note: converts a date string in the given pattern to a UNIX timestamp. Returns 0 if the conversion fails.

Examples:
select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');
select unix_timestamp('20111207','yyyyMMdd');
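
unix_timestamp and from_unixtime are often chained to reformat a date string. A minimal sketch, assuming both calls run in the same session time zone; the aliases are illustrative:

```sql
-- Parse a string into epoch seconds, then format the seconds back into a string.
-- Both steps use the session time zone, so the round trip preserves the value.
SELECT from_unixtime(unix_timestamp('2015-09-07 02:46:43'))                AS same_format,  -- 2015-09-07 02:46:43
       from_unixtime(unix_timestamp('20111207', 'yyyyMMdd'), 'yyyy-MM-dd') AS reformatted;  -- 2011-12-07
```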
 
  
   
     
    
   
   
  
  
Date-time to date: to_date
Syntax: to_date(string timestamp)
Return type: string
Note: returns the date part of a date-time value.

Example:
select to_date('2018-12-08 10:03:01'); -- 2018-12-08, the date part of the date-time value
  
Year of a date: year
Syntax: year(string date)
Return type: int
Note: returns the year of the date.

Examples:
select year('2018-12-08 10:03:01'); -- 2018
select year('2018-12-08'); -- 2018
  
Month of a date: month
Syntax: month(string date)
Return type: int
Note: returns the month of the date.

Examples:
select month('2018-12-08 10:03:01'); -- 12
select month('2018-12-08'); -- 12
  
Day of a date: day
Syntax: day(string date)
Return type: int
Note: returns the day of the month of the date.

Examples:
select day('2018-12-08 10:03:01'); -- 8
select day('2018-12-08'); -- 8
  
Hour of a date-time: hour
Syntax: hour(string date)
Return type: int
Note: returns the hour of the date-time value.

Example:
select hour('2018-12-08 10:03:01'); -- 10
  
Minute of a date-time: minute
Syntax: minute(string date)
Return type: int
Note: returns the minute of the date-time value.

Example:
select minute('2018-12-08 10:03:01'); -- 3
  
Second of a date-time: second
Syntax: second(string date)
Return type: int
Note: returns the second of the date-time value.

Example:
select second('2018-12-08 10:03:01'); -- 1
  
Week of the year: weekofyear
Syntax: weekofyear(string date)
Return type: int
Note: returns the week number of the date within its year.

Example:
select weekofyear('2018-01-08 10:03:01'); -- 2, the second week of the year
  
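Date difference: datediff
Syntax: datediff(string enddate, string startdate)
Return type: int
Note: returns the number of days from startdate to enddate (enddate minus startdate).

Example:
select datediff('2019-07-02','2019-07-23'), datediff('2020-07-02','2019-07-23'); -- -21 and 345
-- the result is the first date minus the second date, in days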
  
Add days to a date: date_add
Syntax: date_add(string startdate, int days)
Return type: string
Note: returns the date that is days days after startdate.

Example:
select date_add('2019-07-02', 22); -- 2019-07-24, the given date plus 22 days
  
Subtract days from a date: date_sub
Syntax: date_sub(string startdate, int days)
Return type: string
Note: returns the date that is days days before startdate.

Example:
select date_sub('2019-07-12',10); -- 2019-07-02, the given date minus 10 days
 
Current time: current_timestamp
select current_timestamp; -- returns the current date and time
  
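X. Numeric functions

Rounding: round
Syntax: round(double a)
Return type: DOUBLE
Note: returns a rounded to the nearest whole number (halves are rounded up).

Example:
select round(2.6); -- 3 (displayed as 3.0 in some versions)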
  
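Rounding to a given precision: round
Syntax: round(double a, int d)
Return type: DOUBLE
Note: returns a rounded to d decimal places. A negative d rounds to the left of the decimal point.

Examples:
select round(1.23454,2); -- 1.23, rounded to two decimal places
select round(1213232,-2); -- 1213200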
  
Round down: floor
Syntax: floor(double a)
Return type: BIGINT
Note: returns the largest integer that is less than or equal to a.

Examples:
select floor(1.3); -- 1
select floor(1.99); -- 1
select floor(-1.3); -- -2
select floor(-1.99); -- -2
  
Round up: ceil
Syntax: ceil(double a)
Return type: BIGINT
Note: returns the smallest integer that is greater than or equal to a.

Examples:
select ceil(1.0); -- 1
select ceil(1.0001); -- 2
select ceil(1.99); -- 2
select ceil(1.29); -- 2
select ceil(-1.3); -- -1
  
Round up: ceiling
Syntax: ceiling(double a)
Return type: BIGINT
Note: same as ceil.

Examples:
select ceiling(1.0); -- 1
select ceiling(1.0001); -- 2
select ceiling(1.99); -- 2
select ceiling(1.29); -- 2
select ceiling(-1.3); -- -1
  
Random number: rand
Syntax: rand(), rand(int seed)
Return type: double
Note: returns a random number between 0 and 1. If a seed is given, the sequence of random numbers is repeatable.

Examples:
select rand(); -- returns a double between 0 and 1
select rand(3); -- with a seed, the sequence of random numbers is repeatable
  
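Natural exponential: exp
Syntax: exp(double a)
Return type: double
Note: returns e, the base of the natural logarithm, raised to the power a.

Example:
select exp(2); -- approximately 7.389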
  
Base-10 logarithm: log10
Syntax: log10(double a)
Return type: double
Note: returns the base-10 logarithm of a.

Examples:
select log10(35); -- approximately 1.544
select log10(100); -- 2
  
Base-2 logarithm: log2
Syntax: log2(double a)
Return type: double
Note: returns the base-2 logarithm of a.

Example:
select log2(8); -- 3
  
Logarithm: log
Syntax: log(double base, double a)
Return type: double
Note: returns the logarithm of a to the given base.

Example:
select log(10, 100); -- 2
  
Power: pow
Syntax: pow(double a, double p)
Return type: double
Note: returns a raised to the power p.

Example:
select pow(2,3); -- 2 to the power 3, i.e. 8
  
Power: power
Syntax: power(double a, double p)
Return type: double
Note: returns a raised to the power p; same as pow.

Example:
select power(2,4); -- 16
  
Square root: sqrt
Syntax: sqrt(double a)
Return type: double
Note: returns the square root of a.

Example:
select sqrt(16); -- 4, the square root of 16
  
Binary representation: bin
Syntax: bin(BIGINT a)
Return type: string
Note: returns a in binary notation.

Example:
select bin(8); -- 1000
  
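Hexadecimal representation: hex
Syntax: hex(BIGINT a)
Return type: string
Note: if the argument is an int, returns a in hexadecimal notation; if the argument is a string, returns the hexadecimal representation of the string.

Example:
select hex(30); -- 1E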
  
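Inverse of hex: unhex
Syntax: unhex(string a)
Return type: binary
Note: interprets the argument as a hexadecimal string and returns the bytes it encodes.

Example:
select unhex('616263'); -- the bytes of 'abc'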
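
One way to see what unhex is for is to pair it with hex: the two are inverses of each other. A minimal sketch; decode is a separate Hive built-in (available since 0.12) used here only to turn the resulting bytes back into readable text, and the aliases are illustrative:

```sql
SELECT hex(255)                         AS int_to_hex,  -- FF
       hex('abc')                       AS str_to_hex,  -- 616263
       decode(unhex('616263'), 'UTF-8') AS hex_to_str;  -- abc
```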
   
  
  
Base conversion: conv
Syntax: conv(BIGINT num, int from_base, int to_base)
Return type: string
Note: converts the number num from base from_base to base to_base.

Example:
select conv(18,10,4); -- 102, i.e. 18 converted from base 10 to base 4
  
Absolute value: abs
Syntax: abs(double a), abs(int a)
Return type: double, int
Note: returns the absolute value of a.

Example:
select abs(-3.9); -- 3.9
  
Positive modulo: pmod
Syntax: pmod(int a, int b), pmod(double a, double b)
Return type: int, double
Note: returns the non-negative remainder of a divided by b.

Example:
select pmod(9,2); -- 1
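
pmod only differs from the % operator when the dividend is negative. A small comparison with constant expressions; the aliases are illustrative:

```sql
SELECT  9 % 2      AS mod_pos,   -- 1
       -9 % 2      AS mod_neg,   -- -1 (the sign follows the dividend)
       pmod(9, 2)  AS pmod_pos,  -- 1
       pmod(-9, 2) AS pmod_neg;  -- 1  (computed as ((a % b) + b) % b)
```

This matters, for example, when hashing values into a fixed number of buckets, where a negative bucket number would be invalid.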
 
  
   
     
    
   
   
  
  
Sine: sin
Syntax: sin(double a)
Return type: double
Note: returns the sine of a.

Example:
select sin(0); -- 0
  
Arcsine: asin
Syntax: asin(double a)
Return type: double
Note: returns the arcsine of a.

Example:
select asin(1); -- approximately 1.5708 (π/2)
  
Cosine: cos
Syntax: cos(double a)
Return type: double
Note: returns the cosine of a.

Example:
select cos(0); -- 1
  
Arccosine: acos
Syntax: acos(double a)
Return type: double
Note: returns the arccosine of a.

Example:
select acos(1); -- 0
  
positive function: positive
Syntax: positive(int a), positive(double a)
Return type: int, double
Note: returns a.

Example:
select positive(10); -- 10
  
negative function: negative
Syntax: negative(int a), negative(double a)
Return type: int, double
Note: returns -a.

Example:
select negative(5); -- -5
  
XI. Conditional functions

If function: if
Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return type: T
Note: returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

Examples:
select if(1=2,100,200); -- 200
select if(1=1,100,200); -- 100
  
First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2, …)
Return type: T
Note: returns the first non-NULL argument; if all arguments are NULL, returns NULL.

Examples:
select COALESCE(null,null,null); -- NULL
select COALESCE(null,'100','50'); -- 100
  
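nvl function: NULL replacement. It takes exactly two arguments.
If expr1 is NULL, returns expr2; otherwise returns expr1. expr1 and expr2 must have the same data type.

select nvl('asc','asd'), nvl(null,'123'), nvl('123',null), nvl(null,null); -- asc, 123, 123, NULL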
  
Conditional expression: CASE
Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return type: T
Note: if a equals b, returns c; if a equals d, returns e; otherwise returns f.

Examples:
select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- mary
select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end; -- tim
  
Conditional expression: CASE
Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return type: T
Note: if a is TRUE, returns b; if c is TRUE, returns d; otherwise returns e.

Examples:
select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end; -- mary
select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end; -- tom
  
XII. String functions

ASCII code of a character: ascii
Syntax: ascii(string str)
Return type: int
Note: returns the ASCII code of the first character of str.

Example:
select ascii('abcde'); -- 97
  
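base64 string
String concatenation: concat
Syntax: concat(string A, string B…)
Return type: string
Note: returns the concatenation of the input strings; any number of input strings is supported.

Examples:
select concat('abc','def','gh'); -- abcdefgh
select concat('abc','-','def','-','gh'); -- abc-def-gh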
  
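String concatenation with a separator: concat_ws
Syntax: concat_ws(string SEP, string A, string B…)
Return type: string
Note: returns the concatenation of the input strings, with SEP as the separator between them.

Example:
select concat_ws('-','abc','def','gh'); -- abc-def-gh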
  
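Array to string: concat_ws
Format a number with a given number of decimal places as a string: format_number
Substring: substr, substring
Syntax: substr(string A, int start), substring(string A, int start)
Return type: string
Note: returns the part of string A from position start to the end.

Examples:
select substr('abcde',3); -- cde
select substring('abcde',3); -- cde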
  
Substring: substr, substring
Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Return type: string
Note: returns the part of string A that starts at position start and has length len.

Examples:
select substr('abcde',3,2); -- cd
select substring('abcde',3,2); -- cd
select substring('abcde',-2,2); -- de
  
String search: instr
Returns the position of a substring within a string.

Example:
select instr('abc','b'); -- 2
  
String length: length
Syntax: length(string A)
Return type: int
Note: returns the length of string A.

Example:
select length('abcedfg'); -- 7
  
String search: locate
String formatting: printf
String to map: str_to_map
base64 decoding: unbase64(string str)
Upper case: upper, ucase
Syntax: upper(string A), ucase(string A)
Return type: string
Note: returns string A in upper case.

Examples:
select upper('abSEd'); -- ABSED
select ucase('abSEd'); -- ABSED
  
Lower case: lower, lcase
Syntax: lower(string A), lcase(string A)
Return type: string
Note: returns string A in lower case.

Examples:
select lower('abSEd'); -- absed
select lcase('abSEd'); -- absed
  
Trim spaces: trim
Syntax: trim(string A)
Return type: string
Note: removes the spaces at both ends of the string.

Example:
select trim(' abc ');
  
Trim leading spaces: ltrim
Syntax: ltrim(string A)
Return type: string
Note: removes the spaces at the left end of the string.

Example:
select ltrim(' abc ');
  
Trim trailing spaces: rtrim
Syntax: rtrim(string A)
Return type: string
Note: removes the spaces at the right end of the string.

Example:
select rtrim(' abc ');
  
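Regular-expression replace: regexp_replace
Syntax: regexp_replace(string A, string B, string C)
Return type: string
Note: replaces the parts of string A that match the Java regular expression B with C. In some cases escape characters are needed. It is similar to regexp_replace in Oracle.

Example:
select regexp_replace('foobar', 'oo|ar', ''); -- fb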
  
Regular-expression extract: regexp_extract
Syntax: regexp_extract(string subject, string pattern, int index)
Return type: string
Note: matches the string subject against the regular expression pattern and returns the capture group selected by index.

Example:
select regexp_extract('foothebar', 'foo(.*?)(bar)', 1); -- the
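
The index argument selects a capture group: each pair of parentheses in the pattern is numbered from left to right starting at 1, and 0 stands for the whole match. A short sketch using the same pattern as above; the aliases are illustrative:

```sql
SELECT regexp_extract('foothebar', 'foo(.*?)(bar)', 0) AS whole_match,  -- foothebar
       regexp_extract('foothebar', 'foo(.*?)(bar)', 1) AS group_1,      -- the
       regexp_extract('foothebar', 'foo(.*?)(bar)', 2) AS group_2;      -- bar
```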
 
  
URL parsing: parse_url
Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Return type: string
Note: returns the requested part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE and USERINFO.

Example:
select parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST'); -- facebook.com
  
JSON parsing: get_json_object
Syntax: get_json_object(string json_string, string path)
Return type: string
Note: parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.
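
No example is given for get_json_object, so here is a minimal sketch. The JSON document is made up for illustration; the path starts at the root object $, uses . to descend into a field and [n] to index into an array:

```sql
SELECT get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.name')        AS name,       -- tom
       get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.scores.math') AS math,       -- 89
       get_json_object('{"name":"tom","scores":{"math":89},"tags":["a","b"]}', '$.tags[1]')     AS second_tag, -- b
       get_json_object('not json', '$.name')                                                    AS invalid;    -- NULL
```

The result is always a string, so a numeric field such as math needs a cast before it is used in arithmetic.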
  
  
String of spaces: space
Syntax: space(int n)
Return type: string
Note: returns a string consisting of n spaces.

Examples:
select space(10);
select length(space(10)); -- 10
  
Repeat a string: repeat
Syntax: repeat(string str, int n)
Return type: string
Note: returns str repeated n times.

Example:
select repeat('abc',5); -- abcabcabcabcabc
  
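Left padding: lpad
Syntax: lpad(string str, int len, string pad)
Return type: string
Note: pads str on the left with pad until it is len characters long.

Example:
select lpad('abc',10,'td'); -- tdtdtdtabc
-- note: unlike in GP and Oracle, the pad argument cannot be omitted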
  
Right padding: rpad
Syntax: rpad(string str, int len, string pad)
Return type: string
Note: pads str on the right with pad until it is len characters long.

Example:
select rpad('abc',10,'td'); -- abctdtdtdt
  
Split a string: split
Syntax: split(string str, string pat)
Return type: array
Note: splits str around matches of pat and returns the resulting array of strings.

Example:
select split('abtcdtef','t'); -- ["ab","cd","ef"]
  
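Search in a set: find_in_set
Syntax: find_in_set(string str, string strList)
Return type: int
Note: returns the position of the first occurrence of str in strList, where strList is a comma-separated string. Returns 0 if str is not found.

Examples:
select find_in_set('ab','ef,ab,de'); -- 2
select find_in_set('at','ef,ab,de'); -- 0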
  
Tokenization: sentences
Splits the text into sentences and each sentence into words, and returns the result as an array of word arrays.

Examples:
select sentences('Hello there! How are you?');
select sentences('Hello there How are you?');
  
Top-K most frequent word sequences after tokenization
Top-K most frequent words that appear together with given words after tokenization
  
XIII. Miscellaneous functions

Calling a Java method: java_method
Calling a Java method: reflect
Hash value of a string: hash

XIV. XPath functions for parsing XML
Reference article: Hive常用函数 -- 混合函数和XPath 解析 XML 函数
xpath
Syntax: xpath(string xmlstr, string xpath_expression)
Return type: array
Note: returns an array of the results in the XML string that match the expression.

select xpath('<a><b>b1</b><b>b2</b><c>c1</c></a>','a/b/text()');
-- ["b1","b2"]
  
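xpath_string
Syntax: xpath_string(string xmlstr, string xpath_expression)
Return type: string
Note: by default, returns the value of the first node in the XML string that matches the expression.

SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', '//b'); -- b1

-- select which matching node to return
SELECT xpath_string('<a><b>b1</b><b>b2</b></a>', '//b[2]'); -- b2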
  
xpath_boolean
Syntax: xpath_boolean(string xmlstr, string xpath_expression)
Return type: boolean
Note: returns whether the XML string matches the XPath expression.

SELECT xpath_boolean('<a><b>b</b></a>', 'a/b'); -- true
  
xpath_short, xpath_int, xpath_long
Syntax: xpath_short(string xmlstr, string xpath_expression)
xpath_int(string xmlstr, string xpath_expression)
xpath_long(string xmlstr, string xpath_expression)
Return type: int
Note: returns the value obtained by evaluating the XPath expression on the XML string; returns 0 if nothing matches.
xpath_float, xpath_double, xpath_number
Syntax: xpath_float(string xmlstr, string xpath_expression)
xpath_double(string xmlstr, string xpath_expression)
xpath_number(string xmlstr, string xpath_expression)
Return type: number
Note: returns the value obtained by evaluating the XPath expression on the XML string; returns 0 if nothing matches.

-- the element names b and c are placeholders; the original tags were lost
select xpath_double('<a><b>10.5</b><c>11.2</c></a>','sum(a/*)');
-- 21.7
 
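XIV. Aggregate functions (UDAF)

Count: count
Syntax: count(*), count(expr), count(DISTINCT expr[, expr...])
Return type: bigint
Note: count(*) counts the retrieved rows, including rows that contain NULL values; count(expr) returns the number of non-NULL values of the given column; count(DISTINCT expr[, expr...]) returns the number of distinct non-NULL values of the given column.
Sum: sum
Syntax: sum(col), sum(DISTINCT col)
Return type: double
Note: sum(col) adds up the values of col in the result set; sum(DISTINCT col) adds up the distinct values of col.
Average: avg
Minimum: min
Maximum: max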
Population variance: var_pop
Computes the population variance of a numeric column.

select var_pop(age) from student;
  
Sample variance: var_samp
Computes the sample variance of a numeric column.

select var_samp(age) from student;
  
Population standard deviation: stddev_pop
Computes the population standard deviation of a numeric column.

select STDDEV_POP(age) from student;
  
Sample standard deviation: stddev_samp

select stddev_samp(age) from student;
 
  
Median function: percentile
Reference article: hive 分位数函数 percentile(col, p)

select percentile(age, 0.5) from student;
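
percentile takes the column and a fraction p between 0 and 1, so the median is just the special case p = 0.5. A sketch on the student table used above, assuming age is an integer column (percentile only accepts integral types); the aliases are illustrative:

```sql
-- A single fraction returns one value; an array of fractions returns an array
SELECT percentile(age, 0.5)                    AS median_age,
       percentile(age, array(0.25, 0.5, 0.75)) AS quartiles
FROM student;
```

For non-integral columns or very large inputs, percentile_approx is the usual alternative.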
 
  
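Approximate percentile: percentile_approx

select percentile_approx(age,0.95) from student;
-- returns the age at the 95th percentile, i.e. the boundary of the top 5% (the ages are sorted internally; with p = 0.5 the function gives the median)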
  
Approximate median: percentile_approx

select percentile_approx(age,0.5) from student;
  
Histogram: histogram_numeric
Syntax: histogram_numeric(col, b)
Return type: array
Note: computes a histogram of col using b bins.

Example:
select histogram_numeric(age, 5) from student;
  
Collect distinct values into a set: collect_set

Example 1:
select age,concat_ws('-',collect_set(department)) id,collect_set(department) id2,concat_ws('-',collect_set(cast(id as string))) from student group by age;
  
Example 2:
-- convert age to a string with cast(age as string)
select concat_ws('-',collect_set(cast(age as string))),collect_set(cast(age as string)) from student;
  
Collect values without deduplication: collect_list

Example:
select age,concat_ws('-',collect_list(department)) id,concat_ws('-',collect_list(cast(id as string))) from student group by age;
 
 
XVI. Table-generating functions (UDTF)

Expand an array into multiple rows: explode
Expand a map into multiple rows: explode

select explode(scores) from score;
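
Used on its own, explode cannot be selected together with other columns. A sketch of the usual workaround on the score table, combining it with LATERAL VIEW; the table alias t and the column aliases subject and mark are illustrative:

```sql
-- explode on a map yields two columns (key and value),
-- so the lateral view needs two column aliases
SELECT s.name, t.subject, t.mark
FROM score s
LATERAL VIEW explode(s.scores) t AS subject, mark;
```

Each person then appears once per subject, which makes per-subject aggregation straightforward.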
 
  
   
     
    
   
   
  
XV. Commonly used functions
1. Coalesce
First non-NULL value: COALESCE
Syntax: COALESCE(T v1, T v2, …)
Return type: T
Note: returns the first non-NULL argument; if all arguments are NULL, returns NULL.
 
 
2. Explode
select explode(scores) from score;
 
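4. lateral view
lateral view is used together with UDTFs such as explode, often applied to the output of split. It expands one row into multiple rows, and the expanded rows can then be aggregated.
Reference article: hive中的 lateral view
Reference article: hive函数之~hive当中的lateral view 与 explode
Data file pageAds.txt
front_page  1,2,3
contact_page    3,4,5

Create the table
-- A simple example: suppose we have a table pageAds with two columns. The first is pageid (string) and the second is adid_list, a comma-separated list of ad IDs
create table pageAds(pageid string, adid_list array<int>) row format delimited fields terminated by "\t" collection items terminated by ",";

Load the data
load data local inpath '/home/study/pageAds.txt' into table pageAds;

We want to count how many times each ad ID appears across all pages.
First, split out the ad IDs:
select * from pageAds;

SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;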
  
Next comes the aggregation:
SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;
  
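3. grouping sets
Reference article: hive中grouping sets的使用
Reference article: Hive SQL grouping sets 用法
grouping sets is a convenient way to write several GROUP BY clauses in a single SQL statement.
GROUPING SETS: aggregates over different combinations of dimensions; equivalent to a UNION ALL of the GROUP BY results for each combination.
GROUPING__ID: indicates which grouping set a result row belongs to; it is a virtual column.
CUBE: aggregates over all combinations of the GROUP BY dimensions.
ROLLUP: a subset of CUBE; it aggregates hierarchically, starting from the leftmost dimension.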
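
As a sketch of the equivalence described above, using the student table from the aggregate examples and assuming it has a string column department and an integer column age:

```sql
-- One statement producing three levels of aggregation
SELECT department, age, count(*) AS cnt, GROUPING__ID
FROM student
GROUP BY department, age
GROUPING SETS ((department), (age), (department, age));

-- Roughly the same rows written as a UNION ALL (without the GROUPING__ID column)
SELECT department, CAST(NULL AS INT) AS age, count(*) AS cnt FROM student GROUP BY department
UNION ALL
SELECT CAST(NULL AS STRING) AS department, age, count(*) AS cnt FROM student GROUP BY age
UNION ALL
SELECT department, age, count(*) AS cnt FROM student GROUP BY department, age;
```

In the GROUPING SETS form, a column that is not part of the current grouping set is returned as NULL, and GROUPING__ID tells those rows apart from rows where the column is genuinely NULL.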
