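Syntax: substr(string str, int start, int len)
Description: str is the string (or column) to slice, start is the 1-based position at which slicing begins, and len is the number of characters to take.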
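The argument order is easy to misread, so here is a minimal Python sketch of the same semantics. The helper name hive_substr is hypothetical, and it only models the simple case of a positive start; it is not Hive's implementation.

```python
def hive_substr(text: str, start: int, length: int) -> str:
    # Rough model of substr(str, start, len) for start >= 1.
    # Hive positions are 1-based, Python indices are 0-based.
    begin = start - 1
    return text[begin:begin + length]


print(hive_substr("20200428", 1, 4))  # year part
print(hive_substr("20200428", 5, 2))  # month part
print(hive_substr("20200428", 7, 2))  # day part
```

The third argument is a count of characters, not an end position, which is why the month is taken with (5, 2) rather than (5, 6).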
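Syntax: split(string str, string pat)
Returns: array<string>
Description: splits str around matches of pat, which is treated as a regular expression, and returns the pieces as an array of strings.
Examples:
1. Basic usage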
hive> select split('abcdef', 'c') from test;
["ab", "def"]
2. Extracting one element from a string
// Get the date (year-month-day)
hive (default)> select split('2020-04-27 20:35:38',' ')[0];
OK
_c0
2020-04-27
Time taken: 0.827 seconds, Fetched: 1 row(s)
Chained splits
// Get the month
hive (default)> select split(split('2020-04-27 20:35:38',' ')[0],'-')[1];
OK
_c0
04
Time taken: 0.069 seconds, Fetched: 1 row(s)
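The chained split above can be mirrored in Python to see what each step produces. This is only a sketch: Python's re.split stands in for Hive's regex-based split, and the function names are made up for illustration.

```python
import re


def date_part(timestamp: str) -> str:
    # Like split(ts, ' ')[0]: keep everything before the first space.
    return re.split(" ", timestamp)[0]


def month_part(timestamp: str) -> str:
    # Like split(split(ts, ' ')[0], '-')[1]: second field of the date part.
    return re.split("-", date_part(timestamp))[1]


ts = "2020-04-27 20:35:38"
print(date_part(ts))
print(month_part(ts))
```

Note that the month comes back as the string 04, not the number 4, just as in the Hive output.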
3. Special characters
When the separator is a regular-expression metacharacter, it must be escaped (prefixed with \).
hive> select split('ab_cd_ef', '\_')[0] from test;
ab
hive> select split('ab?cd_ef', '\\?')[0] from test;
ab
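When the query is run from a shell, double the backslashes once more, because the shell consumes one level of escaping inside double quotes: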
hive -e "select split('ab?cd_ef', '\\\\?')[0] from test"
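Note: some special characters work with a single \, while others, such as ?, need \\. This is most likely because the pattern is unescaped more than once on its way to the regex engine (by the Hive string-literal parser, and also by the shell when hive -e is used). The underscore is not a regex metacharacter, so a plain '_' works as a separator too.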
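The layering can be illustrated with a small Python model. It is deliberately simplified: strip_one_layer is a hypothetical helper that only collapses doubled backslashes, and Python's re module stands in for the Java regex engine that Hive uses.

```python
import re


def strip_one_layer(text: str) -> str:
    # Simplified model of one unescaping layer: every doubled backslash
    # becomes a single backslash, everything else is left alone.
    return text.replace("\\\\", "\\")


typed_in_shell = r"\\\\?"                       # four backslashes, as in hive -e "..."
seen_by_hive = strip_one_layer(typed_in_shell)  # two backslashes, as typed at the hive> prompt
seen_by_regex = strip_one_layer(seen_by_hive)   # one backslash: the regex for a literal ?

print(seen_by_hive, seen_by_regex)
print(re.split(seen_by_regex, "ab?cd_ef")[0])

# A bare ? is a quantifier with nothing to repeat, so it is not a valid pattern.
try:
    re.compile("?")
    bare_question_mark_ok = True
except re.error:
    bare_question_mark_ok = False
print(bare_question_mark_ok)

# The underscore has no special meaning and needs no escaping.
print(re.split("_", "ab_cd_ef")[0])
```

Each layer removes one level of backslashes, so the number you type depends on how many layers sit between you and the regex engine.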
dayofweek(#date#)
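Older versions of Hive have no built-in function that returns the day of the week for a date, so the usual answer is a custom UDF. Instead of writing a UDF, here is a trick that gets the day of the week using only built-in functions.
pmod(datediff(#date#, '1920-01-01') - 3, 7)
#date# stands for the given date.
The result is a number from 0 to 6, meaning Sunday, Monday, Tuesday, ..., Saturday. This works because 1920-01-01 was a Thursday: a difference of 0 days gives pmod(-3, 7) = 4, which is Thursday.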
hive (itcast)> select pmod(datediff('2020-04-27 21:08:12', '1920-01-01') - 3, 7);
OK
_c0
1
Time taken: 0.052 seconds, Fetched: 1 row(s)
To map Monday through Sunday to 1-7 instead, replace a result of 0 with 7:
IF(pmod(datediff(#date#, '1920-01-01') - 3, 7) = 0, 7, pmod(datediff(#date#, '1920-01-01') - 3, 7))
hive (itcast)> select IF(pmod(datediff('2020-04-26 21:08:12', '1920-01-01') - 3, 7) = 0, 7, pmod(datediff('2020-04-26 21:08:12', '1920-01-01') - 3, 7));
OK
_c0
7
Time taken: 0.05 seconds, Fetched: 1 row(s)
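As a sanity check, the same arithmetic can be reproduced in Python and compared with the standard library's own weekday calculation. The function names are hypothetical; the anchor date and the offset of 3 come from the Hive expression.

```python
from datetime import date, timedelta

ANCHOR = date(1920, 1, 1)  # a Thursday, the anchor used in the Hive expression


def weekday_sunday_zero(d: date) -> int:
    # Mirrors pmod(datediff(d, '1920-01-01') - 3, 7); Python's % is already
    # non-negative for a positive modulus, like pmod.
    return ((d - ANCHOR).days - 3) % 7


def weekday_monday_one(d: date) -> int:
    # Mirrors the IF(...) variant: Sunday (0) becomes 7.
    n = weekday_sunday_zero(d)
    return 7 if n == 0 else n


print(weekday_sunday_zero(date(2020, 4, 27)))  # a Monday
print(weekday_monday_one(date(2020, 4, 26)))   # a Sunday

# Compare against isoweekday() (Monday=1 ... Sunday=7) over a few weeks,
# including dates before the anchor.
all_match = all(
    weekday_monday_one(ANCHOR + timedelta(days=k)) == (ANCHOR + timedelta(days=k)).isoweekday()
    for k in range(-30, 30)
)
print(all_match)
```

The comparison includes dates before 1920-01-01, which is where pmod matters: a plain remainder could go negative there.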
hive (itcast)> select year('2020-04-26 21:08:12');
OK
_c0
2020
Time taken: 0.064 seconds, Fetched: 1 row(s)
hive (itcast)> select month('2020-04-26 21:08:12');
OK
_c0
4
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive (itcast)> select day('2020-04-26 21:08:12');
OK
_c0
26
Time taken: 0.047 seconds, Fetched: 1 row(s)
hive (itcast)> select hour('2020-04-26 21:08:12');
OK
_c0
21
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive (itcast)> select minute('2020-04-26 21:08:12');
OK
_c0
8
Time taken: 0.054 seconds, Fetched: 1 row(s)
hive (itcast)> select second('2020-04-26 21:08:12');
OK
_c0
12
Time taken: 0.049 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-01 21:08:12');
OK
_c0
1
Time taken: 0.046 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-02 21:08:12');
OK
_c0
1
Time taken: 0.039 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-03 21:08:12');
OK
_c0
1
Time taken: 0.03 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-04 21:08:12');
OK
_c0
1
Time taken: 0.043 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-05 21:08:12');
OK
_c0
1
Time taken: 0.034 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-06 21:08:12');
OK
_c0
2
Time taken: 0.044 seconds, Fetched: 1 row(s)
hive (itcast)> select weekofyear('2020-01-07 21:08:12');
OK
_c0
2
Time taken: 0.036 seconds, Fetched: 1 row(s)
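The outputs above (week 1 through Sunday 2020-01-05, week 2 from Monday 2020-01-06) are consistent with ISO 8601 week numbering, where weeks start on Monday. Assuming that is the rule weekofyear follows, Python's isocalendar() reproduces the same numbers. The helper name iso_week is made up for this sketch.

```python
from datetime import date


def iso_week(d: date) -> int:
    # ISO 8601 week number: weeks start on Monday, and week 1 is the week
    # that contains the first Thursday of the year.
    return d.isocalendar()[1]


first_week_of_2020 = [iso_week(date(2020, 1, day)) for day in range(1, 8)]
print(first_week_of_2020)
```

Under this rule the last days of December can belong to week 1 of the following year, so week numbers should not be combined naively with year().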
Week of the year: weekofyear
Day of the month: dayofmonth
Day of the week: dayofweek
Syntax (quarter of a date):
select case
when month(#date#)=1 then 'quarter1'
when month(#date#)=2 then 'quarter1'
when month(#date#)=3 then 'quarter1'
when month(#date#)=4 then 'quarter2'
when month(#date#)=5 then 'quarter2'
when month(#date#)=6 then 'quarter2'
when month(#date#)=7 then 'quarter3'
when month(#date#)=8 then 'quarter3'
when month(#date#)=9 then 'quarter3'
else 'quarter4' end;
Example: which quarter does 2020-04-28 fall in?
select case
when month('2020-04-28 12:24:50')=1 then 'quarter1'
when month('2020-04-28 12:24:50')=2 then 'quarter1'
when month('2020-04-28 12:24:50')=3 then 'quarter1'
when month('2020-04-28 12:24:50')=4 then 'quarter2'
when month('2020-04-28 12:24:50')=5 then 'quarter2'
when month('2020-04-28 12:24:50')=6 then 'quarter2'
when month('2020-04-28 12:24:50')=7 then 'quarter3'
when month('2020-04-28 12:24:50')=8 then 'quarter3'
when month('2020-04-28 12:24:50')=9 then 'quarter3'
else 'quarter4' end;
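Syntax (arithmetic form; floor() drops the fractional part so the result is 1-4):
select floor(month(#date#)/3.1)+1;
Example: which quarter does 2020-04-28 fall in?
select floor(month('2020-04-28')/3.1)+1;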
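To check that the arithmetic form agrees with the CASE expression, here is a small Python sketch that evaluates both for all twelve months. The function names are hypothetical. Note that dividing by 3.1 alone gives a fraction (April gives about 2.29), so the result has to be truncated to get a whole quarter number.

```python
import math


def quarter_by_case(month: int) -> int:
    # Same mapping as the CASE expression: 1-3, 4-6, 7-9, otherwise quarter 4.
    if month in (1, 2, 3):
        return 1
    if month in (4, 5, 6):
        return 2
    if month in (7, 8, 9):
        return 3
    return 4


def quarter_by_division(month: int) -> int:
    # month / 3.1 + 1 with the fractional part dropped.
    return math.floor(month / 3.1) + 1


def quarter_by_integer_math(month: int) -> int:
    # An equivalent form that avoids floating point entirely.
    return (month - 1) // 3 + 1


for m in range(1, 13):
    print(m, quarter_by_case(m), quarter_by_division(m), quarter_by_integer_math(m))
```

All three agree for months 1 through 12; the integer form is included only as a comparison point.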
// Convert a Unix timestamp to a date
select distinct from_unixtime(1441565203,'yyyy-MM-dd HH:mm:ss');
// Convert a date to a Unix timestamp
select distinct unix_timestamp('2020-04-28 12:10:10'); -- the default format is 'yyyy-MM-dd HH:mm:ss'
select distinct unix_timestamp('2020-04-28 12:10:10','yyyy-MM-dd HH:mm:ss');
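The two conversions are inverses of each other, which a short Python sketch makes easy to see. This sketch pins the time zone to UTC as an assumption; Hive uses its own configured time zone, so the values it prints may differ by that zone's offset. The function names are made up for illustration.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"  # Python spelling of yyyy-MM-dd HH:mm:ss


def from_unixtime_utc(seconds: int) -> str:
    # Seconds since 1970-01-01 00:00:00 UTC, rendered as text in UTC.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)


def unix_timestamp_utc(text: str) -> int:
    # Parse the text as a UTC wall-clock time and return epoch seconds.
    parsed = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(from_unixtime_utc(1441565203))
print(unix_timestamp_utc("2020-04-28 12:10:10"))
print(from_unixtime_utc(unix_timestamp_utc("2020-04-28 12:10:10")))
```

The last line round-trips back to the original string, which is a quick way to confirm that a format pattern is right.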
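Convert 20200428 to 2020-04-28
select from_unixtime(unix_timestamp('20200428','yyyyMMdd'),'yyyy-MM-dd');
Convert 2020-04-28 to 20200428
select from_unixtime(unix_timestamp('2020-04-28','yyyy-MM-dd'),'yyyyMMdd');
Note: in the format pattern, uppercase MM means month and lowercase mm means minutes.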
Convert 20200428 to 2020-04-28 (using substr)
select concat(substr('20200428',1,4),'-',substr('20200428',5,2),'-',substr('20200428',7,2));
Convert 2020-04-28 to 20200428 (using substr)
select concat(substr('2020-04-28',1,4),substr('2020-04-28',6,2),substr('2020-04-28',9,2));
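Both approaches, parsing the date and slicing the string, should give the same result for well-formed input. The Python sketch below puts them side by side; the function names are hypothetical, and %m here is Python's month directive (the counterpart of MM in the Hive pattern).

```python
from datetime import datetime


def compact_to_dashed(text: str) -> str:
    # Parse 20200428 as a date, then print it as 2020-04-28.
    return datetime.strptime(text, "%Y%m%d").strftime("%Y-%m-%d")


def dashed_to_compact(text: str) -> str:
    # Parse 2020-04-28 as a date, then print it as 20200428.
    return datetime.strptime(text, "%Y-%m-%d").strftime("%Y%m%d")


def compact_to_dashed_by_slicing(text: str) -> str:
    # Same positions as substr(s,1,4), substr(s,5,2), substr(s,7,2).
    return f"{text[0:4]}-{text[4:6]}-{text[6:8]}"


def dashed_to_compact_by_slicing(text: str) -> str:
    # Same positions as substr(s,1,4), substr(s,6,2), substr(s,9,2).
    return text[0:4] + text[5:7] + text[8:10]


print(compact_to_dashed("20200428"), compact_to_dashed_by_slicing("20200428"))
print(dashed_to_compact("2020-04-28"), dashed_to_compact_by_slicing("2020-04-28"))
```

The difference shows up on bad input: parsing rejects a value such as 20201345, while slicing happily reformats it.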
Note:
For more functions, see: Hive function reference (hive函数大全)