Commonly Used Hive SQL / Spark SQL Functions

I. Getting the Current Time

  1. current_date returns the current date
    2018-04-09
  2. current_timestamp / now() returns the current timestamp
    2018-04-09 15:20:49.247

II. Extracting Fields from Dates and Timestamps

  1. year, month, day/dayofmonth, hour, minute, second
    Examples:
    > SELECT day('2009-07-30');
     30

  2. dayofweek (1 = Sunday, 2 = Monday, ..., 7 = Saturday), dayofyear
    Examples:
    > SELECT dayofweek('2009-07-30');
     5
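
The Sunday-based numbering is easy to mix up with Monday-based conventions. The following Python sketch (the helper names `dayofweek` and `dayofyear` are only illustrative, not a real binding to Hive or Spark) mirrors the numbering described above using the standard library:

```python
from datetime import date


def dayofweek(d: date) -> int:
    """Sunday-based day of week: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() is Monday = 1 ... Sunday = 7, so shift by one and wrap around.
    return d.isoweekday() % 7 + 1


def dayofyear(d: date) -> int:
    """1-based day number within the year."""
    return d.timetuple().tm_yday


print(dayofweek(date(2009, 7, 30)))  # 5, because 2009-07-30 is a Thursday
print(dayofyear(date(2016, 4, 9)))   # 100
```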

  3. weekofyear
    weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday, and week 1 is the first week with more than 3 days.
    Examples:
    > SELECT weekofyear('2008-02-20');
     8
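
The rule "weeks start on Monday, week 1 is the first week with more than 3 days" is the ISO 8601 week definition. As a rough cross-check, Python's `date.isocalendar()` follows the same definition; the wrapper name `weekofyear` below is just an illustrative stand-in:

```python
from datetime import date


def weekofyear(d: date) -> int:
    """ISO 8601 week number: Monday-based weeks, week 1 has at least 4 days of the new year."""
    return d.isocalendar()[1]


print(weekofyear(date(2008, 2, 20)))  # 8
print(weekofyear(date(2016, 1, 1)))   # 53: this Friday still belongs to the last week of 2015
```

Dates in the first days of January can therefore report week 52 or 53, which matters when grouping by year and week together.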

  4. trunc truncates a date to the given unit; the remaining parts are reset to 01
    The second argument is one of ["year", "yyyy", "yy", "mon", "month", "mm"]
    Examples:
    > SELECT trunc('2009-02-12', 'MM');
     2009-02-01
    > SELECT trunc('2015-10-27', 'YEAR');
     2015-01-01

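  5. date_trunc(fmt, ts) truncates a timestamp to the unit fmt, one of ["YEAR", "YYYY", "YY", "MON", "MONTH", "MM", "DAY", "DD", "HOUR", "MINUTE", "SECOND", "WEEK", "QUARTER"]. Note that the unit comes first and the timestamp second.
    Examples:
    > SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
     2015-03-05T09:00:00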

  6. date_format converts a date or timestamp to a string in the given format
    Examples:
    > SELECT date_format('2016-04-08', 'y');
     2016

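III. Date and Time Conversion

  1. unix_timestamp returns the Unix timestamp of the current time; given a time string and a pattern, it returns the timestamp of that time (the result depends on the session time zone)
    Examples:
    > SELECT unix_timestamp();
     1476884637
    > SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
     1460041200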
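
Because a date string carries no time zone, the number returned for it depends on the zone in which the string is interpreted. The Python sketch below is only a model of that behaviour: the function name and the `utc_offset_hours` parameter are assumptions made for illustration, and a fixed offset stands in for a real session time zone.

```python
from datetime import datetime, timedelta, timezone


def unix_timestamp(text: str, utc_offset_hours: int) -> int:
    """Epoch seconds for a 'yyyy-MM-dd' string read as local midnight in a fixed-offset zone."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    local_midnight = datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=zone)
    return int(local_midnight.timestamp())


print(unix_timestamp("2016-04-08", 0))  # 1460073600 when read as UTC
print(unix_timestamp("2016-04-08", 9))  # 1460041200 when read as UTC+9
```

The value 1460041200 corresponds to midnight in a UTC+9 zone, so the same query run on a cluster configured for another zone would print a different number.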

  2. from_unixtime converts a Unix timestamp to a formatted time string; to_unix_timestamp converts a time string to a Unix timestamp
    Examples:
    > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
     1970-01-01 00:00:00
    > SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
     1460041200

  3. to_date/date converts a string to a date; to_timestamp (since 2.2.0) converts a string to a timestamp
    Examples:
    > SELECT to_date('2009-07-30 04:17:52');
     2009-07-30
    > SELECT to_date('2016-12-31', 'yyyy-MM-dd');
     2016-12-31
    > SELECT to_timestamp('2016-12-31 00:12:00');
     2016-12-31 00:12:00

  4. quarter returns the quarter of the year for a date (range 1 to 4)
    Examples:
    > SELECT quarter('2016-08-31');
     3

IV. Date and Time Arithmetic

  1. months_between returns the number of months between two dates
    months_between(timestamp1, timestamp2) - Returns number of months between timestamp1 and timestamp2.
    Examples:
    > SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
     3.94959677
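
The fractional part of the result can look surprising. The Python sketch below models the rule documented for Spark (whole months when both values fall on the same day of the month or both on the last day of their month; otherwise the remainder is computed on a 31-day month and rounded to 8 digits). It is an approximation for illustration, not the engine's code, and the helper names are made up.

```python
import calendar
from datetime import datetime

SECONDS_PER_DAY = 86400


def _is_last_day(t: datetime) -> bool:
    return t.day == calendar.monthrange(t.year, t.month)[1]


def _seconds_into_month(t: datetime) -> int:
    return t.day * SECONDS_PER_DAY + t.hour * 3600 + t.minute * 60 + t.second


def months_between(t1: datetime, t2: datetime) -> float:
    whole = (t1.year - t2.year) * 12 + (t1.month - t2.month)
    # Same day of month, or both month ends: time of day is ignored.
    if t1.day == t2.day or (_is_last_day(t1) and _is_last_day(t2)):
        return float(whole)
    # Otherwise the remainder is measured against a fixed 31-day month.
    frac = (_seconds_into_month(t1) - _seconds_into_month(t2)) / (31 * SECONDS_PER_DAY)
    return round(whole + frac, 8)


print(months_between(datetime(1997, 2, 28, 10, 30), datetime(1996, 10, 30)))  # 3.94959677
```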

  2. add_months returns the date n months after the given date
    Examples:
    > SELECT add_months('2016-08-31', 1);
     2016-09-30
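
The example lands on 2016-09-30 because September has no 31st, so the day is clamped to the end of the target month. A minimal Python sketch of that clamping, with `add_months` as an illustrative name:

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift d by n months (n may be negative), clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last))


print(add_months(date(2016, 8, 31), 1))   # 2016-09-30
print(add_months(date(2019, 2, 11), -1))  # 2019-01-11
```

Some engine versions additionally move a last-day-of-month input to the last day of the target month; this sketch does not model that, so check the behaviour of the version in use.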

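  3. last_day(date) returns the last day of the month; next_day(start_date, day_of_week) returns the first date after start_date that falls on the given weekday
    Examples:
    > SELECT last_day('2009-01-12');
     2009-01-31
    > SELECT next_day('2015-01-14', 'TU');
     2015-01-20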
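
The weekday arithmetic behind next_day is worth spelling out: the result is always strictly after the start date, so asking for the start date's own weekday moves a full week ahead. A Python sketch under the assumption that two-letter weekday abbreviations are used (the `WEEKDAYS` table and function names are illustrative):

```python
import calendar
from datetime import date, timedelta

# Index matches date.weekday(): Monday = 0 ... Sunday = 6.
WEEKDAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def last_day(d: date) -> date:
    """Last calendar day of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


def next_day(d: date, day_of_week: str) -> date:
    """First date strictly after d that falls on the given weekday."""
    target = WEEKDAYS.index(day_of_week.upper()[:2])
    days_ahead = (target - d.weekday() - 1) % 7 + 1  # always in 1..7
    return d + timedelta(days=days_ahead)


print(last_day(date(2009, 1, 12)))        # 2009-01-31
print(next_day(date(2015, 1, 14), "TU"))  # 2015-01-20
```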

  4. date_add, date_sub (subtraction)
    date_add(start_date, num_days) - Returns the date that is num_days after start_date.
    Examples:
    > SELECT date_add('2016-07-30', 1);
     2016-07-31

  5. datediff (number of days between two dates)
    datediff(endDate, startDate) - Returns the number of days from startDate to endDate.
    Examples:
    > SELECT datediff('2009-07-31', '2009-07-30');
     1

  6. UTC conversions
  • to_utc_timestamp
    to_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
    Examples:
    > SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
     2016-08-30 15:00:00

  • from_utc_timestamp
    from_utc_timestamp(timestamp, timezone) - Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
    Examples:
    > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
     2016-08-31 09:00:00
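
The two functions are inverses of each other, which the following Python sketch tries to make visible. It assumes Asia/Seoul can be modelled as a fixed UTC+9 offset (the `SEOUL` constant is an illustration, not a real time zone database lookup), and the function names simply mirror the SQL ones.

```python
from datetime import datetime, timedelta, timezone

SEOUL = timezone(timedelta(hours=9))  # assumption: fixed UTC+9, no daylight saving


def to_utc_timestamp(ts: datetime, zone: timezone) -> datetime:
    """Read a zone-less timestamp as local time in `zone` and return the UTC wall clock."""
    return ts.replace(tzinfo=zone).astimezone(timezone.utc).replace(tzinfo=None)


def from_utc_timestamp(ts: datetime, zone: timezone) -> datetime:
    """Read a zone-less timestamp as UTC and return the wall clock in `zone`."""
    return ts.replace(tzinfo=timezone.utc).astimezone(zone).replace(tzinfo=None)


print(to_utc_timestamp(datetime(2016, 8, 31), SEOUL))    # 2016-08-30 15:00:00
print(from_utc_timestamp(datetime(2016, 8, 31), SEOUL))  # 2016-08-31 09:00:00
```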

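V. Commonly Used Hive Functions

1. Math functions

round(DOUBLE d): returns the BIGINT approximation of the DOUBLE d
round(DOUBLE d, INT n): returns the DOUBLE approximation of d rounded to n decimal places
floor(DOUBLE d): returns the largest BIGINT value <= d
ceil(DOUBLE d): returns the smallest BIGINT value >= d
rand(), rand(INT seed): returns a random DOUBLE for each row; the integer seed is the random seed
exp(DOUBLE d): returns e raised to the power d
ln(DOUBLE d): returns the natural logarithm (base e) of d
log10(DOUBLE d): returns the base-10 logarithm of d
log2(DOUBLE d): returns the base-2 logarithm of d
log(DOUBLE base, DOUBLE d): returns the logarithm of d to the given base
pow(DOUBLE d, DOUBLE p), power(DOUBLE d, DOUBLE p): returns d raised to the power p
sqrt(DOUBLE d): returns the square root of d
bin(BIGINT i): returns the binary representation of i as a STRING
hex(BIGINT i): returns the hexadecimal representation of i as a STRING
unhex(STRING i): the inverse of hex
conv(STRING num, INT from_base, INT to_base): converts the STRING num from base from_base to base to_base
abs(DOUBLE d): returns the absolute value of d
pmod(INT i1, INT i2): returns the positive remainder of i1 modulo i2
sin(DOUBLE d): returns the sine of d
cos(DOUBLE d): returns the cosine of d
asin(DOUBLE d): returns the arcsine of d
acos(DOUBLE d): returns the arccosine of d
tan(DOUBLE d): returns the tangent of d
atan(DOUBLE d): returns the arctangent of d
degrees(DOUBLE d): converts d from radians to degrees
radians(DOUBLE d): converts d from degrees to radians
positive(DOUBLE d): returns +d
negative(DOUBLE d): returns -d
sign(DOUBLE d): returns +1.0 if d is positive, -1.0 if d is negative, and 0 otherwise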
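
A few of these are easier to grasp with concrete values. The Python sketch below models pmod, bin, hex and conv for non-negative inputs and positive divisors only; it assumes the plain SQL `%` operator keeps the sign of the dividend (modelled here by `truncated_mod`, a made-up helper), and none of the functions is a real Hive binding.

```python
import math

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def truncated_mod(a: int, b: int) -> int:
    """Remainder that keeps the sign of the dividend (assumed behaviour of SQL %)."""
    return int(math.fmod(a, b))


def pmod(a: int, b: int) -> int:
    """Non-negative remainder for a positive divisor b."""
    return (truncated_mod(a, b) + b) % b


def conv(num: str, from_base: int, to_base: int) -> str:
    """Re-encode a non-negative number string from one base to another."""
    value = int(num, from_base)
    if value == 0:
        return "0"
    out = []
    while value:
        value, remainder = divmod(value, to_base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(truncated_mod(-7, 3), pmod(-7, 3))  # -1 2
print(format(13, "b"))                    # 1101, the bin(13) counterpart
print(format(17, "X"))                    # 11, the hex(17) counterpart
print(conv("100", 2, 10))                 # 4
```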

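2. Date functions

to_date(string timestamp): returns the date part of a timestamp string, e.g. to_date('1970-01-01 00:00:00')='1970-01-01'
current_date: returns the current date
year(date): returns the year of date as an int, e.g. year('2019-01-01')=2019
month(date): returns the month of date as an int, e.g. month('2019-01-01')=1
day(date): returns the day of the month of date as an int, e.g. day('2019-01-01')=1
weekofyear(date1): returns the week of the year that date1 falls in, e.g. weekofyear('2019-03-06')=10
datediff(date1, date2): returns the number of days from date2 to date1, e.g. datediff('2019-03-06','2019-03-05')=1
date_add(date1, int1): returns date1 plus int1 days, e.g. date_add('2019-03-06',1)='2019-03-07'
date_sub(date1, int1): returns date1 minus int1 days, e.g. date_sub('2019-03-06',1)='2019-03-05'
months_between(date1, date2): returns the number of months between date1 and date2; the result is fractional when the days of the month differ, e.g. months_between('2019-03-06','2019-01-01')=2.16129032
add_months(date1, int1): returns date1 plus int1 months; int1 may be negative, e.g. add_months('2019-02-11',-1)='2019-01-11'
last_day(date1): returns the last day of the month that date1 falls in, e.g. last_day('2019-02-01')='2019-02-28'
next_day(date1, day1): returns the first date after date1 that falls on weekday day1, where day1 is the first two letters of the English weekday name, e.g. next_day('2019-03-06','MO') returns '2019-03-11'
trunc(date1, string1): returns the first day of the year or month of date1. string1 is a year unit (YYYY/YY/YEAR) or a month unit (MONTH/MON/MM), e.g. trunc('2019-03-06','MM')='2019-03-01', trunc('2019-03-06','YYYY')='2019-01-01'
unix_timestamp(): returns the Unix timestamp of the current time; a date string and a pattern may also be given. Use MM for the month, because lowercase mm means minutes. E.g. unix_timestamp('2019-03-06','yyyy-MM-dd')=1551801600 in a UTC+8 session time zone
from_unixtime(): converts a Unix timestamp to a date string, optionally in a given format, e.g. select from_unixtime(unix_timestamp('2019-03-06','yyyy-MM-dd'),'yyyyMMdd')='20190306'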
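
Mixing up the month and minute pattern letters is a common source of silently wrong timestamps. Python's `strptime` uses different letters (`%m` for month, `%M` for minute) but shows the same failure mode, so the sketch below is an analogy rather than Hive's parser. The fixed UTC+8 zone and the helper name `parse_epoch` are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # assumption: session time zone is a fixed UTC+8


def parse_epoch(text: str, fmt: str) -> int:
    """Parse text with fmt, read it as UTC+8 local time, and return epoch seconds."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=UTC8).timestamp())


# "%m" is the month field (like MM); "%M" is the minute field (like mm).
right = parse_epoch("2019-03-06", "%Y-%m-%d")  # 2019-03-06 00:00 local
wrong = parse_epoch("2019-03-06", "%Y-%M-%d")  # 2019-01-06 00:03 local: "03" became minutes

print(right)  # 1551801600
print(wrong)  # 1546704180
```

With the minute letter, "03" is read as minutes and the month falls back to January, so the result is off by about two months without any error being raised.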

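3. Conditional functions

if(boolean, t1, t2): returns t1 if the boolean is true, otherwise t2, e.g. if(1>2,100,200) returns 200
case when boolean then t1 else t2 end: yields t1 if the boolean is true, otherwise t2; multiple when branches may be chained
coalesce(v0, v1, v2): returns the first non-null argument, or null if all arguments are null, e.g. coalesce(null,1,2) returns 1
isnull(a): returns true if a is null, otherwise false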
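
One detail of coalesce that is easy to miss is that only null counts as missing; zero or an empty string is returned as-is. A small Python sketch, with `None` standing in for SQL null and `coalesce` as an illustrative name:

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


print(coalesce(None, 1, 2))  # 1
print(coalesce(0, 1))        # 0, because zero is a value, not a null
print(coalesce(None, None))  # None
```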

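4. String functions

length(string1): returns the length of the string
concat(string1, string2): returns string1 and string2 concatenated
concat_ws(sep, string1, string2): returns the strings joined with the separator sep
lower(string1): returns the string in lowercase, same as lcase(string1). upper()/ucase(): returns the string in uppercase
trim(string1): removes spaces from both ends; ltrim(string1): removes leading spaces; rtrim(string1): removes trailing spaces
repeat(string1, int1): returns string1 repeated int1 times
reverse(string1): returns string1 reversed, e.g. reverse('abc') returns 'cba'
rpad(string1, len1, pad1): right-pads string1 with pad1 to length len1, e.g. rpad('abc',5,'1') returns 'abc11'. lpad(): left-pads
split(string1, pat1): splits string1 on the regular expression pat1 and returns an array, e.g. split('a,b,c',',') returns ["a","b","c"]
substr(string1, index1, int1): returns int1 characters starting at position index1 (1-based), e.g. substr('abcde',1,2) returns 'ab'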
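
The 1-based indexing of substr and the truncating behaviour of the pad functions are the usual stumbling blocks. The Python sketch below models them for positive indexes and non-empty pad strings only; note that Python's `re` is not identical to the Java regular expressions the engines use, so `split` here is only an approximation.

```python
import re


def substr(s: str, index: int, length: int) -> str:
    """1-based substring: substr('abcde', 1, 2) starts at the first character."""
    return s[index - 1:index - 1 + length]


def rpad(s: str, length: int, pad: str) -> str:
    """Right-pad s with pad up to length; longer inputs are cut to length."""
    return (s + pad * length)[:length]


def lpad(s: str, length: int, pad: str) -> str:
    """Left-pad s with pad up to length; longer inputs are cut to length."""
    if len(s) >= length:
        return s[:length]
    return (pad * length)[:length - len(s)] + s


def split(s: str, pattern: str) -> list:
    """Split s on a regular expression."""
    return re.split(pattern, s)


print(substr("abcde", 1, 2))  # ab
print(rpad("abc", 5, "1"))    # abc11
print(lpad("abc", 5, "1"))    # 11abc
print(split("a,b,c", ","))    # ['a', 'b', 'c']
```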

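5. Aggregate functions

count(*): counts the number of rows
sum(col1): returns the sum of the given column
avg(col1): returns the average of the given column
min(col1): returns the minimum of the given column
max(col1): returns the maximum of the given column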

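6. Table-generating functions

explode(array): returns one row per array element, e.g. explode(array('A','B','C')) returns three rows: A, B, C
explode(map): returns one row per key-value pair of the map, e.g. explode(map(1,'A',2,'B',3,'C')) returns three rows: (1, A), (2, B), (3, C)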

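7. Window functions

row_number() over(partition by .. order by ..): numbers the rows of each partition in sort order; equal values get different numbers, with no gaps
rank() over(partition by .. order by ..): ranks the rows of each partition in sort order; equal values get the same rank, and gaps follow ties
dense_rank() over(partition by .. order by ..): ranks the rows of each partition in sort order; equal values get the same rank, with no gaps
sum() over(partition by .. order by ..): running sum within the partition
count() over(partition by .. order by ..): running count within the partition
lag(col, n) over(partition by .. order by ..): value of col n rows before the current row
lead(col, n) over(partition by .. order by ..): value of col n rows after the current row
first_value() over(partition by .. order by ..): first value in the ordered partition
last_value() over(partition by .. order by ..): last value in the ordered partition
ntile(n) over(partition by .. order by ..): splits the ordered rows of the partition into n buckets

For finer control within a partition, add a window frame clause. Common keywords:
preceding: rows before the current row
following: rows after the current row
current row: the current row
unbounded: the boundary of the partition; unbounded preceding means from the first row, unbounded following means to the last row
Example usage:
sum(col) over(partition by .. order by .. rows between 1 preceding and current row): aggregates the current row with the row before it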
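
The difference between the three ranking functions, and the effect of a "1 preceding and current row" frame, is easiest to see on a small list with a tie. The Python sketch below works on one partition whose values are already sorted; all function names are illustrative and nothing here calls a SQL engine.

```python
def row_number(values):
    """Consecutive numbers 1..n, ties broken arbitrarily by position."""
    return list(range(1, len(values) + 1))


def rank(values):
    """Ties share a rank; the next distinct value skips ahead by the tie size."""
    ranks = []
    for i, v in enumerate(values):
        ranks.append(ranks[-1] if i > 0 and v == values[i - 1] else i + 1)
    return ranks


def dense_rank(values):
    """Ties share a rank; the next distinct value gets the next integer."""
    ranks = []
    for i, v in enumerate(values):
        if i == 0:
            ranks.append(1)
        elif v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


def sum_prev_and_current(values):
    """Frame 'rows between 1 preceding and current row' applied to sum."""
    return [v + (values[i - 1] if i > 0 else 0) for i, v in enumerate(values)]


scores = [90, 85, 85, 70]  # one partition, already ordered descending
print(row_number(scores))            # [1, 2, 3, 4]
print(rank(scores))                  # [1, 2, 2, 4]
print(dense_rank(scores))            # [1, 2, 2, 3]
print(sum_prev_and_current(scores))  # [90, 175, 170, 155]
```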

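8. Row/column conversion

concat_ws(sep, collect_set(col1)): merges the values from different rows of the same group into one column, separated by sep. collect_set() removes duplicates and collect_list() keeps them, so collect_list() can be used instead when there are no duplicates
lateral view explode(split(col1, ',')): splits one delimited column value into multiple rows within the same group, here using ',' as the delimiter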
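
The two conversions are mirror images of each other. The Python sketch below models them on (key, value) tuples; the sample data and helper names are made up, and the sketch keeps first-seen order, whereas the element order of collect_set in a real engine should not be relied on.

```python
from collections import defaultdict


def collect_and_join(rows, sep=",", distinct=True):
    """Rows to column: group values by key and join them (distinct=True mimics collect_set)."""
    groups = defaultdict(list)
    for key, value in rows:
        if distinct and value in groups[key]:
            continue
        groups[key].append(value)
    return {key: sep.join(values) for key, values in groups.items()}


def explode_split(rows, sep=","):
    """Column to rows: emit one (key, part) row per delimited part."""
    return [(key, part) for key, joined in rows for part in joined.split(sep)]


rows = [("u1", "a"), ("u1", "b"), ("u1", "a"), ("u2", "c")]
print(collect_and_join(rows))                  # {'u1': 'a,b', 'u2': 'c'}
print(collect_and_join(rows, distinct=False))  # {'u1': 'a,b,a', 'u2': 'c'}
print(explode_split([("u1", "a,b"), ("u2", "c")]))  # [('u1', 'a'), ('u1', 'b'), ('u2', 'c')]
```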
