Spark SQL内置函数

Spark版本号:2.4.3,Spark SQL命令行输入

show functions;

示例

  • abs:取绝对值
> select abs(-10);

+------+
| _c0  |
+------+
| 10   |
+------+
  • acos:反余弦函数
> select acos(0.5);

+---------------------+
|         _c0         |
+---------------------+
| 1.0471975511965979  |
+---------------------+
  • add_months(start_date, num_months),增加月份
> select add_months('2016-02-29', 1);

+-------------+
|     _c0     |
+-------------+
| 2016-03-31  |
+-------------+
  • aggregate

  • and

  • approx_count_distinct(expr[, relativeSD])

  • approx_percentile(col, percentage [, accuracy])

  • array(expr, …) - 返回给定值组成的数组。

> select array(1, 2 , 3, 4, 5, 6);

+----------------+
|      _c0       |
+----------------+
| [1,2,3,4,5,6]  |
+----------------+
  • array_contains(array, value) - 如果数组包含了 value,则返回 true。
> select array_contains(array(1, 2, 3, 4, 5, 6), 1);

+------------------------------------------+
|array_contains(array(1, 2, 3, 4, 5, 6), 1)|
+------------------------------------------+
|                                      true|
+------------------------------------------+
  • array_distinct(array) - 数组去重
> select array_distinct(array(1, 1, 3, 3));

+---------------------------------+
|array_distinct(array(1, 1, 3, 3))|
+---------------------------------+
|                           [1, 3]|
+---------------------------------+
  • array_except(array1, array2) - 数组移除
> select array_except(array(1, 2, 3) , array(1));

+--------------------------------------+
|array_except(array(1, 2, 3), array(1))|
+--------------------------------------+
|                                [2, 3]|
+--------------------------------------+
  • array_intersect(array1, array2) - 数组求交
> select array_intersect(array(1, 2, 3) , array(1, 2));

+--------------------------------------------+
|array_intersect(array(1, 2, 3), array(1, 2))|
+--------------------------------------------+
|                                      [1, 2]|
+--------------------------------------------+
  • array_join(array, String[, String]): String:使用分隔符将数组的元素进行拼接。另外,我们还可以指定可选的字符来替换 null 值。如果我们没有指定 null 值的替换字符串,那么结果中将会把 null 值替换掉。
> select array_join(array('hello', null, 'world'), ',', 'null');

+--------------------------------------------------------------+
|array_join(array(hello, CAST(NULL AS STRING), world), ,, null)|
+--------------------------------------------------------------+
|                                              hello,null,world|
+--------------------------------------------------------------+

第一个参数为:stirng的数组
第二个参数为:拼接字符串
第三个参数为:null替换为字符串
  • array_max(array): T:返回数组中的最大值, null 元素将会被忽略。
> select array_max(array(1, 2, 10, null, 3));

+------------------------------------------------+
|array_max(array(1, 2, 10, CAST(NULL AS INT), 3))|
+------------------------------------------------+
|                                              10|
+------------------------------------------------+
  • array_min(array): T:返回数组中的最小值, null 元素将会被忽略。
> select array_min(array(1, 2, 10, null, 3));

+------------------------------------------------+
|array_min(array(1, 2, 10, CAST(NULL AS INT), 3))|
+------------------------------------------------+
|                                               1|
+------------------------------------------------+
  • array_position(array, T): Long:返回元素 T 在数组第一次出现的位置,从1开始算。
> select array_position(array(1, 3, 4, 1), 1);

+------------------------------------+
|array_position(array(1, 3, 4, 1), 1)|
+------------------------------------+
|                                   1|
+------------------------------------+
  • array_remove(array, T): array:从给定数组中删除所有与给定元素相等的元素。
> select array_remove(array(1, 3, 4, 1, 1), 1);

+-------------------------------------+
|array_remove(array(1, 3, 4, 1, 1), 1)|
+-------------------------------------+
|                               [3, 4]|
+-------------------------------------+
  • array_repeat(T, Int): array:返回包含元素计数次数的数组。
> select array_repeat('1', 2);

+------------------+
|array_repeat(1, 2)|
+------------------+
|            [1, 1]|
+------------------+
  • array_sort(array): array:按升序对输入数组进行排序。输入数组的元素必须是可排序的。null 元素将放置在返回数组的末尾。
> select array_sort(array(3, 4, 1, null, 0));

+------------------------------------------------+
|array_sort(array(3, 4, 1, CAST(NULL AS INT), 0))|
+------------------------------------------------+
|                                   [0, 1, 3, 4,]|
+------------------------------------------------+
  • array_union(array, array): array:返回给定两个数组元素的并集,结果已经去重了。使用如下
> select array_union(array(1, 2, 3), array(1, 3, 5));

+-------------------------------------------+
|array_union(array(1, 2, 3), array(1, 3, 5))|
+-------------------------------------------+
|                               [1, 2, 3, 5]|
+-------------------------------------------+
  • arrays_overlap(array, array): array:如果数组1最少包含数组2中一个非空的元素,则返回 true;如果两个数组没有公共元素,而且两个数组均为空则返回 false,如果其中任何一个数组包含 null 元素则返回 null。
> select arrays_overlap(array(1, 2, 3), array(3, 4, 5));

+----------------------------------------------+
|arrays_overlap(array(1, 2, 3), array(3, 4, 5))|
+----------------------------------------------+
|                                          true|
+----------------------------------------------+
  • array_zip(array …):数组组合
> select arrays_zip(array(1, 2), array(2, 3));

+------------------------------------+
|arrays_zip(array(1, 2), array(2, 3))|
+------------------------------------+
|                    [[1, 2], [2, 3]]|
+------------------------------------+
  • ascii(string):返回str中第一个字符的ascii值
> select ascii('China 100');

+----------------+
|ascii(China 100)|
+----------------+
|              67|
+----------------+
  • asin(string):反正弦
> select ascii('0.5');

+----------+
|ascii(0.5)|
+----------+
|        48|
+----------+
  • assert_true(expr):如果 expr 表达式的返回值不是 true 则抛出异常。
> select assert_true(0 < 1);

+--------------------+
|assert_true((0 < 1))|
+--------------------+
|                null|
+--------------------+
  • atan(double):反正切函数
> select atan(0.5);

spark.sql("select atan(0.5)").show();
+-------------------------+
|ATAN(CAST(0.5 AS DOUBLE))|
+-------------------------+
|       0.4636476090008061|
+-------------------------+
  • atan2(double, double): 返回平面的正 x 轴与由坐标(expr1,expr2)点之间的弧度角度。
> select atan2(1, 1);

+-------------------------------------------+
|ATAN2(CAST(1 AS DOUBLE), CAST(1 AS DOUBLE))|
+-------------------------------------------+
|                         0.7853981633974483|
+-------------------------------------------+
  • avg:求平均
  • base64(bin):将参数从二进制文件转换为 base64 的字符串
> select base64('hello world');

+-----------------------------------+
|base64(CAST(hello world AS BINARY))|
+-----------------------------------+
|                   aGVsbG8gd29ybGQ=|
+-----------------------------------+
  • bigint(expr):将值 expr 转换为 bigint 数据类型
> select bigint('121');

+-------------------+
|CAST(121 AS BIGINT)|
+-------------------+
|                121|
+-------------------+
  • bin(expr):返回 long 类型的参数 expr 的二进制字符串表示形式
> select bin(100);

+------------------------+
|bin(CAST(100 AS BIGINT))|
+------------------------+
|                 1100100|
+------------------------+
  • binary(expr): 将值 expr 转换为 binary 数据类型。
> select binary(bin(100));

+----------------------------------------+
|CAST(bin(CAST(100 AS BIGINT)) AS BINARY)|
+----------------------------------------+
|                    [31 31 30 30 31 3...|
+----------------------------------------+
  • bit_length(expr):返回字符串数据的位长度或二进制数据的位数。
> select bit_length(binary(bin(100)));

+----------------------------------------------------+
|bit_length(CAST(bin(CAST(100 AS BIGINT)) AS BINARY))|
+----------------------------------------------------+
|                                                  56|
+----------------------------------------------------+


> select bit_length('hello world');

+-----------------------+
|bit_length(hello world)|
+-----------------------+
|                     88|
+-----------------------+
  • boolean(expr):将值 expr 转换为 boolean 数据类型
> select boolean(0);

+------------------+
|CAST(0 AS BOOLEAN)|
+------------------+
|             false|
+------------------+
  • bround(expr, len):四舍五入,保留len长度的小数位
> select bround(3.1415926, 2);

+--------------------+
|bround(3.1415926, 2)|
+--------------------+
|                3.14|
+--------------------+
  • cardinality
  • cast(expr AS type):将 expr 转换成 type 类型的数据
> select cast('10' as integer);

+---------------+
|CAST(10 AS INT)|
+---------------+
|             10|
+---------------+
  • cbrt(expr):返回expr的立方根
> select cbrt(8);

+-----------------------+
|CBRT(CAST(8 AS DOUBLE))|
+-----------------------+
|                    2.0|
+-----------------------+
  • ceil(expr):向上取整
> select ceil(-3.1415926);

+----------------+
|CEIL(-3.1415926)|
+----------------+
|              -3|
+----------------+
  • ceiling(expr):向上取整
> select ceiling(-3.1415926);

+----------------+
|CEIL(-3.1415926)|
+----------------+
|              -3|
+----------------+
  • char(expr):返回二进制等效于 expr 的 ASCII 字符。 如果 n 大于256,则结果等于 chr(n%256)
> select char(100);

+------------------------+
|chr(CAST(100 AS BIGINT))|
+------------------------+
|                       d|
+------------------------+
  • char_length(expr):返回字符串数据的字符长度或二进制数据的字节数。 字符串数据的长度包括尾随空格,二进制数据的长度包括二进制零。
> select char_length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+
  • character_length(expr):返回字符串数据的字符长度或二进制数据的字节数。 字符串数据的长度包括尾随空格,二进制数据的长度包括二进制零。
> select character_length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+
  • chr(expr) : 返回二进制等效于 expr 的 ASCII 字符。 如果 n 大于256,则结果等于 chr(n%256)
> select chr(100);

+------------------------+
|chr(CAST(100 AS BIGINT))|
+------------------------+
|                       d|
+------------------------+
  • coalesce(expr1, expr2, …): 返回第一个非空参数(如果存在)。 否则,返回 null。
> select coalesce(null, 1, null, 2);

+----------------------------------------------------+
|coalesce(CAST(NULL AS INT), 1, CAST(NULL AS INT), 2)|
+----------------------------------------------------+
|                                                   1|
+----------------------------------------------------+
  • collect_list(expr):返回list,不去重
> select collect_list(carstate) from gps_car;

+----------------------+
|collect_list(carstate)|
+----------------------+
|  [2.0, 2.0, 2.0, 2...|
+----------------------+
  • collect_set(expr):返回set,去重
> select collect_set(carstate) from gps_car;

+---------------------+
|collect_set(carstate)|
+---------------------+
|      [2.0, 0.0, 1.0]|
+---------------------+
  • concat(str1, str2,…,strN):返回由 str1, str2, …, strN 组成的字符串
> select concat('hello', 'world');

+--------------------+
|concat(hello, world)|
+--------------------+
|          helloworld|
+--------------------+
  • concat_ws(sep, [str | array(str)]+) : 返回由 sep 分隔组成的字符串连接。
> select concat_ws(' ', 'hello', 'world');

+--------------------------+
|concat_ws( , hello, world)|
+--------------------------+
|               hello world|
+--------------------------+


> select concat_ws(' ', array('hello', 'world'));

+---------------------------------+
|concat_ws( , array(hello, world))|
+---------------------------------+
|                      hello world|
+---------------------------------+
  • conv(num, from_base, to_base) : 将 num 从 from_base 进制转换为 to_base 进制。
> select conv(100, 2, 10);

+--------------------------------+
|conv(CAST(100 AS STRING), 2, 10)|
+--------------------------------+
|                               4|
+--------------------------------+
  • corr(expr1, expr2)
  • cos(expr):余弦函数
> select cos(0.5);

+------------------------+
|COS(CAST(0.5 AS DOUBLE))|
+------------------------+
|      0.8775825618903728|
+------------------------+
  • cosh(expr): 双曲余弦
> select cosh(0);

+-----------------------+
|COSH(CAST(0 AS DOUBLE))|
+-----------------------+
|                    1.0|
+-----------------------+
  • cot(expr): 余切
> select cot(1);

+----------------------+
|COT(CAST(1 AS DOUBLE))|
+----------------------+
|    0.6420926159343306|
+----------------------+
  • count
  • count_min_sketch
  • covar_pop
  • covar_samp
  • crc32
  • cube
  • cume_dist
  • current_database:查询当前数据库
> select current_database();

+------------------+
|current_database()|
+------------------+
|           default|
+------------------+
  • current_date:查询当前时间
> select current_date();

+--------------+
|current_date()|
+--------------+
|    2020-03-11|
+--------------+
  • current_timestamp():当前时间戳
> select current_timestamp();

+--------------------+
| current_timestamp()|
+--------------------+
|2020-03-11 04:26:...|
+--------------------+
  • date(expr):
> select date('2010-10-10');

+------------------------+
|CAST(2010-10-10 AS DATE)|
+------------------------+
|              2010-10-10|
+------------------------+
  • date_add(start_date, num_days):增加时间
> select date_add('2010-10-10', 1);

+-------------------------------------+
|date_add(CAST(2010-10-10 AS DATE), 1)|
+-------------------------------------+
|                           2010-10-11|
+-------------------------------------+
  • date_format(timestamp, fmt):时间格式化
> select date_format('2010-10-10', 'yyyy');

+------------------------------------------------+
|date_format(CAST(2010-10-10 AS TIMESTAMP), yyyy)|
+------------------------------------------------+
|                                            2010|
+------------------------------------------------+
  • date_sub(start_date, num_days):时间减少天数
> select date_sub('2020-10-10', 1);

+-------------------------------------+
|date_sub(CAST(2020-10-10 AS DATE), 1)|
+-------------------------------------+
|                           2020-10-09|
+-------------------------------------+
  • date_trunc(fmt, ts):时间精确,fmt:[“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]
> select date_trunc('yyyy', '2011-11-11 11:11:11.111');

+------------------------------------------------------------+
|date_trunc(yyyy, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+------------------------------------------------------------+
|                                         2011-01-01 00:00:00|
+------------------------------------------------------------+


> select date_trunc('month', '2011-11-11 11:11:11.111');

+-------------------------------------------------------------+
|date_trunc(month, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+-------------------------------------------------------------+
|                                          2011-11-01 00:00:00|
+-------------------------------------------------------------+


> select date_trunc('day', '2011-11-11 11:11:11.111');

+-----------------------------------------------------------+
|date_trunc(day, CAST(2011-11-11 11:11:11.111 AS TIMESTAMP))|
+-----------------------------------------------------------+
|                                        2011-11-11 00:00:00|
+-----------------------------------------------------------+
  • datediff(start_date, end_date):返回天数差
> select datediff('2011-11-11', '2011-11-10');

+------------------------------------------------------------+
|datediff(CAST(2011-11-11 AS DATE), CAST(2011-11-10 AS DATE))|
+------------------------------------------------------------+
|                                                           1|
+------------------------------------------------------------+
  • day(date):返回天数
> select day('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+
  • dayofmonth(date):返回天数
> select dayofmonth('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+
  • dayofweek(date):返回天数
> select dayofmonth('2020-2-11');

+-----------------------------------+
|dayofmonth(CAST(2020-2-11 AS DATE))|
+-----------------------------------+
|                                 11|
+-----------------------------------+
  • dayofyear(date):返回天数
> select dayofyear('2020-2-11');

+----------------------------------+
|dayofyear(CAST(2020-2-11 AS DATE))|
+----------------------------------+
|                                42|
+----------------------------------+
  • decimal(expr):转换为decimal
> select decimal(1.0);

+--------------------------+
|CAST(1.0 AS DECIMAL(10,0))|
+--------------------------+
|                         1|
+--------------------------+
  • decode(bin, charset):解码
> select decode('abcd', 'utf-8');

+-----------------------------------+
|decode(CAST(abcd AS BINARY), utf-8)|
+-----------------------------------+
|                               abcd|
+-----------------------------------+
  • degrees
  • dense_rank
  • double(expr):转成double
> select double(1);

+-----------------+
|CAST(1 AS DOUBLE)|
+-----------------+
|              1.0|
+-----------------+
  • e():返回e的值
> select e();

+-----------------+
|              E()|
+-----------------+
|2.718281828459045|
+-----------------+
  • element_att(n, input1, input2, …):返回第几个
  • elt
> select elt(2, 'java', 'php', 'R');

+--------------------+
|elt(2, java, php, R)|
+--------------------+
|                 php|
+--------------------+
  • encode(str, charset):编码
> select encode('abcd', 'utf-8');

+-------------------+
|encode(abcd, utf-8)|
+-------------------+
|      [61 62 63 64]|
+-------------------+
  • exists
  • exp
  • explode(expr):拆分数组等
> select explode(array(1, 3, 5));

+---+
|col|
+---+
|  1|
|  3|
|  5|
+---+
  • explode_outer(expr):拆分数组等
> select explode_outer(array(1, 3, 5));

+---+
|col|
+---+
|  1|
|  3|
|  5|
+---+
  • expm1
  • factorial(expr):阶乘
> select factorial(5);

+------------+
|factorial(5)|
+------------+
|         120|
+------------+
  • filter
  • find_in_set(str, str_array):查询索引
> select find_in_set('ab','abc,b,ab,c,def');

+-------------------------------+
|find_in_set(ab, abc,b,ab,c,def)|
+-------------------------------+
|                              3|
+-------------------------------+
  • first
  • first_value
  • flatten
  • float(expr):转成float
> select float(1);

+----------------+
|CAST(1 AS FLOAT)|
+----------------+
|             1.0|
+----------------+
  • floor(expr):向下取整
> select floor(-0.1);

+-----------+
|FLOOR(-0.1)|
+-----------+
|         -1|
+-----------+
  • format_number(expr1, expr2):数字格式化
> select format_number(3141.5926, 3);

+---------------------------+
|format_number(3141.5926, 3)|
+---------------------------+
|                  3,141.593|
+---------------------------+
  • format_string(strfmt, obj, …):字符串格式化
> select format_string('Hello World %d %s', 100, 'days');

+-------------------------------------------+
|format_string(Hello World %d %s, 100, days)|
+-------------------------------------------+
|                       Hello World 100 days|
+-------------------------------------------+
  • from_json(jsonStr, schema[, options]):
> select from_json("{'id': 1, 'name': 'zhangsan'}", "id int, name string");

+--------------------------------------------+
|jsontostructs({'id': 1, 'name': 'zhangsan'})|
+--------------------------------------------+
|                               [1, zhangsan]|
+--------------------------------------------+
  • from_unixtime(unix_time, format):unix_time为java.util.Date().getTime() / 1000
> select from_unixtime(1583888136, 'yyyy-MM-dd HH:mm:ss');

+--------------------------------------------------------------+
|from_unixtime(CAST(1583888136 AS BIGINT), yyyy-MM-dd HH:mm:ss)|
+--------------------------------------------------------------+
|                                           2020-03-11 08:55:36|
+--------------------------------------------------------------+

  • from_utc_timestamp(timestamp, timezone)
> select from_utc_timestamp('2020-01-01', 'Asia/Seoul');

+-------------------------------------------------------------+
|from_utc_timestamp(CAST(2020-01-01 AS TIMESTAMP), Asia/Seoul)|
+-------------------------------------------------------------+
|                                          2020-01-01 09:00:00|
+-------------------------------------------------------------+
  • get_json_object(json_txt, path):取最大值
  • greatest(expr, …):
> select greatest(1, 3, 4, 10);

+---------------------+
|greatest(1, 3, 4, 10)|
+---------------------+
|                   10|
+---------------------+
  • grouping
  • grouping_id
  • hash(expr1, …):
> select hash('hello', 'world');

+------------------+
|hash(hello, world)|
+------------------+
|        -165041762|
+------------------+
  • hex(expr):转成hexadecimal
> select hex('hello world');

+--------------------+
|    hex(hello world)|
+--------------------+
|68656C6C6F20776F7...|
+--------------------+
  • hour(timestamp):返回小时
> select hour('2020-11-11 11:11:11');

+--------------------------------------------+
|hour(CAST(2020-11-11 11:11:11 AS TIMESTAMP))|
+--------------------------------------------+
|                                          11|
+--------------------------------------------+
  • hypot(expr1, expr2):返回值为
sqrt(expr^2 + expr2^2)
> select hypot(3, 4);

+-------------------------------------------+
|HYPOT(CAST(3 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                        5.0|
+-------------------------------------------+
  • if(expr1, expr2, expr3):expr1=true返回expr2,否则返回expr3
> select if(1 > 2, '1', '2');

+-------------------+
|(IF((1 > 2), 1, 2))|
+-------------------+
|                  2|
+-------------------+
  • ifnull(expr1, expr2):如果expr1为返回expr2
> select ifnull(null, 'expr2');

+---------------------+
|ifnull(NULL, 'expr2')|
+---------------------+
|                expr2|
+---------------------+
  • in:expr1 in(expr2, expr3, …)
> select 1 in(1, 3, 4);

+----------------+
|(1 IN (1, 3, 4))|
+----------------+
|            true|
+----------------+
  • initcap(str):每个单词首字母大写
> select initcap('hello world');

+--------------------+
|initcap(hello world)|
+--------------------+
|         Hello World|
+--------------------+
  • inline
> select inline(array(struct(1, 'a'), struct(2, 'b')));

+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+
  • inline_outer
> select inline_outer(array(struct(1, 'a'), struct(2, 'b')));

+----+----+
|col1|col2|
+----+----+
|   1|   a|
|   2|   b|
+----+----+
  • input_file_block_length()
  • input_file_block_start
  • input_file_name
  • instr(str, substr):查询索引
> select instr('hello world', 'world');

+-------------------------+
|instr(hello world, world)|
+-------------------------+
|                        7|
+-------------------------+
  • int(expr):转成int
> select int(3.14);

+-----------------+
|CAST(3.14 AS INT)|
+-----------------+
|                3|
+-----------------+
  • isnan
  • isnotnull(expr):非空返回true
> select isnotnull(1);

+---------------+
|(1 IS NOT NULL)|
+---------------+
|           true|
+---------------+
  • isnull:空返回true
> select isnull(null);

+--------------+
|(NULL IS NULL)|
+--------------+
|          true|
+--------------+
  • java_method(class, method[, arg1[, arg2 …]]):调用java方法
> select java_method('java.util.UUID', 'randomUUID');

+-----------------------------------+
|reflect(java.util.UUID, randomUUID)|
+-----------------------------------+
|               3aaebaa8-6335-440...|
+-----------------------------------+
  • json_tuple
  • kurtosis
  • lag
  • last
> select last(array(1, 3, null, 5));

+----------------------------------------------+
|last(array(1, 3, CAST(NULL AS INT), 5), false)|
+----------------------------------------------+
|                                    [1, 3,, 5]|
+----------------------------------------------+
  • last_day(date):返回最后一天
> select last_day('2020-02-01');

+----------------------------------+
|last_day(CAST(2020-02-01 AS DATE))|
+----------------------------------+
|                        2020-02-29|
+----------------------------------+
  • last_value
  • lcase(str):转换为小写
> select lcase('Hello World');

+------------------+
|lower(Hello World)|
+------------------+
|       hello world|
+------------------+
  • lead
  • least(expr, …):返回最小值
> select least(null, 10, 3, 4);

+----------------------------------+
|least(CAST(NULL AS INT), 10, 3, 4)|
+----------------------------------+
|                                 3|
+----------------------------------+
  • left(str, len):返回左边几个字符
> select left('hello world', 3);

+----------------------+
|left('hello world', 3)|
+----------------------+
|                   hel|
+----------------------+
  • length(expr):返回长度
> select length('hello world');

+-------------------+
|length(hello world)|
+-------------------+
|                 11|
+-------------------+
  • levenshtein
  • like
  • ln
  • locate(substr, str[, pos]):
  • log
  • log10
  • log1p
  • log2
  • lower(str):转成小写
> select lower('Hello World');

+------------------+
|lower(Hello World)|
+------------------+
|       hello world|
+------------------+
  • lpad
  • ltrim(trimStr, str):去除前后trimStr,不传责去除前空格
> select ltrim('ha', 'haHello worldha');

+--------------------------+
|ltrim(haHello worldha, ha)|
+--------------------------+
|             Hello worldha|
+--------------------------+
  • map(key0, value0, key1, value1, …):创建map
> select map(1.0, '2', 3.0, '4');

+--------------------+
| map(1.0, 2, 3.0, 4)|
+--------------------+
|[1.0 -> 2, 3.0 -> 4]|
+--------------------+
  • map_concat(map, …):返回多个 map 的并集
> select map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd'));

+--------------------------------------------+
|map_concat(map(1, a, 2, b), map(2, c, 3, d))|
+--------------------------------------------+
|                        [1 -> a, 2 -> b, ...|
+--------------------------------------------+
  • map_from_arrays(array, array)
> select map_from_arrays(array(1.0, 3.0), array('2', '4'));

+---------------------------------------------+
|map_from_arrays(array(1.0, 3.0), array(2, 4))|
+---------------------------------------------+
|                         [1.0 -> 2, 3.0 -> 4]|
+---------------------------------------------+
  • map_from_entries(array>)
> select map_from_entries(array(struct(1, 'a'), struct(2, 'b')));

+---------------------------------------------------------------------------------------+
|map_from_entries(array(named_struct(col1, 1, col2, a), named_struct(col1, 2, col2, b)))|
+---------------------------------------------------------------------------------------+
|                                                                       [1 -> a, 2 -> b]|
+---------------------------------------------------------------------------------------+
  • map_keys(map)
> select map_keys(map(1, 'a', 2, 'b'));

+-------------------------+
|map_keys(map(1, a, 2, b))|
+-------------------------+
|                   [1, 2]|
+-------------------------+
  • map_values(map)
> select map_values(map(1, 'a', 2, 'b'));

+---------------------------+
|map_values(map(1, a, 2, b))|
+---------------------------+
|                     [a, b]|
+---------------------------+
  • max
  • md5(expr):md5加密
> select md5('hello world');

+--------------------------------+
|md5(CAST(hello world AS BINARY))|
+--------------------------------+
|            5eb63bbbe01eeed09...|
+--------------------------------+
  • mean
  • min
  • minute(timestamp):返回分钟数
> select minute('2011-11-11 11:11:11');

+----------------------------------------------+
|minute(CAST(2011-11-11 11:11:11 AS TIMESTAMP))|
+----------------------------------------------+
|                                            11|
+----------------------------------------------+
  • mod(expr1, expr2):%
> select mod(11, 2);

+--------+
|(11 % 2)|
+--------+
|       1|
+--------+
  • monotonically_increasing_id
  • month(date):返回月份
> select month('2011-11-11');

+-------------------------------+
|month(CAST(2011-11-11 AS DATE))|
+-------------------------------+
|                             11|
+-------------------------------+
  • months_between(timestamp1, timestamp2):月份差
> select months_between('2000-10-1', '2000-11-2');

+--------------------------------------------------------------------------------+
|months_between(CAST(2000-10-1 AS TIMESTAMP), CAST(2000-11-2 AS TIMESTAMP), true)|
+--------------------------------------------------------------------------------+
|                                                                     -1.03225806|
+--------------------------------------------------------------------------------+
  • named_struct(name1, val1, name2, val2, …)
> select named_struct('a', 1, 'b', 2, 'c', 3);

+------------------------------+
|named_struct(a, 1, b, 2, c, 3)|
+------------------------------+
|                     [1, 2, 3]|
+------------------------------+
  • nanvl
  • negative(expr):取反
> select negative(100);

+-------+
|(- 100)|
+-------+
|   -100|
+-------+
  • next_day(start_date, day_of_week)
> select next_day('2020-03-11', 'FRI');

+---------------------------------------+
|next_day(CAST(2020-03-11 AS DATE), FRI)|
+---------------------------------------+
|                             2020-03-13|
+---------------------------------------+
  • not
  • now()
> select now();

+--------------------+
| current_timestamp()|
+--------------------+
|2020-03-11 23:07:...|
+--------------------+
  • ntile
  • nullif
  • nvl(expr1, expr2):expr1为空返回expr2
> select nvl(null, 'hello');

+------------------+
|nvl(NULL, 'hello')|
+------------------+
|             hello|
+------------------+
  • nvl2(expr1, expr2, expr3):
  • octet_length(expr):长度
> select octet_length('hello world');

+-------------------------+
|octet_length(hello world)|
+-------------------------+
|                       11|
+-------------------------+
  • or
  • parse_url
  • percent_rank
  • percentile
  • percentile_approx
  • pi():圆周率
> select pi();

+-----------------+
|             PI()|
+-----------------+
|3.141592653589793|
+-----------------+
  • pmod
  • posexplode(expr):分割数组
> select posexplode(array(1, 3, 4));

+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  3|
|  2|  4|
+---+---+
  • posexplode_outer(expr):分割数组
> select posexplode_outer(array(1, 3, 4));

+---+---+
|pos|col|
+---+---+
|  0|  1|
|  1|  3|
|  2|  4|
+---+---+
  • position(substr, str[, pos]):返回位置
> select position('he', 'world hello', 1);

+--------------------------+
|locate(he, world hello, 1)|
+--------------------------+
|                         7|
+--------------------------+
  • positive
  • pow(expr1, expr2):乘方
> select pow(2, 4);

+-------------------------------------------+
|POWER(CAST(2 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                       16.0|
+-------------------------------------------+
  • power(expr1, expr2):乘方
> select power(2, 4);

+-------------------------------------------+
|POWER(CAST(2 AS DOUBLE), CAST(4 AS DOUBLE))|
+-------------------------------------------+
|                                       16.0|
+-------------------------------------------+
  • printf(strfmt, obj, …):字符串格式化
> select printf('Hello World %d %s', 100, 'days');

+-------------------------------------------+
|format_string(Hello World %d %s, 100, days)|
+-------------------------------------------+
|                       Hello World 100 days|
+-------------------------------------------+
  • quarter(date):
> select quarter('2020-11-11');

+---------------------------------+
|quarter(CAST(2020-11-11 AS DATE))|
+---------------------------------+
|                                4|
+---------------------------------+
  • radians(expr):
> select radians(180);

+----------------------------+
|RADIANS(CAST(180 AS DOUBLE))|
+----------------------------+
|           3.141592653589793|
+----------------------------+
  • rand([seed]):随机数
> select rand();

+-------------------------+
|rand(7896507467301193574)|
+-------------------------+
|       0.1643759947938076|
+-------------------------+
  • randn([seed]):随机数
> select randn();

+---------------------------+
|randn(-2498753337637632801)|
+---------------------------+
|        -0.4827279803917336|
+---------------------------+

  • rank
  • reflect(class, method[, arg1[, arg2 …]]),反射调用java方法
> select reflect('java.util.UUID', 'randomUUID');

+-----------------------------------+
|reflect(java.util.UUID, randomUUID)|
+-----------------------------------+
|               3930dab2-d5e0-49e...|
+-----------------------------------+
  • regexp_extract
  • regexp_replace(str, regexp, rep):
  • repeat(str, n):重复
> select repeat('ha', 4);

+-------------+
|repeat(ha, 4)|
+-------------+
|     hahahaha|
+-------------+
  • replace(str, search[, replace]):替换
> select replace('hello world', 'lo', 'ol');

+----------------------------+
|replace(hello world, lo, ol)|
+----------------------------+
|                 helol world|
+----------------------------+
  • reverse(str):反转
> select reverse('hello world');

+--------------------+
|reverse(hello world)|
+--------------------+
|         dlrow olleh|
+--------------------+
  • right(str, len):从右截取字符串
> select right('hello world', 3);

+-----------------------+
|right('hello world', 3)|
+-----------------------+
|                    rld|
+-----------------------+
  • rint(expr):返回int部分的值
> select rint(3.14159);

+------------------------------+
|ROUND(CAST(3.14159 AS DOUBLE))|
+------------------------------+
|                           3.0|
+------------------------------+
  • rlike
  • rollup
  • round(expr, d):
> select round(2.5, 0);

+-------------+
|round(2.5, 0)|
+-------------+
|            3|
+-------------+
  • row_number
  • rpad
  • rtrim(trimStr, str):字符串右边截取
> select rtrim('ha', 'hahello worldha');

+--------------------------+
|rtrim(hahello worldha, ha)|
+--------------------------+
|             hahello world|
+--------------------------+
  • schema_of_json
  • second(timestamp):查询秒
> select second('2011-11-11 11:11:11');

+----------------------------------------------+
|second(CAST(2011-11-11 11:11:11 AS TIMESTAMP))|
+----------------------------------------------+
|                                            11|
+----------------------------------------------+
  • sentences(str[, lang, country])
> select sentences('hello world!my dreams');

+------------------------------------+
|sentences(hello world!my dreams, , )|
+------------------------------------+
|                [[hello, world], ...|
+------------------------------------+
  • sequence
  • sha(expr):加密
> select sha('hello world');

+---------------------------------+
|sha1(CAST(hello world AS BINARY))|
+---------------------------------+
|             2aae6c35c94fcfb41...|
+---------------------------------+
  • sha1(expr):加密
> select sha1('hello world');

+---------------------------------+
|sha1(CAST(hello world AS BINARY))|
+---------------------------------+
|             2aae6c35c94fcfb41...|
+---------------------------------+
  • sha2(expr, bitLength):加密
> select sha2('hello world', 256);

+--------------------------------------+
|sha2(CAST(hello world AS BINARY), 256)|
+--------------------------------------+
|                  b94d27b9934d3e08a...|
+--------------------------------------+
  • shiftleft
  • shiftright
  • shiftrightunsigned
  • shuffle
  • sign
  • signum
  • sin(expr):正弦函数
> select sin(0);

+----------------------+
|SIN(CAST(0 AS DOUBLE))|
+----------------------+
|                   0.0|
+----------------------+
  • sinh
  • size(expr):数据大小
> select size(array(1, 2, 3));

+--------------------+
|size(array(1, 2, 3))|
+--------------------+
|                   3|
+--------------------+
  • skewness
  • slice
  • smallint(expr):转成smallint
> select smallint(100);

+---------------------+
|CAST(100 AS SMALLINT)|
+---------------------+
|                  100|
+---------------------+
  • sort_array(array[, ascendingOrder]):数组排序
> select sort_array(array(2, 3, 1, 0), true);

+-----------------------------------+
|sort_array(array(2, 3, 1, 0), true)|
+-----------------------------------+
|                       [0, 1, 2, 3]|
+-----------------------------------+
  • soundex
  • space
  • spark_partition_id()
  • split(str, regex):字符串分割
> select split('hehahehahehaa', 'ha');

+------------------------+
|split(hehahehahehaa, ha)|
+------------------------+
|         [he, he, he, a]|
+------------------------+
  • sqrt(expr):平方根
> select sqrt(9);

+-----------------------+
|SQRT(CAST(9 AS DOUBLE))|
+-----------------------+
|                    3.0|
+-----------------------+
  • stack
  • std
  • stddev
  • stddev_pop
  • stddev_samp
  • str_to_map(text[, pairDelim[, keyValueDelim]]):转成map
> select str_to_map('a:1,b:2,c:3', ',', ':');

+-----------------------------+
|str_to_map(a:1,b:2,c:3, ,, :)|
+-----------------------------+
|         [a -> 1, b -> 2, ...|
+-----------------------------+
  • string(expr):转成string
  • struct(col1, col2, col3, …)
> select struct(1, 3, 4);

+---------------------------------------+
|named_struct(col1, 1, col2, 3, col3, 4)|
+---------------------------------------+
|                              [1, 3, 4]|
+---------------------------------------+
  • substr(str, pos[, len]):字符串截取
> select substr('hello world', 5, 1);

+----------------------------+
|substring(hello world, 5, 1)|
+----------------------------+
|                           o|
+----------------------------+
  • substring(str, pos[, len]):字符串截取
> select substring('hello world', 5, 1);

+----------------------------+
|substring(hello world, 5, 1)|
+----------------------------+
|                           o|
+----------------------------+
  • substring_index
  • sum(expr):求和
  • tan(expr):正切函数
> select tan(0);

+----------------------+
|TAN(CAST(0 AS DOUBLE))|
+----------------------+
|                   0.0|
+----------------------+
  • tanh
  • timestamp(expr):转成时间戳
  • tinyint(expr):转成tinyint
  • to_date(date_str[, fmt]):转成时间
> select to_date('2011-11-11 11:11:11');

+------------------------------+
|to_date('2011-11-11 11:11:11')|
+------------------------------+
|                    2011-11-11|
+------------------------------+
  • to_json(expr[, options]):生成json
> select to_json(named_struct('a', 1, 'b', 2));

+---------------------------------------+
|structstojson(named_struct(a, 1, b, 2))|
+---------------------------------------+
|                          {"a":1,"b":2}|
+---------------------------------------+
  • to_timestamp(timestamp[, fmt]):
> select to_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss');

+----------------------------------------------------------+
|to_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss')|
+----------------------------------------------------------+
|                                       2011-11-11 11:11:11|
+----------------------------------------------------------+
  • to_unix_timestamp(expr[, pattern])
> select to_unix_timestamp('2011-11-11 11:11:11', 'yyyy-MM-dd HH:mm:ss');

+-----------------------------------------------------------+
|to_unix_timestamp(2011-11-11 11:11:11, yyyy-MM-dd HH:mm:ss)|
+-----------------------------------------------------------+
|                                                 1320981071|
+-----------------------------------------------------------+
  • to_utc_timestamp(timestamp, timezone):
> select to_utc_timestamp('2011-11-11 11:11:11', 'Asia/Seoul');

+--------------------------------------------------------------------+
|to_utc_timestamp(CAST(2011-11-11 11:11:11 AS TIMESTAMP), Asia/Seoul)|
+--------------------------------------------------------------------+
|                                                 2011-11-11 02:11:11|
+--------------------------------------------------------------------+
  • transform
  • translate
  • trim(str):字符串去除前后空格
> select trim('ha', 'hahello worldha');

+-------------------------+
|trim(hahello worldha, ha)|
+-------------------------+
|               ello world|
+-------------------------+
  • trunc(date, fmt):fmt - [“year”, “yyyy”, “yy”, “mon”, “month”, “mm”]
> select trunc('2011-11-11 02:11:11', 'yyyy');

+----------------------------------------------+
|trunc(CAST(2011-11-11 02:11:11 AS DATE), yyyy)|
+----------------------------------------------+
|                                    2011-01-01|
+----------------------------------------------+
  • ucase(str):转换为大写
> select ucase('hello world');

+------------------+
|upper(hello world)|
+------------------+
|       HELLO WORLD|
+------------------+
  • unbase64(str):
> select unbase64('aGVsbG8gd29ybGQ=');

+--------------------------+
|unbase64(aGVsbG8gd29ybGQ=)|
+--------------------------+
|      [68 65 6C 6C 6F 2...|
+--------------------------+
  • unhex(str):
> select unhex('68656C6C6F');

+-----------------+
|unhex(68656C6C6F)|
+-----------------+
| [68 65 6C 6C 6F]|
+-----------------+
  • unix_timestamp([expr[, pattern]]):
> select unix_timestamp('2020-01-01', 'yyyy-MM-dd');

+--------------------------------------+
|unix_timestamp(2020-01-01, yyyy-MM-dd)|
+--------------------------------------+
|                            1577808000|
+--------------------------------------+
  • upper(str):转换为大写
> select upper('hello world');

+------------------+
|upper(hello world)|
+------------------+
|       HELLO WORLD|
+------------------+
  • uuid()
> select uuid();

+--------------------+
|              uuid()|
+--------------------+
|bd8aa547-9d08-46c...|
+--------------------+
  • var_pop
  • var_samp
  • variance
  • weekday
  • weekofyear(date):
> select weekofyear('2011-11-11');

+------------------------------------+
|weekofyear(CAST(2011-11-11 AS DATE))|
+------------------------------------+
|                                  45|
+------------------------------------+
  • when
> select case when 1 > 0 then 1 when 2 > 0 then 2.0 else 1.2 end;

+--------------------------------------------------------------------------------------------------------------------------------+
|CASE WHEN (1 > 0) THEN CAST(1 AS DECIMAL(11,1)) WHEN (2 > 0) THEN CAST(2.0 AS DECIMAL(11,1)) ELSE CAST(1.2 AS DECIMAL(11,1)) END|
+--------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                             1.0|
+--------------------------------------------------------------------------------------------------------------------------------+
  • window
  • xpath
  • xpath_boolean
  • xpath_double
  • xpath_float
  • xpath_int
  • xpath_long
  • xpath_number
  • xpath_short
  • xpath_string
  • year(date):
> select year('2011-11-11');

+------------------------------+
|year(CAST(2011-11-11 AS DATE))|
+------------------------------+
|                          2011|
+------------------------------+
  • zip_with

你可能感兴趣的:(Spark SQL内置函数)