! expr - Logical NOT. Returns NULL if expr is NULL.
Examples:
> SELECT ! true;
false
> SELECT ! false;
true
> SELECT ! NULL;
NULL
Since: 1.0.0
expr1 != expr2 - Returns true if expr1 is not equal to expr2, or false otherwise. Returns NULL if either operand is NULL.
Examples:
> SELECT 1 != 2;
true
> SELECT 1 != '2';
true
> SELECT true != NULL;
NULL
> SELECT NULL != NULL;
NULL
Since: 1.0.0
expr1 % expr2 - Returns the remainder after expr1/expr2.
Examples:
> SELECT 2 % 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
Since: 1.0.0
expr1 & expr2 - Returns the result of bitwise AND of expr1 and expr2.
Examples:
> SELECT 3 & 5;
1
Since: 1.4.0
expr1 * expr2 - Returns expr1*expr2, the product of expr1 and expr2.
Examples:
> SELECT 2 * 3;
6
Since: 1.0.0
expr1 + expr2 - Returns expr1+expr2.
Examples:
> SELECT 1 + 2;
3
Since: 1.0.0
expr1 - expr2 - Returns expr1-expr2.
Examples:
> SELECT 2 - 1;
1
Since: 1.0.0
expr1 / expr2 - Returns expr1/expr2. It always performs floating-point division.
Examples:
> SELECT 3 / 2;
1.5
> SELECT 2L / 2L;
1.0
Since: 1.0.0
expr1 < expr2 - Returns true if expr1 is less than expr2.
Examples:
> SELECT 1 < 2;
true
> SELECT 1.1 < '1';
false
> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-07-30 04:17:52');
false
> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-08-01 04:17:52');
true
> SELECT 1 < NULL;
NULL
Since: 1.0.0
expr1 <= expr2 - Returns true if expr1 is less than or equal to expr2.
Examples:
> SELECT 2 <= 2;
true
> SELECT 1.0 <= '1';
true
> SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-07-30 04:17:52');
true
> SELECT to_date('2009-07-30 04:17:52') <= to_date('2009-08-01 04:17:52');
true
> SELECT 1 <= NULL;
NULL
Since: 1.0.0
expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both operands are null and false if exactly one of them is null.
Examples:
> SELECT 2 <=> 2;
true
> SELECT 1 <=> '1';
true
> SELECT true <=> NULL;
false
> SELECT NULL <=> NULL;
true
Since: 1.1.0
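The difference between = and <=> comes down to how NULL is treated. Below is a minimal Python sketch of the two rules, with None standing in for NULL. The function names sql_eq and null_safe_eq are illustrative only, and type coercion such as 1 = '1' is not modeled.

```python
def sql_eq(a, b):
    """Model of `=`: a NULL (None) on either side makes the result NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of `<=>`: the result is always true or false, never NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


print(sql_eq(True, None))        # None
print(null_safe_eq(True, None))  # False
print(null_safe_eq(None, None))  # True
```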
expr1 <> expr2 - Returns true if expr1 is not equal to expr2, or false otherwise. This is an alternative spelling of expr1 != expr2; the examples below use the != form.
Examples:
> SELECT 1 != 2;
true
> SELECT 1 != '2';
true
> SELECT true != NULL;
NULL
> SELECT NULL != NULL;
NULL
Since: 1.0.0
expr1 = expr2 - Returns true if expr1 equals expr2, or false otherwise. Returns NULL if either operand is NULL.
Examples:
> SELECT 2 = 2;
true
> SELECT 1 = '1';
true
> SELECT true = NULL;
NULL
> SELECT NULL = NULL;
NULL
Since: 1.0.0
expr1 == expr2 - Returns true if expr1 equals expr2, or false otherwise. Returns NULL if either operand is NULL.
Examples:
> SELECT 2 == 2;
true
> SELECT 1 == '1';
true
> SELECT true == NULL;
NULL
> SELECT NULL == NULL;
NULL
Since: 1.0.0
expr1 > expr2 - Returns true if expr1 is greater than expr2.
Examples:
> SELECT 2 > 1;
true
> SELECT 2 > '1.1';
true
> SELECT to_date('2009-07-30 04:17:52') > to_date('2009-07-30 04:17:52');
false
> SELECT to_date('2009-07-30 04:17:52') > to_date('2009-08-01 04:17:52');
false
> SELECT 1 > NULL;
NULL
Since: 1.0.0
expr1 >= expr2 - Returns true if expr1 is greater than or equal to expr2.
Examples:
> SELECT 2 >= 1;
true
> SELECT 2.0 >= '2.1';
false
> SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-07-30 04:17:52');
true
> SELECT to_date('2009-07-30 04:17:52') >= to_date('2009-08-01 04:17:52');
false
> SELECT 1 >= NULL;
NULL
Since: 1.0.0
expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2.
Examples:
> SELECT 3 ^ 5;
6
Since: 1.4.0
abs(expr) - Returns the absolute value of the numeric value expr.
Examples:
> SELECT abs(-1);
1
Since: 1.2.0
acos(expr) - Returns the inverse cosine (a.k.a. arc cosine) of expr, as if computed by java.lang.Math.acos.
Examples:
> SELECT acos(1);
0.0
> SELECT acos(2);
NaN
Since: 1.4.0
acosh(expr) - Returns the inverse hyperbolic cosine of expr.
Examples:
> SELECT acosh(1);
0.0
> SELECT acosh(0);
NaN
Since: 3.0.0
add_months(start_date, num_months) - Returns the date that is num_months after start_date. A positive num_months adds that many months; a negative num_months subtracts them.
Examples:
> SELECT add_months('2016-08-31', 1);
2016-09-30
Since: 1.5.0
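The example above ('2016-08-31' plus one month gives 2016-09-30) suggests that the day is clamped to the last day of the target month. A minimal Python sketch of that behavior follows. The clamping rule is inferred from the example rather than stated in the text, and this add_months is an illustrative helper, not Spark's implementation.

```python
import calendar
from datetime import date


def add_months(start: date, num_months: int) -> date:
    # Count months from year 0 so that negative offsets wrap correctly.
    total = start.year * 12 + (start.month - 1) + num_months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day to the length of the target month.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


print(add_months(date(2016, 8, 31), 1))  # 2016-09-30
```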
aggregate(expr, start, merge, finish) - Applies a binary operator to an initial state and all elements in the array, and reduces this to a single state. The final state is converted into the final result by applying the finish function, which may be omitted.
Examples:
> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x);
6
> SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x, acc -> acc * 10);
60
Since: 2.4.0
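aggregate behaves like a left fold followed by an optional finishing step. A minimal Python sketch of the same idea, using functools.reduce (the Python function here is only an analogy for the SQL function of the same name):

```python
from functools import reduce


def aggregate(values, start, merge, finish=lambda acc: acc):
    # Fold the values into a single state, then post-process that state.
    return finish(reduce(merge, values, start))


print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x))                         # 6
print(aggregate([1, 2, 3], 0, lambda acc, x: acc + x, lambda acc: acc * 10))   # 60
```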
expr1 and expr2 - Logical AND.
Examples:
> SELECT true and true;
true
> SELECT true and false;
false
> SELECT true and NULL;
NULL
> SELECT false and NULL;
false
Since: 1.0.0
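The four examples above follow three-valued logic: false wins over NULL, and NULL wins over true. A small Python sketch, with None standing in for NULL (sql_and is an illustrative name):

```python
def sql_and(a, b):
    # A single false operand decides the result, even if the other is NULL.
    if a is False or b is False:
        return False
    # Otherwise any NULL operand makes the result unknown.
    if a is None or b is None:
        return None
    return True


print(sql_and(True, None))   # None
print(sql_and(False, None))  # False
```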
any(expr) - Returns true if at least one value of expr is true.
Examples:
> SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col);
true
> SELECT any(col) FROM VALUES (NULL), (true), (false) AS tab(col);
true
> SELECT any(col) FROM VALUES (false), (false), (NULL) AS tab(col);
false
Since: 3.0.0
approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++, that is, an estimate of the number of distinct values of expr. relativeSD defines the maximum relative standard deviation allowed.
Examples:
> SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1);
3
Since: 1.6.0
approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal that controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, the function returns the approximate percentile array of column col at the given percentage array, and individual results can be read from that array by index.
Examples:
> SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
7
Since: 2.1.0
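For intuition, here is an exact (not approximate) reference computation in Python. It assumes the usual reading of the definition above: the smallest value whose cumulative share of the sorted values reaches percentage. That reading reproduces both examples, but the sketch does not model the accuracy parameter or the approximation algorithm, and exact_percentile is an illustrative name.

```python
def exact_percentile(values, percentage):
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    n = len(ordered)
    for i, v in enumerate(ordered):
        # (i + 1) / n is the share of values that are <= v.
        if (i + 1) / n >= percentage:
            return v
    return None  # only reached for an empty input


print([exact_percentile([0, 1, 2, 10], p) for p in (0.5, 0.4, 0.1)])  # [1, 1, 0]
print(exact_percentile([0, 6, 7, 9, 10], 0.5))                        # 7
```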
array(expr, …) - Returns an array with the given elements.
Examples:
> SELECT array(1, 2, 3);
[1,2,3]
Since: 1.1.0
array_contains(array, value) - Returns true if the array contains the value, or false otherwise.
Examples:
> SELECT array_contains(array(1, 2, 3), 2);
true
Since: 1.5.0
array_distinct(array) - Removes duplicate values from the array.
Examples:
> SELECT array_distinct(array(1, 2, 3, null, 3));
[1,2,3,null]
Since: 2.4.0
array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates. In set terms, this is the difference of array1 and array2: the elements found only in array1.
Examples:
> SELECT array_except(array(1, 2, 3), array(1, 3, 5));
[2]
Since: 2.4.0
array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, that is, the elements present in both arrays, without duplicates.
Examples:
> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5));
[1,3]
Since: 2.4.0
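array_except and array_intersect are set difference and set intersection applied to arrays. The Python sketch below keeps the first occurrence of each element in array1's order; that ordering is an assumption consistent with the examples, not something the text states, and NULL elements are not modeled.

```python
def array_except(a1, a2):
    exclude, seen, out = set(a2), set(), []
    for x in a1:
        # Keep elements that are absent from a2 and not emitted yet.
        if x not in exclude and x not in seen:
            seen.add(x)
            out.append(x)
    return out


def array_intersect(a1, a2):
    keep, seen, out = set(a2), set(), []
    for x in a1:
        # Keep elements that are also in a2 and not emitted yet.
        if x in keep and x not in seen:
            seen.add(x)
            out.append(x)
    return out


print(array_except([1, 2, 3], [1, 3, 5]))     # [2]
print(array_intersect([1, 2, 3], [1, 3, 5]))  # [1, 3]
```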
array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered out.
Examples:
> SELECT array_join(array('hello', 'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ', ',');
hello , world
Since: 2.4.0
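The null handling of array_join can be sketched in Python as follows, with None standing in for null. The parameter name null_replacement mirrors nullReplacement; the function is an illustration only.

```python
def array_join(items, delimiter, null_replacement=None):
    if null_replacement is None:
        # No replacement given: null elements are dropped.
        kept = [s for s in items if s is not None]
    else:
        # Replacement given: each null becomes the replacement string.
        kept = [null_replacement if s is None else s for s in items]
    return delimiter.join(kept)


print(array_join(["hello", None, "world"], " "))       # hello world
print(array_join(["hello", None, "world"], " ", ","))  # hello , world
```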
array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
Examples:
> SELECT array_max(array(1, 20, null, 3));
20
Since: 2.4.0
array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
Examples:
> SELECT array_min(array(1, 20, null, 3));
1
Since: 2.4.0
array_position(array, element) - Returns the (1-based) index of the first occurrence of element in the array as a long, or 0 if the element is not found.
Examples:
> SELECT array_position(array(3, 2, 1), 1);
3
Since: 2.4.0
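Because the index is 1-based and 0 means "not found", the lookup can be sketched in Python like this (an illustration, not Spark's implementation; NULL handling is not modeled):

```python
def array_position(items, element):
    # Positions are counted from 1; 0 signals that nothing matched.
    for position, item in enumerate(items, start=1):
        if item == element:
            return position
    return 0


print(array_position([3, 2, 1], 1))  # 3
print(array_position([3, 2, 1], 9))  # 0
```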
array_remove(array, element) - Removes all elements equal to element from array.
Examples:
> SELECT array_remove(array(1, 2, 3, null, 3), 3);
[1,2,null]
Since: 2.4.0
array_repeat(element, count) - Returns an array containing element count times.
Examples:
> SELECT array_repeat('123', 2);
["123","123"]
Since: 2.4.0
array_sort(expr, func) - Sorts the input array. If func is omitted, sorts in ascending order. The elements of the input array must be orderable. NaN is greater than any non-NaN elements for double/float type. Null elements are placed at the end of the returned array. Since 3.0.0 this function also sorts and returns the array based on the given comparator function. The comparator takes two arguments representing two elements of the array. It returns -1, 0, or 1 as the first element is less than, equal to, or greater than the second element. If the comparator function returns other values (including null), the function fails and raises an error.
Examples:
> SELECT array_sort(array(5, 6, 1), (left, right) -> case when left < right then -1 when left > right then 1 else 0 end);
[1,5,6]
> SELECT array_sort(array('bc', 'ab', 'dc'), (left, right) -> case when left is null and right is null then 0 when left is null then -1 when right is null then 1 when left < right then 1 when left > right then -1 else 0 end);
["dc","bc","ab"]
> SELECT array_sort(array('b', 'd', null, 'c', 'a'));
["a","b","c","d",null]
Since: 2.4.0
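The comparator contract described above (-1, 0, or 1, anything else is an error) maps directly onto Python's functools.cmp_to_key. The sketch below mirrors the three examples; NaN ordering is not modeled, and raising ValueError for an invalid comparator result is this sketch's own choice of error.

```python
from functools import cmp_to_key


def array_sort(items, comparator=None):
    if comparator is None:
        # Default: ascending order, with nulls moved to the end.
        non_null = sorted(x for x in items if x is not None)
        return non_null + [None] * (len(items) - len(non_null))

    def checked(left, right):
        result = comparator(left, right)
        if result not in (-1, 0, 1):
            raise ValueError("comparator must return -1, 0, or 1")
        return result

    return sorted(items, key=cmp_to_key(checked))


def descending(left, right):
    return 1 if left < right else (-1 if left > right else 0)


print(array_sort(["bc", "ab", "dc"], descending))   # ['dc', 'bc', 'ab']
print(array_sort(["b", "d", None, "c", "a"]))       # ['a', 'b', 'c', 'd', None]
```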
array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.
Examples:
> SELECT array_union(array(1, 2, 3), array(1, 3, 5));
[1,2,3,5]
Since: 2.4.0
arrays_overlap(a1, a2) - Returns true if a1 contains at least one non-null element that is also present in a2. If the arrays have no common element, both are non-empty, and either of them contains a null element, null is returned; otherwise false is returned.
Examples:
> SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
true
Since: 2.4.0
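The three possible outcomes of arrays_overlap can be written out as a short Python sketch, with None standing in for null. It follows the wording of the description above and is not taken from Spark's source.

```python
def arrays_overlap(a1, a2):
    common = {x for x in a1 if x is not None} & {x for x in a2 if x is not None}
    if common:
        return True
    has_null = any(x is None for x in a1) or any(x is None for x in a2)
    # No shared element: the answer is unknown if a null could have matched.
    if a1 and a2 and has_null:
        return None
    return False


print(arrays_overlap([1, 2, 3], [3, 4, 5]))  # True
print(arrays_overlap([1, None], [2]))        # None
print(arrays_overlap([1], [2]))              # False
```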
arrays_zip(a1, a2, …) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Examples:
> SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4));
[{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
> SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4));
[{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]
Since: 2.4.0
ascii(str) - Returns the numeric value of the first character of str.
Examples:
> SELECT ascii('222');
50
> SELECT ascii(2);
50
Since: 1.5.0
asin(expr) - Returns the inverse sine (a.k.a. arc sine) of expr, as if computed by java.lang.Math.asin.
Examples:
> SELECT asin(0);
0.0
> SELECT asin(2);
NaN
Since: 1.4.0
asinh(expr) - Returns the inverse hyperbolic sine of expr.
Examples:
> SELECT asinh(0);
0.0
Since: 3.0.0
assert_true(expr) - Throws an exception if expr is not true.
Examples:
> SELECT assert_true(0 < 1);
NULL
Since: 2.0.0
atan(expr) - Returns the inverse tangent (a.k.a. arc tangent) of expr, as if computed by java.lang.Math.atan.
Examples:
> SELECT atan(0);
0.0
Since: 1.4.0
atan2(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (exprX, exprY), as if computed by java.lang.Math.atan2.
Examples:
> SELECT atan2(0, 0);
0.0
Since: 1.4.0
atanh(expr) - Returns the inverse hyperbolic tangent of expr.
Examples:
> SELECT atanh(0);
0.0
> SELECT atanh(2);
NaN
Since: 3.0.0
avg(expr) - Returns the mean calculated from the values of a group. NULL values are ignored.
Examples:
> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
Since: 1.0.0
base64(bin) - Converts the argument from a binary bin to a base 64 string.
Examples:
> SELECT base64('Spark SQL');
U3BhcmsgU1FM
Since: 1.5.0
expr1 [NOT] BETWEEN expr2 AND expr3 - Evaluates whether expr1 is [not] between expr2 and expr3.
Examples:
> SELECT col1 FROM VALUES 1, 3, 5, 7 WHERE col1 BETWEEN 2 AND 5;
3
5
Since: 1.0.0
bigint(expr) - Casts the value expr to the target data type bigint.
Since: 2.0.1
bin(expr) - Returns the string representation of the long value expr in binary.
Examples:
> SELECT bin(13);
1101
> SELECT bin(-13);
1111111111111111111111111111111111111111111111111111111111110011
> SELECT bin(13.3);
1101
Since: 1.5.0
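The output for bin(-13) above is the 64-bit two's-complement form of the value. A Python sketch that reproduces the three examples follows. Truncating 13.3 to 13 is inferred from the example, and sql_bin is an illustrative name.

```python
def sql_bin(value):
    # Drop any fractional part, as the bin(13.3) example suggests.
    n = int(value)
    # Masking to 64 bits yields the two's-complement pattern for negatives.
    return format(n & 0xFFFFFFFFFFFFFFFF, "b")


print(sql_bin(13))         # 1101
print(len(sql_bin(-13)))   # 64
```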
binary(expr) - Casts the value expr to the target data type binary.
Since: 2.0.1
bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if there are none.
Examples:
> SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col);
1
Since: 3.0.0
bit_count(expr) - Returns the number of bits that are set in the argument expr, treated as an unsigned 64-bit integer, or NULL if the argument is NULL.
Examples:
> SELECT bit_count(0);
0
Since: 3.0.0
bit_get(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.
Examples:
> SELECT bit_get(11, 0);
1
> SELECT bit_get(11, 2);
0
Since: 3.2.0
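bit_count and bit_get both reduce to simple shift-and-mask operations. A Python sketch under the assumption that values are plain integers viewed as 64-bit patterns (the ValueError for a negative position is this sketch's own choice of error):

```python
MASK64 = (1 << 64) - 1


def bit_count(value):
    if value is None:
        return None
    # Count the set bits of the value seen as an unsigned 64-bit integer.
    return bin(value & MASK64).count("1")


def bit_get(value, pos):
    if pos < 0:
        raise ValueError("position cannot be negative")
    # Position 0 is the rightmost (least significant) bit.
    return (value >> pos) & 1


print(bit_get(11, 0), bit_get(11, 2))  # 1 0   (11 is 1011 in binary)
print(bit_count(0), bit_count(11))     # 0 3
```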
bit_length(expr) - Returns the bit length of string data or the number of bits of binary data.
Examples:
> SELECT bit_length('Spark SQL');
72
Since: 2.3.0
bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if there are none.
Examples:
> SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col);
7
Since: 3.0.0
bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if there are none.
Examples:
> SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col);
6
Since: 3.0.0
bool_and(expr) - Returns true if all values of expr are true.
Examples:
> SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col);
true
> SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col);
true
> SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col);
false
Since: 3.0.0
bool_or(expr) - Returns true if at least one value of expr is true.
Examples:
> SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col);
true
> SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col);
true
> SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col);
false
Since: 3.0.0
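The examples for bool_and and bool_or show that NULL inputs are skipped rather than propagated. A Python sketch of that behavior follows, with None standing in for NULL. Returning None when there are no non-null inputs is an assumption of this sketch; the text does not cover that case.

```python
def bool_and(values):
    non_null = [v for v in values if v is not None]
    # Assumption: no non-null input gives NULL.
    return all(non_null) if non_null else None


def bool_or(values):
    non_null = [v for v in values if v is not None]
    # Assumption: no non-null input gives NULL.
    return any(non_null) if non_null else None


print(bool_and([None, True, True]))   # True
print(bool_or([False, False, None]))  # False
```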
boolean(expr) - Casts the value expr to the target data type boolean.
Since: 2.0.1
bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode.
Examples:
> SELECT bround(2.5, 0);
2
Since: 2.0.0
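HALF_EVEN (banker's rounding) sends ties to the nearest even digit, which is why bround(2.5, 0) is 2 rather than 3. Python's decimal module offers the same rounding mode, so the rule can be sketched as follows (this bround is an illustration built on Decimal, not Spark's implementation):

```python
from decimal import Decimal, ROUND_HALF_EVEN


def bround(value, d=0):
    # 10**-d as a Decimal, e.g. d=1 gives 0.1, d=0 gives 1.
    quantum = Decimal(1).scaleb(-d)
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(bround(2.5))      # 2
print(bround(3.5))      # 4
print(bround(2.25, 1))  # 2.2
```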
btrim(str) - Removes the leading and trailing space characters from str.
btrim(str, trimStr) - Removes the leading and trailing trimStr characters from str.
Examples:
> SELECT btrim(' SparkSQL ');
SparkSQL
> SELECT btrim(encode(' SparkSQL ', 'utf-8'));
SparkSQL
> SELECT btrim('SSparkSQLS', 'SL');
parkSQ
> SELECT btrim(encode('SSparkSQLS', 'utf-8'), encode('SL', 'utf-8'));
parkSQ
Since: 3.2.0
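In the example above, btrim('SSparkSQLS', 'SL') strips any leading or trailing character found in trimStr, not the literal substring 'SL'. Python's str.strip works the same way, so it serves as a compact model. Treating trimStr as a set of characters is inferred from that example.

```python
def btrim(s, trim_str=" "):
    # str.strip removes any leading/trailing characters found in trim_str.
    return s.strip(trim_str)


print(btrim(" SparkSQL "))        # SparkSQL
print(btrim("SSparkSQLS", "SL"))  # parkSQ
```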
cardinality(expr) - Returns the size of an array or a map. The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
Examples:
> SELECT cardinality(array('b', 'd', 'c', 'a'));
4
> SELECT cardinality(map('a', 1, 'b', 2));
2
> SELECT cardinality(NULL);
-1
Since: 1.5.0
CASE expr1 WHEN expr2 THEN expr3 [WHEN expr4 THEN expr5]* [ELSE expr6] END - When expr1 = expr2, returns expr3; when expr1 = expr4, returns expr5; otherwise returns expr6, or NULL if there is no ELSE clause.
Examples:
> SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE '?' END FROM VALUES 1, 2, 3;
one
two
?
> SELECT CASE col1 WHEN 1 THEN 'one' WHEN 2 THEN 'two' END FROM VALUES 1, 2, 3;
one
two
NULL
Since: 1.0.1
cast(expr AS type) - Casts the value expr to the target data type type.
Examples:
> SELECT cast('10' as int);
10
Since: 1.0.0
cbrt(expr) - Returns the cube root of expr.
Examples:
> SELECT cbrt(27.0);
3.0
Since: 1.4.0
ceil(expr) - Returns the smallest integer not smaller than expr.
Examples:
> SELECT ceil(-0.1);
0
> SELECT ceil(5);
5
Since: 1.4.0
ceiling(expr) - Returns the smallest integer not smaller than expr.
Examples:
> SELECT ceiling(-0.1);
0
> SELECT ceiling(5);
5
Since: 1.4.0
char(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).
Examples:
> SELECT char(65);
A
Since: 2.3.0
char_length(expr) - Returns the character length of string data or the number of bytes of binary data. The length of string data includes trailing spaces. The length of binary data includes binary zeros.
Examples:
> SELECT char_length('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
Since: 1.5.0
character_length(expr) - Returns the character length of string data or the number of bytes of binary data. The length of string data includes trailing spaces. The length of binary data includes binary zeros.
Examples:
> SELECT character_length('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
Since: 1.5.0
chr(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).
Examples:
> SELECT chr(65);
A
Since: 2.3.0
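The wrap-around rule for large arguments can be sketched in Python as below. Negative arguments are not covered by the description, so this illustrative sql_chr rejects them instead of guessing.

```python
def sql_chr(n):
    if n < 0:
        # Behavior for negative input is not described in the text.
        raise ValueError("negative input is outside the scope of this sketch")
    # Values beyond one byte wrap around modulo 256.
    return chr(n % 256)


print(sql_chr(65))        # A
print(sql_chr(65 + 256))  # A
```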
coalesce(expr1, expr2, …) - Returns the first non-null argument if one exists. Otherwise, returns null.
Examples:
> SELECT coalesce(NULL, 1, NULL);
1
Since: 1.0.0
collect_list(expr) - Collects and returns a list of non-unique elements.
Examples:
> SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col);
[1,2,1]
Note:
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Since: 2.0.0
collect_set(expr) - Collects and returns a set of unique elements.
Examples:
> SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col);
[1,2]
Note:
The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.
Since: 2.0.0
concat(col1, col2, …, colN) - Returns the concatenation of col1, col2, …, colN.
Examples:
> SELECT concat('Spark', 'SQL');
SparkSQL
> SELECT concat(array(1, 2, 3), array(4, 5), array(6));
[1,2,3,4,5,6]
Note:
Concat logic for arrays is available since 2.4.0.
Since: 1.5.0
concat_ws(sep[, str | array(str)]+) - Returns the concatenation of the strings separated by sep.
Examples:
> SELECT concat_ws(' ', 'Spark', 'SQL');
Spark SQL
> SELECT concat_ws('s');
Since: 1.5.0
conv(num, from_base, to_base) - Converts num from from_base to to_base.
Examples:
> SELECT conv('100', 2, 10);
4
> SELECT conv(-10, 16, -10);
-16
Since: 1.5.0
corr(expr1, expr2) - Returns the Pearson coefficient of correlation between a set of number pairs.
Examples:
> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2);
0.8660254037844387
Since: 1.6.0
cos(expr) - Returns the cosine of expr, as if computed by java.lang.Math.cos.
Examples:
> SELECT cos(0);
1.0
Since: 1.4.0
cosh(expr) - Returns the hyperbolic cosine of expr, as if computed by java.lang.Math.cosh.
Examples:
> SELECT cosh(0);
1.0
Since: 1.4.0
cot(expr) - Returns the cotangent of expr, as if computed by 1/java.lang.Math.tan.
Examples:
> SELECT cot(1);
0.6420926159343306
Since: 2.3.0
count(*) - Returns the total number of retrieved rows, including rows containing null.
count(expr[, expr…]) - Returns the number of rows for which the supplied expression(s) are all non-null.
count(DISTINCT expr[, expr…]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.
Examples:
> SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
4
> SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
3
> SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col);
2
Since: 1.0.0
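The three forms of count differ only in how they treat NULL and duplicates. A single-column Python sketch, with None standing in for NULL (the helper names are illustrative):

```python
def count_star(rows):
    # count(*): every row counts, NULL or not.
    return len(rows)


def count_expr(rows):
    # count(expr): only rows where the value is non-null.
    return sum(1 for v in rows if v is not None)


def count_distinct(rows):
    # count(DISTINCT expr): distinct non-null values.
    return len({v for v in rows if v is not None})


rows = [None, 5, 5, 20]
print(count_star(rows), count_expr(rows), count_distinct(rows))  # 4 3 2
```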
count_if(expr) - Returns the number of TRUE values for the expression.
Examples:
> SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
2
> SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
1
Since: 3.0.0
count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
Examples:
> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000
Since: 2.2.0
covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.
Examples:
> SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
0.6666666666666666
Since: 2.0.0
covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.
Examples:
> SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
1.0
Since: 2.0.0
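Population and sample covariance differ only in the divisor (n versus n - 1), which explains why the same three pairs give 0.666… for covar_pop and 1.0 for covar_samp. A Python sketch using the textbook formulas (NULL handling is not modeled):

```python
def covariance(pairs, sample=False):
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of products of deviations from the two means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return total / (n - 1 if sample else n)


pairs = [(1, 1), (2, 2), (3, 3)]
print(covariance(pairs))               # population covariance, 2/3
print(covariance(pairs, sample=True))  # sample covariance, 1.0
```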
crc32(expr) - Returns a cyclic redundancy check value of expr as a bigint.
Examples:
> SELECT crc32('Spark');
1557323817
Since: 1.5.0
cume_dist() - Computes the position of a value relative to all values in the partition.
Examples:
> SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 0.6666666666666666
A1 1 0.6666666666666666
A1 2 1.0
A2 3 1.0
Since: 2.0.0
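The numbers in the example are consistent with the usual definition of cumulative distribution: the count of rows in the partition whose ordering value is less than or equal to the current one, divided by the partition size. A Python sketch for a single partition, assuming that definition:

```python
def cume_dist(values):
    n = len(values)
    # For each row: share of partition rows with a value <= this row's value.
    return [sum(1 for w in values if w <= v) / n for v in values]


# Partition 'A1' from the example has b values 2, 1, 1.
print(cume_dist([2, 1, 1]))  # [1.0, 0.6666666666666666, 0.6666666666666666]
```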
current_catalog() - Returns the current catalog.
Examples:
> SELECT current_catalog();
spark_catalog
Since: 3.1.0
current_database() - Returns the current database.
Examples:
> SELECT current_database();
default
Since: 1.6.0
current_date() - Returns the current date at the start of query evaluation. All calls of current_date within the same query return the same value.
current_date - Returns the current date at the start of query evaluation.
Examples:
> SELECT current_date();
2020-04-25
> SELECT current_date;
2020-04-25
Note:
The syntax without parentheses has been supported since 2.0.1.
Since: 1.5.0
current_timestamp() - Returns the current timestamp at the start of query evaluation. All calls of current_timestamp within the same query return the same value.
current_timestamp - Returns the current timestamp at the start of query evaluation.
Examples:
> SELECT current_timestamp();
2020-04-25 15:49:11.914
> SELECT current_timestamp;
2020-04-25 15:49:11.914
Note:
The syntax without parentheses has been supported since 2.0.1.
Since: 1.5.0
current_timezone() - Returns the current session local timezone.
Examples:
> SELECT current_timezone();
Asia/Shanghai
Since: 3.1.0
current_user() - Returns the user name of the current execution context.
Examples:
> SELECT current_user();
mockingjay
Since: 3.2.0
date(expr) - Casts the value expr to the target data type date.
Since: 2.0.1
date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Examples:
> SELECT date_add('2016-07-30', 1);
2016-07-31
Since: 1.5.0
date_format(timestamp, fmt) - Converts timestamp to a string in the format specified by the date format fmt.
Examples:
> SELECT date_format('2016-04-08', 'y');
2016
Since: 1.5.0
date_from_unix_date(days) - Creates a date from the number of days since 1970-01-01.
Examples:
> SELECT date_from_unix_date(1);
1970-01-02
Since: 3.1.0
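date_part(field, source) - Extracts a part of the date/timestamp or interval source.
Arguments:
field - selects which part of the source should be extracted; the supported values are the same as the fields of the equivalent function EXTRACT.
source - a date/timestamp or interval column from which field should be extracted.
Examples: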
> SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
33
> SELECT date_part('doy', DATE'2019-08-12');
224
> SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
1.000001
> SELECT date_part('days', interval 5 days 3 hours 7 minutes);
5
> SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
30.001001
> SELECT date_part('MONTH', INTERVAL '2021-11' YEAR TO MONTH);
11
> SELECT date_part('MINUTE', INTERVAL '123 23:55:59.002001' DAY TO SECOND);
55
Note:
The date_part function is equivalent to the SQL-standard function EXTRACT(field FROM source)
Since: 3.0.0
date_sub(start_date, num_days) - Returns the date that is num_days before start_date.
Examples:
> SELECT date_sub('2016-07-30', 1);
2016-07-29
Since: 1.5.0
date_trunc(fmt, ts) - Returns timestamp ts truncated to the unit specified by the format model fmt.
Arguments:
fmt - the unit to truncate to, such as 'YEAR', 'MM', 'DD', 'HOUR', or 'MILLISECOND'. For date-level units the result is the first date of the unit that ts falls in, with the time part zeroed out.
ts - the timestamp to truncate.
Examples:
> SELECT date_trunc('YEAR', '2015-03-05T09:32:05.359');
2015-01-01 00:00:00
> SELECT date_trunc('MM', '2015-03-05T09:32:05.359');
2015-03-01 00:00:00
> SELECT date_trunc('DD', '2015-03-05T09:32:05.359');
2015-03-05 00:00:00
> SELECT date_trunc('HOUR', '2015-03-05T09:32:05.359');
2015-03-05 09:00:00
> SELECT date_trunc('MILLISECOND', '2015-03-05T09:32:05.123456');
2015-03-05 09:32:05.123
Since: 2.3.0
datediff(endDate, startDate) - Returns the number of days from startDate to endDate.
Examples:
> SELECT datediff('2009-07-31', '2009-07-30');
1
> SELECT datediff('2009-07-30', '2009-07-31');
-1
Since: 1.5.0
day(date) - Returns the day of month of the date/timestamp.
Examples:
> SELECT day('2009-07-30');
30
Since: 1.5.0
dayofmonth(date) - Returns the day of month of the date/timestamp.
Examples:
> SELECT dayofmonth('2009-07-30');
30
Since: 1.5.0
dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, …, 7 = Saturday).
Examples:
> SELECT dayofweek('2009-07-30');
5
Since: 2.3.0
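The Sunday-based numbering differs from the ISO convention (1 = Monday, 7 = Sunday) used by Python's date.isoweekday. A small sketch of the conversion, for readers moving between the two:

```python
from datetime import date


def dayofweek(d: date) -> int:
    # isoweekday(): Monday=1 ... Sunday=7; shift so that Sunday=1 ... Saturday=7.
    return d.isoweekday() % 7 + 1


print(dayofweek(date(2009, 7, 30)))  # 5 (a Thursday)
```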
dayofyear(date) - Returns the day of year of the date/timestamp.
Examples:
> SELECT dayofyear('2016-04-09');
100
Since: 1.5.0
decimal(expr) - Casts the value expr to the target data type decimal.
Since: 2.0.1
decode(bin, charset) - Decodes the first argument using the second argument character set.
decode(expr, search, result [, search, result ] … [, default]) - Compares expr to each search value one by one. If expr is equal to a search value, returns the corresponding result. If no match is found, returns default. If default is omitted, returns null.
Examples:
> SELECT decode(encode('abc', 'utf-8'), 'utf-8');
abc
> SELECT decode(2, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic');
San Francisco
> SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle', 'Non domestic');
Non domestic
> SELECT decode(6, 1, 'Southlake', 2, 'San Francisco', 3, 'New Jersey', 4, 'Seattle');
NULL
Since: 3.2.0
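The search/result form of decode walks the argument list in pairs and treats a trailing unpaired argument as the default. A Python sketch of that lookup follows. It is an illustration only and does not model how NULL search values are compared.

```python
def decode(expr, *args):
    pairs, default = args, None
    if len(args) % 2 == 1:
        # An odd number of trailing arguments means the last one is the default.
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


cities = (1, "Southlake", 2, "San Francisco", 3, "New Jersey", 4, "Seattle")
print(decode(2, *cities, "Non domestic"))  # San Francisco
print(decode(6, *cities, "Non domestic"))  # Non domestic
print(decode(6, *cities))                  # None
```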
degrees(expr) - Converts radians to degrees.
Examples:
> SELECT degrees(3.141592653589793);
180.0
Since: 1.4.0
dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank does not produce gaps in the ranking sequence.
Examples:
> SELECT a, b, dense_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 2
A2 3 1
Since: 2.0.0
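"No gaps" means that tied values share a rank and the next distinct value gets the next integer. A Python sketch for a single partition ordered ascending (an illustration of the ranking rule only):

```python
def dense_rank(values):
    # Each distinct value gets the next consecutive rank, starting at 1.
    rank_of = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [rank_of[v] for v in values]


# Partition 'A1' from the example has b values 2, 1, 1.
print(dense_rank([2, 1, 1]))         # [2, 1, 1]
print(dense_rank([10, 30, 30, 50]))  # [1, 2, 2, 3]
```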
expr1 div expr2 - Divides expr1 by expr2. Returns NULL if an operand is NULL or expr2 is 0. The result is cast to long.
Examples:
> SELECT 3 div 2;
1
Since: 3.0.0
double(expr) - Casts the value expr to the target data type double.
Since: 2.0.1
e() - Returns Euler’s number, e.
Examples:
> SELECT e();
2.718281828459045
Since: 1.5.0
element_at(array, index) - Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
element_at(map, key) - Returns value for given key. The function returns NULL if the key is not contained in the map and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws NoSuchElementException instead.
Examples:
> SELECT element_at(array(1, 2, 3), 2);
2
> SELECT element_at(map(1, 'a', 2, 'b'), 2);
b
Since: 2.4.0
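The indexing rules of element_at can be illustrated with a small Python model. It sketches only the non-ANSI behavior (spark.sql.ansi.enabled set to false), with None standing in for NULL. The text above does not say what happens for index 0, so rejecting it with ValueError is an assumption of this sketch.

```python
def element_at(collection, key):
    if isinstance(collection, dict):
        # Map form: a missing key yields None (NULL).
        return collection.get(key)
    if key == 0:
        # Assumption: index 0 is rejected; the text above does not cover it.
        raise ValueError("array indices are 1-based")
    if abs(key) > len(collection):
        return None  # index exceeds the array length
    # Positive indices count from the front, negative ones from the back.
    return collection[key - 1] if key > 0 else collection[key]

print(element_at([1, 2, 3], 2))          # 2
print(element_at([1, 2, 3], -1))         # 3
print(element_at({1: 'a', 2: 'b'}, 2))   # b
```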
elt(n, input1, input2, …) - Returns the n-th input, e.g., returns input2 when n is 2. The function returns NULL if the index exceeds the length of the array and spark.sql.ansi.enabled is set to false. If spark.sql.ansi.enabled is set to true, it throws ArrayIndexOutOfBoundsException for invalid indices.
Examples:
> SELECT elt(1, 'scala', 'java');
scala
Since: 2.0.0
encode(str, charset) - Encodes the first argument using the second argument character set.
Examples:
> SELECT encode('abc', 'utf-8');
abc
Since: 1.5.0
every(expr) - Returns true if all values of expr are true.
Examples:
> SELECT every(col) FROM VALUES (true), (true), (true) AS tab(col);
true
> SELECT every(col) FROM VALUES (NULL), (true), (true) AS tab(col);
true
> SELECT every(col) FROM VALUES (true), (false), (true) AS tab(col);
false
Since: 3.0.0
exists(expr, pred) - Tests whether a predicate holds for one or more elements in the array.
Examples:
> SELECT exists(array(1, 2, 3), x -> x % 2 == 0);
true
> SELECT exists(array(1, 2, 3), x -> x % 2 == 10);
false
> SELECT exists(array(1, null, 3), x -> x % 2 == 0);
NULL
> SELECT exists(array(0, null, 2, 3, null), x -> x IS NULL);
true
> SELECT exists(array(1, 2, 3), x -> x IS NULL);
false
Since: 2.4.0
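The NULL result in the exists examples above is consistent with SQL three-valued logic: a predicate that evaluates to NULL for some element, and to true for none, leaves the answer unknown. The following Python sketch models that reading, with None standing in for NULL. It is inferred from the examples, not taken from Spark's source, and the helper is_even is hypothetical.

```python
def exists(array, pred):
    saw_null = False
    for x in array:
        result = pred(x)
        if result is True:
            return True          # one match is enough
        if result is None:
            saw_null = True      # remember an unknown outcome
    return None if saw_null else False

def is_even(x):
    # Comparing NULL with anything yields NULL.
    return None if x is None else x % 2 == 0

print(exists([1, 2, 3], is_even))        # True
print(exists([1, None, 3], is_even))     # None
print(exists([1, 2, 3], lambda x: x is None))  # False
```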
exp(expr) - Returns e to the power of expr.
Examples:
> SELECT exp(0);
1.0
Since: 1.4.0
explode(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.
Examples:
> SELECT explode(array(10, 20));
10
20
Since: 1.0.0
explode_outer(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map.
Examples:
> SELECT explode_outer(array(10, 20));
10
20
Since: 1.0.0
expm1(expr) - Returns exp(expr) - 1.
Examples:
> SELECT expm1(0);
0.0
Since: 1.4.0
extract(field FROM source) - Extracts a part of the date/timestamp or interval source.
Arguments:
field - selects which part of the source should be extracted.
Supported string values of field for dates and timestamps are case insensitive.
Supported string values of field for intervals (which consist of months, days, and microseconds) are also case insensitive. The year and month fields are computed as months / 12 and months % 12, the day field is the days part of the interval, and the hour, minute, and second fields are computed from microseconds.
source - a date/timestamp or interval column from which field should be extracted.
Examples:
> SELECT extract(YEAR FROM TIMESTAMP '2019-08-12 01:00:00.123456');
2019
> SELECT extract(week FROM timestamp'2019-08-12 01:00:00.123456');
33
> SELECT extract(doy FROM DATE'2019-08-12');
224
> SELECT extract(SECONDS FROM timestamp'2019-10-01 00:00:01.000001');
1.000001
> SELECT extract(days FROM interval 5 days 3 hours 7 minutes);
5
> SELECT extract(seconds FROM interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
30.001001
> SELECT extract(MONTH FROM INTERVAL '2021-11' YEAR TO MONTH);
11
> SELECT extract(MINUTE FROM INTERVAL '123 23:55:59.002001' DAY TO SECOND);
55
Note:
The extract function is equivalent to date_part(field, source).
Since: 3.0.0
factorial(expr) - Returns the factorial of expr. expr must be in the range [0…20]; otherwise, the result is null.
Examples:
> SELECT factorial(5);
120
Since: 1.5.0
filter(expr, func) - Filters the input array using the given predicate.
Examples:
> SELECT filter(array(1, 2, 3), x -> x % 2 == 1);
[1,3]
> SELECT filter(array(0, 2, 3), (x, i) -> x > i);
[2,3]
> SELECT filter(array(0, null, 2, 3, null), x -> x IS NOT NULL);
[0,2,3]
Note:
The inner function may use the index argument since 3.0.0.
Since: 2.4.0
find_in_set(str, str_array) - Returns the index (1-based) of the given string (str) in the comma-delimited list (str_array). Returns 0 if the string was not found or if the given string (str) contains a comma.
Examples:
> SELECT find_in_set('ab','abc,b,ab,c,def');
3
Since: 1.5.0
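The lookup rule of find_in_set is simple enough to restate in Python. This is a minimal sketch of the behavior described above, not Spark's implementation.

```python
def find_in_set(s, str_array):
    # A search string containing a comma can never match one list item.
    if "," in s:
        return 0
    items = str_array.split(",")
    # Return the 1-based position, or 0 when the item is absent.
    return items.index(s) + 1 if s in items else 0

print(find_in_set('ab', 'abc,b,ab,c,def'))  # 3
```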
first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Examples:
> SELECT first(col) FROM VALUES (10), (5), (20) AS tab(col);
10
> SELECT first(col) FROM VALUES (NULL), (5), (20) AS tab(col);
NULL
> SELECT first(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
5
Note:
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
Since: 2.0.0
first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Examples:
> SELECT first_value(col) FROM VALUES (10), (5), (20) AS tab(col);
10
> SELECT first_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
NULL
> SELECT first_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
5
Note:
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
Since: 2.0.0
flatten(arrayOfArrays) - Transforms an array of arrays into a single array.
Examples:
> SELECT flatten(array(array(1, 2), array(3, 4)));
[1,2,3,4]
Since: 2.4.0
float(expr) - Casts the value expr to the target data type float.
Since: 2.0.1
floor(expr) - Returns the largest integer not greater than expr.
Examples:
> SELECT floor(-0.1);
-1
> SELECT floor(5);
5
Since: 1.4.0
forall(expr, pred) - Tests whether a predicate holds for all elements in the array.
Examples:
> SELECT forall(array(1, 2, 3), x -> x % 2 == 0);
false
> SELECT forall(array(2, 4, 8), x -> x % 2 == 0);
true
> SELECT forall(array(1, null, 3), x -> x % 2 == 0);
false
> SELECT forall(array(2, null, 8), x -> x % 2 == 0);
NULL
Since: 3.0.0
format_number(expr1, expr2) - Formats the number expr1 like ‘#,###,###.##’, rounded to expr2 decimal places. If expr2 is 0, the result has no decimal point or fractional part. expr2 also accepts a user-specified format. This is supposed to function like MySQL’s FORMAT.
Examples:
> SELECT format_number(12332.123456, 4);
12,332.1235
> SELECT format_number(12332.123456, '##################.###');
12332.123
Since: 1.5.0
format_string(strfmt, obj, …) - Returns a formatted string from printf-style format strings.
Examples:
> SELECT format_string("Hello World %d %s", 100, "days");
Hello World 100 days
Since: 1.5.0
from_csv(csvStr, schema[, options]) - Returns a struct value with the given csvStr and schema.
Examples:
> SELECT from_csv('1, 0.8', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
> SELECT from_csv('26/08/2015', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"time":2015-08-26 00:00:00}
Since: 3.0.0
from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema.
Examples:
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{"a":1,"b":0.8}
> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"time":2015-08-26 00:00:00}
Since: 2.2.0
from_unixtime(unix_time[, fmt]) - Returns unix_time in the specified fmt.
Examples:
> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
1969-12-31 16:00:00
> SELECT from_unixtime(0);
1969-12-31 16:00:00
Since: 1.5.0
from_utc_timestamp(timestamp, timezone) - Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, ‘GMT+1’ would yield ‘2017-07-14 03:40:00.0’.
Examples:
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
Since: 1.5.0
get_json_object(json_txt, path) - Extracts a json object from path.
Examples:
> SELECT get_json_object('{"a":"b"}', '$.a');
b
Since: 1.5.0
getbit(expr, pos) - Returns the value of the bit (0 or 1) at the specified position. The positions are numbered from right to left, starting at zero. The position argument cannot be negative.
Examples:
> SELECT getbit(11, 0);
1
> SELECT getbit(11, 2);
0
Since: 3.2.0
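The right-to-left, zero-based bit numbering of getbit corresponds to a shift and a mask. The following Python sketch illustrates that reading of the description; it is not Spark's implementation.

```python
def getbit(value, pos):
    if pos < 0:
        raise ValueError("position must not be negative")
    # Shift the wanted bit into the lowest position, then mask it.
    return (value >> pos) & 1

# 11 is 1011 in binary: bit 0 is 1, bit 2 is 0.
print(getbit(11, 0))  # 1
print(getbit(11, 2))  # 0
```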
greatest(expr, …) - Returns the greatest value of all parameters, skipping null values.
Examples:
> SELECT greatest(10, 9, 2, 4, 3);
10
Since: 1.5.0
grouping(col) - Indicates whether a specified column in a GROUP BY is aggregated or not; returns 1 for aggregated or 0 for not aggregated in the result set.
Examples:
> SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name);
Alice 0 2
Bob 0 5
NULL 1 7
Since: 2.0.0
grouping_id([col1[, col2 …]]) - Returns the level of grouping, equal to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
Examples:
> SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height);
Alice 0 2 165.0
Alice 1 2 165.0
NULL 3 7 172.5
Bob 0 5 180.0
Bob 1 5 180.0
NULL 2 2 165.0
NULL 2 5 180.0
Note:
Input columns should match the grouping columns exactly, or be empty (meaning all the grouping columns).
Since: 2.0.0
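The grouping_id formula packs one grouping bit per column into an integer, with the first column in the highest bit. A short Python sketch of the formula follows; the list of bits passed in is a stand-in for the per-column grouping(c) values.

```python
def grouping_id(grouping_bits):
    # grouping_bits[i] is grouping(c_{i+1}): 1 if aggregated, else 0.
    n = len(grouping_bits)
    return sum(bit << (n - 1 - i) for i, bit in enumerate(grouping_bits))

# For GROUP BY cube(name, height):
print(grouping_id([0, 0]))  # 0: neither column aggregated
print(grouping_id([0, 1]))  # 1: height aggregated
print(grouping_id([1, 0]))  # 2: name aggregated
print(grouping_id([1, 1]))  # 3: both aggregated
```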
hash(expr1, expr2, …) - Returns a hash value of the arguments.
Examples:
> SELECT hash('Spark', array(123), 2);
-1321691492
Since: 2.0.0
hex(expr) - Converts expr to hexadecimal.
Examples:
> SELECT hex(17);
11
> SELECT hex('Spark SQL');
537061726B2053514C
Since: 1.5.0
hour(timestamp) - Returns the hour component of the string/timestamp.
Examples:
> SELECT hour('2009-07-30 12:58:59');
12
Since: 1.5.0
hypot(expr1, expr2) - Returns sqrt(expr1**2 + expr2**2).
Examples:
> SELECT hypot(3, 4);
5.0
Since: 1.4.0
if(expr1, expr2, expr3) - If expr1 evaluates to true, then returns expr2; otherwise returns expr3.
Examples:
> SELECT if(1 < 2, 'a', 'b');
a
Since: 1.0.0
ifnull(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.
Examples:
> SELECT ifnull(NULL, array('2'));
["2"]
Since: 2.0.0
expr1 in(expr2, expr3, …) - Returns true if expr1 equals any of the listed values (expr2, expr3, …).
Examples:
> SELECT 1 in(1, 2, 3);
true
> SELECT 1 in(2, 3, 4);
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 1), named_struct('a', 1, 'b', 3));
false
> SELECT named_struct('a', 1, 'b', 2) in(named_struct('a', 1, 'b', 2), named_struct('a', 1, 'b', 3));
true
Since: 1.0.0
initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.
Examples:
> SELECT initcap('sPark sql');
Spark Sql
Since: 1.5.0
inline(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.
Examples:
> SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
Since: 2.0.0
inline_outer(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.
Examples:
> SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
Since: 2.0.0
input_file_block_length() - Returns the length of the block being read, or -1 if not available.
Examples:
> SELECT input_file_block_length();
-1
Since: 2.2.0
input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.
Examples:
> SELECT input_file_block_start();
-1
Since: 2.2.0
input_file_name() - Returns the name of the file being read, or empty string if not available.
Examples:
> SELECT input_file_name();
Since: 1.5.0
instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str.
Examples:
> SELECT instr('SparkSQL', 'SQL');
6
Since: 1.5.0
int(expr) - Casts the value expr to the target data type int.
Since: 2.0.1
isnan(expr) - Returns true if expr is NaN, or false otherwise.
Examples:
> SELECT isnan(cast('NaN' as double));
true
Since: 1.5.0
isnotnull(expr) - Returns true if expr is not null, or false otherwise.
Examples:
> SELECT isnotnull(1);
true
Since: 1.0.0
isnull(expr) - Returns true if expr is null, or false otherwise.
Examples:
> SELECT isnull(1);
false
Since: 1.0.0
java_method(class, method[, arg1[, arg2 …]]) - Calls a method with reflection.
Examples:
> SELECT java_method('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
Since: 2.0.0
json_array_length(jsonArray) - Returns the number of elements in the outermost JSON array.
Arguments:
jsonArray - a JSON array. NULL is returned for any other valid JSON string, NULL, or invalid JSON.
Examples:
> SELECT json_array_length('[1,2,3,4]');
4
> SELECT json_array_length('[1,2,3,{"f1":1,"f2":[5,6]},4]');
5
> SELECT json_array_length('[1,2');
NULL
Since: 3.1.0
json_object_keys(json_object) - Returns all the keys of the outermost JSON object as an array.
Examples:
> SELECT json_object_keys('{}');
[]
> SELECT json_object_keys('{"key": "value"}');
["key"]
> SELECT json_object_keys('{"f1":"abc","f2":{"f3":"a", "f4":"b"}}');
["f1","f2"]
Since: 3.1.0
json_tuple(jsonStr, p1, p2, …, pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.
Examples:
> SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
1 2
Since: 1.6.0
kurtosis(expr) - Returns the kurtosis value calculated from values of a group.
Examples:
> SELECT kurtosis(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
-0.7014368047529627
> SELECT kurtosis(col) FROM VALUES (1), (10), (100), (10), (1) as tab(col);
0.19432323191699075
Since: 1.6.0
lag(input[, offset[, default]]) - Returns the value of input at the offset-th row before the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset-th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned.
Arguments:
input - the expression to evaluate offset rows before the current row.
Examples:
> SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 NULL
A1 1 1
A1 2 1
A2 3 NULL
Since: 2.0.0
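The lag rule can be modelled over a single, already ordered window partition. This Python sketch is a simplified illustration of the description above (one partition, None for NULL), not Spark's windowing code.

```python
def lag(values, offset=1, default=None):
    # values holds the input column of one partition, in window order.
    result = []
    for i in range(len(values)):
        j = i - offset
        # Use the default when no row exists offset rows back.
        result.append(values[j] if 0 <= j < len(values) else default)
    return result

# Partition A1 from the example above, ordered by b.
print(lag([1, 1, 2]))  # [None, 1, 1]
# Partition A2 has a single row, so there is no previous row.
print(lag([3]))        # [None]
```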
last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Examples:
> SELECT last(col) FROM VALUES (10), (5), (20) AS tab(col);
20
> SELECT last(col) FROM VALUES (10), (5), (NULL) AS tab(col);
NULL
> SELECT last(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
5
Note:
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
Since: 2.0.0
last_day(date) - Returns the last day of the month which the date belongs to.
Examples:
> SELECT last_day('2009-01-12');
2009-01-31
Since: 1.5.0
last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Examples:
> SELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col);
20
> SELECT last_value(col) FROM VALUES (10), (5), (NULL) AS tab(col);
NULL
> SELECT last_value(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
5
Note:
The function is non-deterministic because its result depends on the order of the rows, which may be non-deterministic after a shuffle.
Since: 2.0.0
lcase(str) - Returns str with all characters changed to lowercase.
Examples:
> SELECT lcase('SparkSql');
sparksql
Since: 1.0.1
lead(input[, offset[, default]]) - Returns the value of input at the offset-th row after the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset-th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default is returned.
Arguments:
input - the expression to evaluate offset rows after the current row.
Examples:
> SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 NULL
A2 3 NULL
Since: 2.0.0
least(expr, …) - Returns the least value of all parameters, skipping null values.
Examples:
> SELECT least(10, 9, 2, 4, 3);
2
Since: 1.5.0
left(str, len) - Returns the leftmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
Examples:
> SELECT left('Spark SQL', 3);
Spa
Since: 2.3.0
length(expr) - Returns the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Examples:
> SELECT length('Spark SQL ');
10
> SELECT CHAR_LENGTH('Spark SQL ');
10
> SELECT CHARACTER_LENGTH('Spark SQL ');
10
Since: 1.5.0
levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.
Examples:
> SELECT levenshtein('kitten', 'sitting');
3
Since: 1.5.0
str like pattern[ ESCAPE escape] - Returns true if str matches pattern with escape, null if any arguments are null, false otherwise.
Arguments:
str - a string expression
pattern - a string expression. The pattern is a string which is matched literally, with the exception of the following special symbols:
_ matches any one character in the input (similar to . in posix regular expressions)
% matches zero or more characters in the input (similar to .* in posix regular expressions)
Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match “\abc”, the pattern should be “\\abc”.
When SQL config ‘spark.sql.parser.escapedStringLiterals’ is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match “\abc” should be “\abc”.
escape - a character added since Spark 3.0. The default escape character is ‘\’. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.
Examples:
> SELECT like('Spark', '_park');
true
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\\Users%';
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT '%SystemDrive%\\Users\\John' like '\%SystemDrive\%\\\\Users%';
true
> SELECT '%SystemDrive%/Users/John' like '/%SystemDrive/%//Users%' ESCAPE '/';
true
Note:
Use RLIKE to match with standard regular expressions.
Since: 1.0.0
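One way to see how the like special symbols and the escape character interact is to translate a pattern into a regular expression. The Python sketch below does that for already-unescaped strings. It is an illustration of the matching rules described above, not Spark's implementation, and it does not model NULL arguments or SQL string-literal parsing.

```python
import re

def like(s, pattern, escape="\\"):
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape:
            # The escape character may only precede _, %, or itself.
            if i + 1 >= len(pattern) or pattern[i + 1] not in ("_", "%", escape):
                raise ValueError("invalid escape sequence in pattern")
            parts.append(re.escape(pattern[i + 1]))  # matched literally
            i += 2
            continue
        if ch == "_":
            parts.append(".")        # any one character
        elif ch == "%":
            parts.append(".*")       # zero or more characters
        else:
            parts.append(re.escape(ch))
        i += 1
    # The whole input must match, as with SQL LIKE.
    return re.fullmatch("".join(parts), s, flags=re.DOTALL) is not None

print(like("Spark", "_park"))  # True
print(like("%SystemDrive%/Users/John", "/%SystemDrive/%//Users%", escape="/"))  # True
```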
ln(expr) - Returns the natural logarithm (base e) of expr.
Examples:
> SELECT ln(1);
0.0
Since: 1.4.0
locate(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.
Examples:
> SELECT locate('bar', 'foobarbar');
4
> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
Since: 1.5.0
log(base, expr) - Returns the logarithm of expr with base.
Examples:
> SELECT log(10, 100);
2.0
Since: 1.5.0
log10(expr) - Returns the logarithm of expr with base 10.
Examples:
> SELECT log10(10);
1.0
Since: 1.4.0
log1p(expr) - Returns log(1 + expr).
Examples:
> SELECT log1p(0);
0.0
Since: 1.4.0
log2(expr) - Returns the logarithm of expr with base 2.
Examples:
> SELECT log2(2);
1.0
Since: 1.4.0
lower(str) - Returns str with all characters changed to lowercase.
Examples:
> SELECT lower('SparkSql');
sparksql
Since: 1.0.1
lpad(str, len[, pad]) - Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. If pad is not specified, str will be padded to the left with space characters.
Examples:
> SELECT lpad('hi', 5, '??');
???hi
> SELECT lpad('hi', 1, '??');
h
> SELECT lpad('hi', 5);
   hi
Since: 1.5.0
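The padding and truncation rules of lpad can be restated in a few lines of Python. This is a sketch of the behavior described above for string input, not Spark's implementation.

```python
def lpad(s, length, pad=" "):
    if length <= len(s):
        # str is at least as long as len: shorten to len characters.
        return s[:max(length, 0)]
    fill = length - len(s)
    # Repeat the pad string, cut it to the needed width, and prepend it.
    return (pad * fill)[:fill] + s

print(lpad('hi', 5, '??'))  # ???hi
print(lpad('hi', 1, '??'))  # h
print(repr(lpad('hi', 5)))  # '   hi'
```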
ltrim(str) - Removes the leading space characters from str.
Examples:
> SELECT ltrim(' SparkSQL ');
SparkSQL
Since: 1.5.0
make_date(year, month, day) - Create date from year, month and day fields.
Examples:
> SELECT make_date(2013, 7, 15);
2013-07-15
> SELECT make_date(2019, 13, 1);
NULL
> SELECT make_date(2019, 7, NULL);
NULL
> SELECT make_date(2019, 2, 30);
NULL
Since: 3.0.0
make_dt_interval([days[, hours[, mins[, secs]]]]) - Make DayTimeIntervalType duration from days, hours, mins and secs.
Examples:
> SELECT make_dt_interval(1, 12, 30, 01.001001);
1 12:30:01.001001000
> SELECT make_dt_interval(2);
2 00:00:00.000000000
> SELECT make_dt_interval(100, null, 3);
NULL
Since: 3.2.0
make_interval([years[, months[, weeks[, days[, hours[, mins[, secs]]]]]]]) - Make interval from years, months, weeks, days, hours, mins and secs.
Examples:
> SELECT make_interval(100, 11, 1, 1, 12, 30, 01.001001);
100 years 11 months 8 days 12 hours 30 minutes 1.001001 seconds
> SELECT make_interval(100, null, 3);
NULL
> SELECT make_interval(0, 1, 0, 1, 0, 0, 100.000001);
1 months 1 days 1 minutes 40.000001 seconds
Since: 3.0.0
make_timestamp(year, month, day, hour, min, sec[, timezone]) - Create timestamp from year, month, day, hour, min, sec and timezone fields. The result data type is consistent with the value of configuration spark.sql.timestampType.
Examples:
> SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887);
2014-12-28 06:30:45.887
> SELECT make_timestamp(2014, 12, 28, 6, 30, 45.887, 'CET');
2014-12-27 21:30:45.887
> SELECT make_timestamp(2019, 6, 30, 23, 59, 60);
2019-07-01 00:00:00
> SELECT make_timestamp(2019, 6, 30, 23, 59, 1);
2019-06-30 23:59:01
> SELECT make_timestamp(2019, 13, 1, 10, 11, 12, 'PST');
NULL
> SELECT make_timestamp(null, 7, 22, 15, 30, 0);
NULL
Since: 3.0.0
make_ym_interval([years[, months]]) - Make year-month interval from years, months.
Examples:
> SELECT make_ym_interval(1, 2);
1-2
> SELECT make_ym_interval(1, 0);
1-0
> SELECT make_ym_interval(-1, 1);
-0-11
> SELECT make_ym_interval(2);
2-0
Since: 3.2.0
map(key0, value0, key1, value1, …) - Creates a map with the given key/value pairs.
Examples:
> SELECT map(1.0, '2', 3.0, '4');
{1.0:"2",3.0:"4"}
Since: 2.0.0
map_concat(map, …) - Returns the union of all the given maps.
Examples:
> SELECT map_concat(map(1, 'a', 2, 'b'), map(3, 'c'));
{1:"a",2:"b",3:"c"}
Since: 2.4.0
map_entries(map) - Returns an unordered array of all entries in the given map.
Examples:
> SELECT map_entries(map(1, 'a', 2, 'b'));
[{"key":1,"value":"a"},{"key":2,"value":"b"}]
Since: 3.0.0
map_filter(expr, func) - Filters entries in a map using the function.
Examples:
> SELECT map_filter(map(1, 0, 2, 2, 3, -1), (k, v) -> k > v);
{1:0,3:-1}
Since: 3.0.0
map_from_arrays(keys, values) - Creates a map with a pair of the given key/value arrays. No element in keys should be null.
Examples:
> SELECT map_from_arrays(array(1.0, 3.0), array('2', '4'));
{1.0:"2",3.0:"4"}
Since: 2.4.0
map_from_entries(arrayOfEntries) - Returns a map created from the given array of entries.
Examples:
> SELECT map_from_entries(array(struct(1, 'a'), struct(2, 'b')));
{1:"a",2:"b"}
Since: 2.4.0
map_keys(map) - Returns an unordered array containing the keys of the map.
Examples:
> SELECT map_keys(map(1, 'a', 2, 'b'));
[1,2]
Since: 2.0.0
map_values(map) - Returns an unordered array containing the values of the map.
Examples:
> SELECT map_values(map(1, 'a', 2, 'b'));
["a","b"]
Since: 2.0.0
map_zip_with(map1, map2, function) - Merges two given maps into a single map by applying function to the pair of values with the same key. For keys only presented in one map, NULL will be passed as the value for the missing key. If an input map contains duplicated keys, only the first entry of the duplicated key is passed into the lambda function.
Examples:
> SELECT map_zip_with(map(1, 'a', 2, 'b'), map(1, 'x', 2, 'y'), (k, v1, v2) -> concat(v1, v2));
{1:"ax",2:"by"}
Since: 3.0.0
max(expr) - Returns the maximum value of expr.
Examples:
> SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col);
50
Since: 1.0.0
max_by(x, y) - Returns the value of x associated with the maximum value of y.
Examples:
> SELECT max_by(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
b
Since: 3.0.0
md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr.
Examples:
> SELECT md5('Spark');
8cde774d6f7333752ed72cacddb05126
Since: 1.5.0
mean(expr) - Returns the mean calculated from values of a group.
Examples:
> SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col);
2.0
> SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col);
1.5
Since: 1.0.0
min(expr) - Returns the minimum value of expr.
Examples:
> SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col);
-1
Since: 1.0.0
min_by(x, y) - Returns the value of x associated with the minimum value of y.
Examples:
> SELECT min_by(x, y) FROM VALUES (('a', 10)), (('b', 50)), (('c', 20)) AS tab(x, y);
a
Since: 3.0.0
minute(timestamp) - Returns the minute component of the string/timestamp.
Examples:
> SELECT minute('2009-07-30 12:58:59');
58
Since: 1.5.0
expr1 mod expr2 - Returns the remainder after expr1/expr2.
Examples:
> SELECT 2 % 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
Since: 1.0.0
monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.
Examples:
> SELECT monotonically_increasing_id();
0
Since: 1.4.0
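The bit layout described for monotonically_increasing_id (partition ID in the upper 31 bits, record number in the lower 33 bits) can be illustrated with two small Python helpers. The function names make_id and split_id are hypothetical and exist only for this sketch.

```python
RECORD_BITS = 33  # lower 33 bits hold the record number within a partition

def make_id(partition_id, record_number):
    # Partition ID goes into the upper bits, record number into the lower bits.
    return (partition_id << RECORD_BITS) + record_number

def split_id(generated_id):
    # Recover (partition_id, record_number) from a generated ID.
    return generated_id >> RECORD_BITS, generated_id & ((1 << RECORD_BITS) - 1)

print(make_id(0, 0))  # 0
print(make_id(1, 0))  # 8589934592
print(split_id(make_id(3, 7)))  # (3, 7)
```

The jump from partition 0 to partition 1 shows why the IDs are increasing and unique but not consecutive.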
month(date) - Returns the month component of the date/timestamp.
Examples:
> SELECT month('2016-07-30');
7
Since: 1.5.0
months_between(timestamp1, timestamp2[, roundOff]) - If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
Examples:
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
3.94959677
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30', false);
3.9495967741935485
Since: 1.5.0
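The months_between rule can be worked through in Python to see where 3.94959677 comes from. This is a sketch of the calculation as described above (whole-month difference plus a fractional part based on 31-day months); it assumes naive datetime inputs and is not Spark's implementation.

```python
import calendar
from datetime import datetime

def months_between(t1, t2, round_off=True):
    months = (t1.year - t2.year) * 12 + (t1.month - t2.month)
    last1 = t1.day == calendar.monthrange(t1.year, t1.month)[1]
    last2 = t2.day == calendar.monthrange(t2.year, t2.month)[1]
    # Same day of month, or both on the last day: ignore time of day.
    if t1.day == t2.day or (last1 and last2):
        return float(months)

    def seconds_into_month(t):
        return (t.day * 86400 + t.hour * 3600 + t.minute * 60
                + t.second + t.microsecond / 1e6)

    # The fractional part assumes 31 days per month.
    fraction = (seconds_into_month(t1) - seconds_into_month(t2)) / (31 * 86400)
    diff = months + fraction
    return round(diff, 8) if round_off else diff

print(months_between(datetime(1997, 2, 28, 10, 30), datetime(1996, 10, 30)))  # 3.94959677
```

Here the whole-month difference is 4, and the day and time difference is -135000 seconds, which is about -0.0504 of a 31-day month.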
named_struct(name1, val1, name2, val2, …) - Creates a struct with the given field names and values.
Examples:
> SELECT named_struct("a", 1, "b", 2, "c", 3);
{"a":1,"b":2,"c":3}
Since: 1.5.0
nanvl(expr1, expr2) - Returns expr1 if it’s not NaN, or expr2 otherwise.
Examples:
> SELECT nanvl(cast('NaN' as double), 123);
123.0
Since: 1.5.0
negative(expr) - Returns the negated value of expr.
Examples:
> SELECT negative(1);
-1
Since: 1.0.0
next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws IllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise NULL.
Examples:
> SELECT next_day('2015-01-14', 'TU');
2015-01-20
Since: 1.5.0
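The "first date later than start_date" rule can be sketched in Python as follows. This is an illustrative model, not Spark code: the lookup table of accepted day names is an assumption made for the example, and invalid names return None, which corresponds to the non-ANSI behavior described above.

```python
from datetime import date, timedelta

# Hypothetical lookup table: two-letter, three-letter and full day names.
_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]
_DAY_INDEX = {}
for index, name in enumerate(_NAMES):
    for key in (name[:2], name[:3], name):
        _DAY_INDEX[key] = index


def next_day(start_date, day_of_week):
    """Return the first date strictly after start_date that falls on day_of_week."""
    if start_date is None or day_of_week is None:
        return None
    target = _DAY_INDEX.get(day_of_week.strip().upper())
    if target is None:
        return None  # models the non-ANSI case; ANSI mode would raise instead
    # date.weekday() uses Monday=0; the "- 1 ... + 1" keeps the result in 1..7 days.
    days_ahead = (target - start_date.weekday() - 1) % 7 + 1
    return start_date + timedelta(days=days_ahead)


print(next_day(date(2015, 1, 14), "TU"))
```

Because the result must be strictly later than the start date, asking for a Tuesday on a Tuesday moves a full week ahead.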
not expr - Logical not.
Examples:
> SELECT not true;
false
> SELECT not false;
true
> SELECT not NULL;
NULL
Since: 1.0.0
now() - Returns the current timestamp at the start of query evaluation.
Examples:
> SELECT now();
2020-04-25 15:49:11.914
Since: 1.6.0
nth_value(input[, offset]) - Returns the value of input at the row that is the offset-th row from the beginning of the window frame. Offset starts at 1. If ignoreNulls=true, nulls are skipped when finding the offset-th row. Otherwise, every row counts toward the offset. If there is no such offset-th row (e.g., when the offset is 10 and the size of the window frame is less than 10), null is returned.
Arguments:
Examples:
> SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 1
A2 3 NULL
Since: 3.1.0
ntile(n) - Divides the rows for each window partition into n buckets ranging from 1 to at most n.
Arguments:
Examples:
> SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 2
A2 3 1
Since: 2.0.0
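The bucket assignment can be modeled with a few lines of Python. This sketch assumes the usual SQL convention that, when the rows do not divide evenly, the earlier buckets receive one extra row; the helper name is hypothetical and the code is not Spark's implementation.

```python
def ntile(num_rows, n):
    """Return the 1-based bucket number for each of num_rows ordered rows."""
    base, extra = divmod(num_rows, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets get one additional row (assumed convention).
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(3, 2))  # partition A1 in the example above has three rows
print(ntile(1, 2))  # partition A2 has a single row
```

With fewer rows than buckets, only the first buckets are used, which is why the description says "at most n".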
nullif(expr1, expr2) - Returns null if expr1 equals expr2, or expr1 otherwise.
Examples:
> SELECT nullif(2, 2);
NULL
Since: 2.0.0
nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.
Examples:
> SELECT nvl(NULL, array('2'));
["2"]
Since: 2.0.0
nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise.
Examples:
> SELECT nvl2(NULL, 2, 1);
1
Since: 2.0.0
octet_length(expr) - Returns the byte length of string data or number of bytes of binary data.
Examples:
> SELECT octet_length('Spark SQL');
9
Since: 2.3.0
expr1 or expr2 - Logical OR.
Examples:
> SELECT true or false;
true
> SELECT false or false;
false
> SELECT true or NULL;
true
> SELECT false or NULL;
NULL
Since: 1.0.0
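The examples above follow three-valued logic, where NULL means "unknown". A minimal Python model, using None for NULL and a hypothetical helper name, looks like this:

```python
def sql_or(left, right):
    """Three-valued OR: True dominates, otherwise NULL (None) is contagious."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False


for pair in [(True, False), (False, False), (True, None), (False, None)]:
    print(pair, "->", sql_or(*pair))
```

`true or NULL` is true because the result would be true whatever the unknown value is, while `false or NULL` stays unknown.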
overlay(input, replace, pos[, len]) - Replaces the part of input that starts at pos and is of length len with replace.
Examples:
> SELECT overlay('Spark SQL' PLACING '_' FROM 6);
Spark_SQL
> SELECT overlay('Spark SQL' PLACING 'CORE' FROM 7);
Spark CORE
> SELECT overlay('Spark SQL' PLACING 'ANSI ' FROM 7 FOR 0);
Spark ANSI SQL
> SELECT overlay('Spark SQL' PLACING 'tructured' FROM 2 FOR 4);
Structured SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('_', 'utf-8') FROM 6);
Spark_SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('CORE', 'utf-8') FROM 7);
Spark CORE
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('ANSI ', 'utf-8') FROM 7 FOR 0);
Spark ANSI SQL
> SELECT overlay(encode('Spark SQL', 'utf-8') PLACING encode('tructured', 'utf-8') FROM 2 FOR 4);
Structured SQL
Since: 3.0.0
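The string examples above are consistent with the following Python sketch. It assumes that pos is 1-based and at least 1, and that an omitted len defaults to the length of the replacement; it models the behavior shown in the examples and is not Spark's code.

```python
def overlay(input_str, replace, pos, length=None):
    """Replace `length` characters of input_str, starting at 1-based pos, with replace."""
    if length is None or length < 0:
        length = len(replace)  # assumed default, consistent with the examples
    head = input_str[:pos - 1]
    tail = input_str[pos - 1 + length:]
    return head + replace + tail


print(overlay("Spark SQL", "_", 6))
print(overlay("Spark SQL", "ANSI ", 7, 0))
print(overlay("Spark SQL", "tructured", 2, 4))
```

A length of 0 turns the operation into a pure insertion, as in the 'ANSI ' example.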
parse_url(url, partToExtract[, key]) - Extracts a part from a URL.
Examples:
> SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST');
spark.apache.org
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY');
query=1
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query');
1
Since: 2.0.0
percent_rank() - Computes the percentage ranking of a value in a group of values.
Arguments:
Examples:
> SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 0.0
A1 1 0.0
A1 2 1.0
A2 3 0.0
Since: 2.0.0
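The output above matches the standard SQL definition (rank - 1) / (rows in partition - 1), with 0.0 for a single-row partition. The Python sketch below applies that formula to one already-sorted partition; the helper name is hypothetical.

```python
def percent_rank(sorted_values):
    """Percent rank of each value in one partition, given values in ORDER BY order."""
    n = len(sorted_values)
    result = []
    for value in sorted_values:
        # rank = 1 + number of rows that sort strictly before this value
        rank = sorted_values.index(value) + 1
        result.append(0.0 if n == 1 else (rank - 1) / (n - 1))
    return result


print(percent_rank([1, 1, 2]))  # partition A1
print(percent_rank([3]))        # partition A2
```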
percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be a positive integer.
percentile(col, array(percentage1 [, percentage2]…) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be a positive integer.
Examples:
> SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col);
3.0
> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
[2.5,7.5]
Since: 2.1.0
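The example results are consistent with linear interpolation between the two nearest ordered values. The Python sketch below shows that calculation for a non-empty list; it leaves out the optional frequency argument and is an illustration, not Spark's implementation.

```python
import math


def percentile(values, percentage):
    """Exact percentile by linear interpolation over the sorted values."""
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    position = percentage * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    # Weight each neighbour by its distance from the fractional position.
    return (upper - position) * ordered[lower] + (position - lower) * ordered[upper]


print(percentile([0, 10], 0.3))
print([percentile([0, 10], p) for p in (0.25, 0.75)])
```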
percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric column col, which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.
Examples:
> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
[1,1,0]
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
7
Since: 2.1.0
pi() - Returns pi.
Examples:
> SELECT pi();
3.141592653589793
Since: 1.5.0
pmod(expr1, expr2) - Returns the positive value of expr1 mod expr2.
Examples:
> SELECT pmod(10, 3);
1
> SELECT pmod(-10, 3);
2
Since: 1.5.0
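The difference from the % operator shows up for negative operands. The Python sketch below models integer pmod on top of a truncating remainder (math.fmod, which keeps the sign of the dividend like the remainder in the % examples of this document); it is an illustration for integers only.

```python
import math


def pmod(a, n):
    """Positive modulus for integers, built on a truncating remainder."""
    r = int(math.fmod(a, n))  # truncating remainder: sign follows the dividend
    if r < 0:
        return (r + n) % n
    return r


print(math.fmod(-10, 3))  # truncating remainder is negative
print(pmod(-10, 3))       # pmod shifts it into the positive range
```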
posexplode(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
Examples:
> SELECT posexplode(array(10,20));
0 10
1 20
Since: 2.0.0
posexplode_outer(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map.
Examples:
> SELECT posexplode_outer(array(10,20));
0 10
1 20
Since: 2.0.0
position(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.
Examples:
> SELECT position('bar', 'foobarbar');
4
> SELECT position('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
Since: 1.5.0
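The 1-based convention can be modeled on top of Python's 0-based str.find. In this sketch, returning 0 when the substring is not found or when pos is below 1 is an assumption made for the example; it is not stated in the description above.

```python
def position(substr, s, pos=1):
    """1-based index of the first occurrence of substr in s at or after pos."""
    if pos < 1:
        return 0  # assumed behavior for out-of-range start positions
    # str.find is 0-based and returns -1 when missing, so +1 maps "missing" to 0.
    return s.find(substr, pos - 1) + 1


print(position("bar", "foobarbar"))
print(position("bar", "foobarbar", 5))
```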
positive(expr) - Returns the value of expr.
Examples:
> SELECT positive(1);
1
Since: 1.5.0
pow(expr1, expr2) - Raises expr1 to the power of expr2.
Examples:
> SELECT pow(2, 3);
8.0
Since: 1.4.0
power(expr1, expr2) - Raises expr1 to the power of expr2.
Examples:
> SELECT power(2, 3);
8.0
Since: 1.4.0
printf(strfmt, obj, …) - Returns a formatted string from printf-style format strings.
Examples:
> SELECT printf("Hello World %d %s", 100, "days");
Hello World 100 days
Since: 1.5.0
quarter(date) - Returns the quarter of the year for date, in the range 1 to 4.
Examples:
> SELECT quarter('2016-08-31');
3
Since: 1.5.0
radians(expr) - Converts degrees to radians.
Arguments:
Examples:
> SELECT radians(180);
3.141592653589793
Since: 1.4.0
raise_error(expr) - Throws an exception with expr as its message.
Examples:
> SELECT raise_error('custom error message');
java.lang.RuntimeException
custom error message
Since: 3.1.0
rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
> SELECT rand();
0.9629742951434543
> SELECT rand(0);
0.7604953758285915
> SELECT rand(null);
0.7604953758285915
Note:
The function is non-deterministic in the general case.
Since: 1.5.0
randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
Examples:
> SELECT randn();
-0.3254147983080288
> SELECT randn(0);
1.6034991609278433
> SELECT randn(null);
1.6034991609278433
Note:
The function is non-deterministic in the general case.
Since: 1.5.0
random([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
> SELECT random();
0.9629742951434543
> SELECT random(0);
0.7604953758285915
> SELECT random(null);
0.7604953758285915
Note:
The function is non-deterministic in the general case.
Since: 1.5.0
rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Arguments:
Examples:
> SELECT a, b, rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 3
A2 3 1
Since: 2.0.0
reflect(class, method[, arg1[, arg2 …]]) - Calls a method with reflection.
Examples:
> SELECT reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
Since: 2.0.0
regexp(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
str - a string expression
regexp - a string expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
Examples:
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT regexp('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT regexp('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
Note:
Use LIKE to match against a simple string pattern.
Since: 3.2.0
regexp_extract(str, regexp[, idx]) - Extracts the first string in str that matches the regexp expression and corresponds to the regex group index.
Arguments:
str - a string expression.
regexp - a string representing a regular expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index.
Examples:
> SELECT regexp_extract('100-200', '(\\d+)-(\\d+)', 1);
100
Since: 1.5.0
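The role of idx can be illustrated with Python's re module, whose group numbering follows the same convention (0 is the whole match, 1 and up are capture groups). This is only a model: Spark uses Java regular expressions, whose syntax differs from Python's in places, and returning an empty string when nothing matches is an assumption of this sketch. The second helper mirrors regexp_extract_all, which collects the same group from every match.

```python
import re


def regexp_extract(s, pattern, idx=1):
    """Group idx of the first match, or an empty string when there is no match."""
    match = re.search(pattern, s)
    if match is None:
        return ""  # assumed result for "no match"
    group = match.group(idx)
    return group if group is not None else ""


def regexp_extract_all(s, pattern, idx=1):
    """Group idx of every non-overlapping match, in order of appearance."""
    return [m.group(idx) or "" for m in re.finditer(pattern, s)]


print(regexp_extract("100-200", r"(\d+)-(\d+)", 1))
print(regexp_extract("100-200", r"(\d+)-(\d+)", 0))
print(regexp_extract_all("100-200, 300-400", r"(\d+)-(\d+)", 1))
```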
regexp_extract_all(str, regexp[, idx]) - Extracts all strings in str that match the regexp expression and correspond to the regex group index.
Arguments:
str - a string expression.
regexp - a string representing a regular expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
idx - an integer expression representing the group index. The regex may contain multiple groups. idx indicates which regex group to extract. The group index should be non-negative. The minimum value of idx is 0, which means matching the entire regular expression. If idx is not specified, the default group index value is 1. The idx parameter is the Java regex Matcher group() method index.
Examples:
> SELECT regexp_extract_all('100-200, 300-400', '(\\d+)-(\\d+)', 1);
["100","300"]
Since: 3.1.0
regexp_like(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
str - a string expression
regexp - a string expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
Examples:
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT regexp_like('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT regexp_like('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
Note:
Use LIKE to match against a simple string pattern.
Since: 3.2.0
regexp_replace(str, regexp, rep[, position]) - Replaces all substrings of str that match regexp with rep.
Arguments:
str - a string expression to search for a regular expression pattern match.
regexp - a string representing a regular expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
rep - a string expression to replace matched substrings.
position - a positive integer literal that indicates the position within str to begin searching. The default is 1. If position is greater than the number of characters in str, the result is str.
Examples:
> SELECT regexp_replace('100-200', '(\\d+)', 'num');
num-num
Since: 1.5.0
repeat(str, n) - Returns the string which repeats the given string value n times.
Examples:
> SELECT repeat('123', 2);
123123
Since: 1.5.0
replace(str, search[, replace]) - Replaces all occurrences of search with replace.
Arguments:
If search is not found in str, str is returned unchanged.
If replace is not specified or is an empty string, nothing replaces the string that is removed from str.
Examples:
> SELECT replace('ABCabc', 'abc', 'DEF');
ABCDEF
Since: 2.3.0
reverse(array) - Returns a reversed string or an array with reverse order of elements.
Examples:
> SELECT reverse('Spark SQL');
LQS krapS
> SELECT reverse(array(2, 1, 4, 3));
[3,4,1,2]
Note:
Reverse logic for arrays is available since 2.4.0.
Since: 1.5.0
right(str, len) - Returns the rightmost len (len can be string type) characters from the string str. If len is less than or equal to 0, the result is an empty string.
Examples:
> SELECT right('Spark SQL', 3);
SQL
Since: 2.3.0
rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
Examples:
> SELECT rint(12.3456);
12.0
Since: 1.4.0
rlike(str, regexp) - Returns true if str matches regexp, or false otherwise.
Arguments:
str - a string expression
regexp - a string expression. The regex string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match “\abc”, a regular expression for regexp can be “^\\abc$”.
There is a SQL config ‘spark.sql.parser.escapedStringLiterals’ that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match “\abc” is “^\abc$”.
Examples:
> SET spark.sql.parser.escapedStringLiterals=true;
spark.sql.parser.escapedStringLiterals true
> SELECT rlike('%SystemDrive%\Users\John', '%SystemDrive%\\Users.*');
true
> SET spark.sql.parser.escapedStringLiterals=false;
spark.sql.parser.escapedStringLiterals false
> SELECT rlike('%SystemDrive%\\Users\\John', '%SystemDrive%\\\\Users.*');
true
Note:
Use LIKE to match against a simple string pattern.
Since: 1.0.0
round(expr, d) - Returns expr rounded to d decimal places using HALF_UP rounding mode.
Examples:
> SELECT round(2.5, 0);
3
Since: 1.5.0
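HALF_UP means that ties are rounded away from zero, which differs from the round-half-to-even rule that some languages use by default. The Python sketch below models HALF_UP with the standard decimal module; the helper name is hypothetical, and values are passed as strings to avoid binary floating-point surprises.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, d=0):
    """Round a decimal value (given as a string or int) to d places, ties away from zero."""
    quantum = Decimal(1).scaleb(-d)  # 10 ** -d, e.g. 0.01 for d=2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


print(round_half_up("2.5"))       # HALF_UP
print(round(2.5))                 # Python's built-in uses half-to-even instead
print(round_half_up("2.345", 2))
```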
row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
Examples:
> SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 3
A2 3 1
Since: 2.0.0
rpad(str, len[, pad]) - Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. If pad is not specified, str will be padded to the right with space characters.
Examples:
> SELECT rpad('hi', 5, '??');
hi???
> SELECT rpad('hi', 1, '??');
h
> SELECT rpad('hi', 5);
hi
Since: 1.5.0
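The padding and truncation rules can be modeled in Python as follows. The handling of an empty pad string and of non-positive lengths is an assumption of this sketch, since the description above does not cover those cases.

```python
def rpad(s, length, pad=" "):
    """Right-pad s with pad up to `length` characters, or truncate s to `length`."""
    if length <= 0:
        return ""  # assumed behavior
    if len(s) >= length:
        return s[:length]
    if pad == "":
        return s  # assumed behavior: nothing to pad with
    missing = length - len(s)
    repeats = -(-missing // len(pad))  # ceiling division
    return s + (pad * repeats)[:missing]


print(repr(rpad("hi", 5, "??")))
print(repr(rpad("hi", 1, "??")))
print(repr(rpad("hi", 5)))
```

The pad string is repeated and then cut, so a two-character pad can contribute an odd number of characters.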
rtrim(str) - Removes the trailing space characters from str.
Arguments:
Examples:
> SELECT rtrim(' SparkSQL ');
SparkSQL
Since: 1.5.0
schema_of_csv(csv[, options]) - Returns schema in the DDL format of CSV string.
Examples:
> SELECT schema_of_csv('1,abc');
STRUCT<`_c0`: INT, `_c1`: STRING>
Since: 3.0.0
schema_of_json(json[, options]) - Returns schema in the DDL format of JSON string.
Examples:
> SELECT schema_of_json('[{"col":0}]');
ARRAY<STRUCT<`col`: BIGINT>>
> SELECT schema_of_json('[{"col":01}]', map('allowNumericLeadingZeros', 'true'));
ARRAY<STRUCT<`col`: BIGINT>>
Since: 2.4.0
second(timestamp) - Returns the second component of the string/timestamp.
Examples:
> SELECT second('2009-07-30 12:58:59');
59
Since: 1.5.0
sentences(str[, lang, country]) - Splits str into an array of arrays of words.
Examples:
> SELECT sentences('Hi there! Good morning.');
[["Hi","there"],["Good","morning"]]
Since: 2.0.0
sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are: byte, short, integer, long, date, timestamp.
The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the ‘date’ or ‘timestamp’ type then the step expression must resolve to the ‘interval’ or ‘year-month interval’ or ‘day-time interval’ type, otherwise to the same type as the start and stop expressions.
Arguments:
Examples:
> SELECT sequence(1, 5);
[1,2,3,4,5]
> SELECT sequence(5, 1);
[5,4,3,2,1]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
[2018-01-01,2018-02-01,2018-03-01]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month);
[2018-01-01,2018-02-01,2018-03-01]
Since: 2.4.0
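For integer arguments, the inclusive range and the direction-dependent default step shown in the first two examples can be modeled like this. The sketch covers integers only; the default step of 1 or -1 is inferred from the examples, and rejecting a step that points away from stop is an assumption.

```python
def sequence(start, stop, step=None):
    """Inclusive integer range from start to stop, ascending or descending."""
    if step is None:
        step = 1 if start <= stop else -1  # inferred from the examples
    if step == 0 or (stop - start) * step < 0:
        raise ValueError("step must be non-zero and move start toward stop")
    result = []
    current = start
    while (step > 0 and current <= stop) or (step < 0 and current >= stop):
        result.append(current)
        current += step
    return result


print(sequence(1, 5))
print(sequence(5, 1))
print(sequence(1, 10, 3))
```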
session_window(time_column, gap_duration) - Generates session window given a timestamp specifying column and gap duration. See ‘Types of time windows’ in Structured Streaming guide doc for detailed explanation and examples.
Arguments:
Examples:
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, session_window(b, '5 minutes') ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:06:00 1
> SELECT a, session_window.start, session_window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:10:00'), ('A2', '2021-01-01 00:01:00'), ('A2', '2021-01-01 00:04:30') AS tab(a, b) GROUP by a, session_window(b, CASE WHEN a = 'A1' THEN '5 minutes' WHEN a = 'A2' THEN '1 minute' ELSE '10 minutes' END) ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:09:30 2
A1 2021-01-01 00:10:00 2021-01-01 00:15:00 1
A2 2021-01-01 00:01:00 2021-01-01 00:02:00 1
A2 2021-01-01 00:04:30 2021-01-01 00:05:30 1
Since: 3.2.0
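The window boundaries in the first example can be reproduced with a small sessionization loop: an event joins the current session if it arrives before the session's end, and each event extends the end to its own time plus the gap. The Python sketch below handles one group with a fixed gap; treating an event that lands exactly on the session end as the start of a new session is an assumption.

```python
from datetime import datetime, timedelta


def session_windows(timestamps, gap):
    """Return (start, end, count) tuples for one group of event timestamps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            start, end, count = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap), count + 1)
        else:
            sessions.append((ts, ts + gap, 1))
    return sessions


events_a1 = [
    datetime(2021, 1, 1, 0, 0, 0),
    datetime(2021, 1, 1, 0, 4, 30),
    datetime(2021, 1, 1, 0, 10, 0),
]
for window in session_windows(events_a1, timedelta(minutes=5)):
    print(window)
```

The second event at 00:04:30 pushes the first session's end to 00:09:30, so the event at 00:10:00 opens a new session.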
sha(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
> SELECT sha('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha1(expr) - Returns a sha1 hash value as a hex string of the expr.
Examples:
> SELECT sha1('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
Since: 1.5.0
sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Examples:
> SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
Since: 1.5.0
shiftleft(base, expr) - Bitwise left shift.
Examples:
> SELECT shiftleft(2, 1);
4
Since: 1.5.0
shiftright(base, expr) - Bitwise (signed) right shift.
Examples:
> SELECT shiftright(4, 1);
2
Since: 1.5.0
shiftrightunsigned(base, expr) - Bitwise unsigned right shift.
Examples:
> SELECT shiftrightunsigned(4, 1);
2
Since: 1.5.0
shuffle(array) - Returns a random permutation of the given array.
Examples:
> SELECT shuffle(array(1, 20, 3, 5));
[3,1,5,20]
> SELECT shuffle(array(1, 20, null, 3));
[20,null,3,1]
Note:
The function is non-deterministic.
Since: 2.4.0
sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
> SELECT sign(40);
1.0
Since: 1.4.0
signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Examples:
> SELECT signum(40);
1.0
Since: 1.4.0
sin(expr) - Returns the sine of expr, as if computed by java.lang.Math.sin.
Arguments:
Examples:
> SELECT sin(0);
0.0
Since: 1.4.0
sinh(expr) - Returns hyperbolic sine of expr, as if computed by java.lang.Math.sinh.
Arguments:
Examples:
> SELECT sinh(0);
0.0
Since: 1.4.0
size(expr) - Returns the size of an array or a map. The function returns null for null input if spark.sql.legacy.sizeOfNull is set to false or spark.sql.ansi.enabled is set to true. Otherwise, the function returns -1 for null input. With the default settings, the function returns -1 for null input.
Examples:
> SELECT size(array('b', 'd', 'c', 'a'));
4
> SELECT size(map('a', 1, 'b', 2));
2
> SELECT size(NULL);
-1
Since: 1.5.0
skewness(expr) - Returns the skewness value calculated from values of a group.
Examples:
> SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
1.1135657469022011
> SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
-1.1135657469022011
Since: 1.6.0
slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.
Examples:
> SELECT slice(array(1, 2, 3, 4), 2, 2);
[2,3]
> SELECT slice(array(1, 2, 3, 4), -2, 2);
[3,4]
Since: 2.4.0
smallint(expr) - Casts the value expr to the target data type smallint.
Since: 2.0.1
some(expr) - Returns true if at least one value of expr is true.
Examples:
> SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col);
true
> SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col);
false
Since: 3.0.0
sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Examples:
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), true);
[null,"a","b","c","d"]
Since: 1.5.0
soundex(str) - Returns Soundex code of the string.
Examples:
> SELECT soundex('Miller');
M460
Since: 1.5.0
space(n) - Returns a string consisting of n spaces.
Examples:
> SELECT concat(space(2), '1');
1
Since: 1.5.0
spark_partition_id() - Returns the current partition id.
Examples:
> SELECT spark_partition_id();
0
Since: 1.4.0
split(str, regex, limit) - Splits str around occurrences that match regex and returns an array with a length of at most limit.
Arguments:
limit > 0: The resulting array’s length will not be more than limit, and the resulting array’s last entry will contain all input beyond the last matched regex.
limit <= 0: regex will be applied as many times as possible, and the resulting array can be of any size.
Examples:
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', -1);
["one","two","three",""]
> SELECT split('oneAtwoBthreeC', '[ABC]', 2);
["one","twoBthreeC"]
Since: 1.5.0
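The effect of limit can be modeled with Python's re.split, which takes a maximum number of splits rather than a maximum array length. This sketch is an approximation: Spark uses Java regular expressions, and patterns with capture groups behave differently in Python's re.split.

```python
import re


def split(s, regex, limit=-1):
    """Split s around regex matches, returning at most `limit` parts when limit > 0."""
    if limit == 1:
        return [s]  # re.split treats maxsplit=0 as "no limit", so handle this case here
    if limit > 1:
        return re.split(regex, s, maxsplit=limit - 1)
    return re.split(regex, s)  # limit <= 0: split as many times as possible


print(split("oneAtwoBthreeC", "[ABC]"))
print(split("oneAtwoBthreeC", "[ABC]", 2))
```

With limit 2 the pattern is applied once, and the last entry keeps the rest of the input unsplit.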
sqrt(expr) - Returns the square root of expr.
Examples:
> SELECT sqrt(4);
2.0
Since: 1.1.1
stack(n, expr1, …, exprk) - Separates expr1, …, exprk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise.
Examples:
> SELECT stack(2, 1, 2, 3);
1 2
3 NULL
Since: 2.0.0
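The row layout in the example, including the NULL padding, can be modeled in Python by filling n rows of equal width in order. The helper below is a sketch with None standing in for NULL.

```python
def stack(n, *exprs):
    """Arrange exprs into n rows of equal width, padding the last row with None."""
    width = -(-len(exprs) // n)  # ceiling division: columns per row
    padded = list(exprs) + [None] * (n * width - len(exprs))
    return [tuple(padded[i * width:(i + 1) * width]) for i in range(n)]


for row in stack(2, 1, 2, 3):
    print(row)
```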
std(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
> SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
> SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.
Examples:
> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.816496580927726
Since: 1.6.0
stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
Examples:
> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ‘,’ for pairDelim and ‘:’ for keyValueDelim. Both pairDelim and keyValueDelim are treated as regular expressions.
Examples:
> SELECT str_to_map('a:1,b:2,c:3', ',', ':');
{"a":"1","b":"2","c":"3"}
> SELECT str_to_map('a');
{"a":null}
Since: 2.0.1
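The two-level split can be modeled in Python as follows, with both delimiters applied as regular expressions and None standing in for NULL. Letting a later duplicate key overwrite an earlier one is a simplification of this sketch, not a statement about Spark's behavior.

```python
import re


def str_to_map(text, pair_delim=",", key_value_delim=":"):
    """Split text into pairs, then split each pair into key and value (at most once)."""
    result = {}
    for pair in re.split(pair_delim, text):
        parts = re.split(key_value_delim, pair, maxsplit=1)
        # A pair without a key/value delimiter maps its key to None (NULL).
        result[parts[0]] = parts[1] if len(parts) == 2 else None
    return result


print(str_to_map("a:1,b:2,c:3"))
print(str_to_map("a"))
```

Because the delimiters are regular expressions, characters such as `|` or `.` need escaping when they are meant literally.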
string(expr) - Casts the value expr to the target data type string.
Since: 2.0.1
struct(col1, col2, col3, …) - Creates a struct with the given field values.
Examples:
> SELECT struct(1, 2, 3);
{"col1":1,"col2":2,"col3":3}
Since: 1.4.0
substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substr(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
> SELECT substr('Spark SQL', 5);
k SQL
> SELECT substr('Spark SQL', -3);
SQL
> SELECT substr('Spark SQL', 5, 1);
k
> SELECT substr('Spark SQL' FROM 5);
k SQL
> SELECT substr('Spark SQL' FROM -3);
SQL
> SELECT substr('Spark SQL' FROM 5 FOR 1);
k
Since: 1.5.0
substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
substring(str FROM pos[ FOR len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.
Examples:
> SELECT substring('Spark SQL', 5);
k SQL
> SELECT substring('Spark SQL', -3);
SQL
> SELECT substring('Spark SQL', 5, 1);
k
> SELECT substring('Spark SQL' FROM 5);
k SQL
> SELECT substring('Spark SQL' FROM -3);
SQL
> SELECT substring('Spark SQL' FROM 5 FOR 1);
k
Since: 1.5.0
substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim.
Examples:
> SELECT substring_index('www.apache.org', '.', 2);
www.apache
Since: 1.5.0
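The positive and negative count rules can be modeled by splitting on the delimiter and rejoining a prefix or suffix of the parts. In this Python sketch, returning an empty string for a count of 0 or an empty delimiter is an assumption, since the description does not cover those cases.

```python
def substring_index(s, delim, count):
    """Part of s before the count-th delimiter (from the left, or from the right if negative)."""
    if count == 0 or delim == "":
        return ""  # assumed behavior
    parts = s.split(delim)  # case-sensitive, like the documented match
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


print(substring_index("www.apache.org", ".", 2))
print(substring_index("www.apache.org", ".", -1))
```

If the string contains fewer delimiters than the absolute value of count, the slices cover every part and the whole string comes back.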
sum(expr) - Returns the sum calculated from values of a group.
Examples:
> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
NULL
Since: 1.0.0
tan(expr) - Returns the tangent of expr, as if computed by java.lang.Math.tan.
Arguments:
Examples:
> SELECT tan(0);
0.0
Since: 1.4.0
tanh(expr) - Returns the hyperbolic tangent of expr, as if computed by java.lang.Math.tanh.
Arguments:
Examples:
> SELECT tanh(0);
0.0
Since: 1.4.0
timestamp(expr) - Casts the value expr to the target data type timestamp.
Since: 2.0.1
timestamp_micros(microseconds) - Creates timestamp from the number of microseconds since UTC epoch.
Examples:
> SELECT timestamp_micros(1230219000123123);
2008-12-25 07:30:00.123123
Since: 3.1.0
timestamp_millis(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.
Examples:
> SELECT timestamp_millis(1230219000123);
2008-12-25 07:30:00.123
Since: 3.1.0
timestamp_seconds(seconds) - Creates timestamp from the number of seconds (can be fractional) since UTC epoch.
Examples:
> SELECT timestamp_seconds(1230219000);
2008-12-25 07:30:00
> SELECT timestamp_seconds(1230219000.123);
2008-12-25 07:30:00.123
Since: 3.1.0
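The three timestamp_* constructors differ only in the unit of the offset from the UTC epoch. The Python sketch below computes the corresponding instants in UTC; the function names mirror the SQL functions but are plain Python helpers, and only integer inputs are used here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_seconds(seconds):
    return EPOCH + timedelta(seconds=seconds)


def timestamp_millis(milliseconds):
    return EPOCH + timedelta(milliseconds=milliseconds)


def timestamp_micros(microseconds):
    return EPOCH + timedelta(microseconds=microseconds)


print(timestamp_seconds(1230219000))
print(timestamp_millis(1230219000123))
print(timestamp_micros(1230219000123123))
```

In UTC these instants fall at 15:30 on 2008-12-25. The examples above show 07:30, which is consistent with the results being rendered in a session time zone eight hours behind UTC.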
tinyint(expr) - Casts the value expr to the target data type tinyint.
Since: 2.0.1
to_csv(expr[, options]) - Returns a CSV string with a given struct value.
Examples:
> SELECT to_csv(named_struct('a', 1, 'b', 2));
1,2
> SELECT to_csv(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
26/08/2015
Since: 3.0.0
to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
Examples:
> SELECT to_date('2009-07-30 04:17:52');
2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31
Since: 1.5.0
to_json(expr[, options]) - Returns a JSON string with a given struct value.
Examples:
> SELECT to_json(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
{"time":"26/08/2015"}
> SELECT to_json(array(named_struct('a', 1, 'b', 2)));
[{"a":1,"b":2}]
> SELECT to_json(map('a', named_struct('b', 1)));
{"a":{"b":1}}
> SELECT to_json(map(named_struct('a', 1),named_struct('b', 2)));
{"[1]":{"b":2}}
> SELECT to_json(map('a', 1));
{"a":1}
> SELECT to_json(array((map('a', 1))));
[{"a":1}]
Since: 2.2.0
to_timestamp(timestamp_str[, fmt]) - Parses the timestamp_str expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted. The result data type is consistent with the value of configuration spark.sql.timestampType.
Examples:
> SELECT to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
Since: 2.2.0
to_unix_timestamp(timeExp[, fmt]) - Returns the UNIX timestamp of the given time.
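Arguments:
timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is “yyyy-MM-dd HH:mm:ss”. See Datetime Patterns for valid date and time format patterns.
Examples: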
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460098800
Since: 1.6.0
to_utc_timestamp(timestamp, timezone) - Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, ‘GMT+1’ would yield ‘2017-07-14 01:40:00.0’.
Examples:
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
Since: 1.5.0
transform(expr, func) - Transforms elements in an array using the function.
Examples:
> SELECT transform(array(1, 2, 3), x -> x + 1);
[2,3,4]
> SELECT transform(array(1, 2, 3), (x, i) -> x + i);
[1,3,5]
Since: 2.4.0
transform_keys(expr, func) - Transforms elements in a map using the function.
Examples:
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + 1);
{2:1,3:2,4:3}
> SELECT transform_keys(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{2:1,4:2,6:3}
Since: 3.0.0
transform_values(expr, func) - Transforms values in the map using the function.
Examples:
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> v + 1);
{1:2,2:3,3:4}
> SELECT transform_values(map_from_arrays(array(1, 2, 3), array(1, 2, 3)), (k, v) -> k + v);
{1:2,2:4,3:6}
Since: 3.0.0
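transform_keys and transform_values both pass each (key, value) pair to the function and differ only in which side of the entry they replace. A rough Python equivalent using dict comprehensions is sketched below. The helper names simply mirror the SQL functions, and Python's silent overwrite of colliding keys is a property of this model only, since the text above does not say how Spark treats them.

```python
from typing import Callable, Dict


def transform_keys(m: Dict, func: Callable) -> Dict:
    """Replace each key with func(key, value); values are kept as they are."""
    return {func(k, v): v for k, v in m.items()}


def transform_values(m: Dict, func: Callable) -> Dict:
    """Replace each value with func(key, value); keys are kept as they are."""
    return {k: func(k, v) for k, v in m.items()}


m = {1: 1, 2: 2, 3: 3}
print(transform_keys(m, lambda k, v: k + v))    # {2: 1, 4: 2, 6: 3}
print(transform_values(m, lambda k, v: k + v))  # {1: 2, 2: 4, 3: 6}
```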
translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string.
Examples:
> SELECT translate('AaBbCc', 'abc', '123');
A1B2C3
Since: 1.5.0
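translate substitutes character by character, and the match is case-sensitive, which is why 'A', 'B' and 'C' survive in the example. A Python sketch built on str.translate follows. Two points are assumptions of this model rather than statements from the text above: a character of from with no counterpart in to is deleted, and the first occurrence wins when from repeats a character.

```python
def translate(text: str, frm: str, to: str) -> str:
    """Character-wise substitution; illustrative model only."""
    table = {}
    for i, ch in enumerate(frm):
        if ord(ch) in table:
            continue  # assumption: the first mapping for a character wins
        # Assumption: characters of `frm` beyond the length of `to` are dropped.
        table[ord(ch)] = to[i] if i < len(to) else None
    return text.translate(table)


print(translate("AaBbCc", "abc", "123"))  # A1B2C3
```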
trim(str) - Removes the leading and trailing space characters from str.
trim(BOTH FROM str) - Removes the leading and trailing space characters from str.
trim(LEADING FROM str) - Removes the leading space characters from str.
trim(TRAILING FROM str) - Removes the trailing space characters from str.
trim(trimStr FROM str) - Removes the leading and trailing trimStr characters from str.
trim(BOTH trimStr FROM str) - Removes the leading and trailing trimStr characters from str.
trim(LEADING trimStr FROM str) - Removes the leading trimStr characters from str.
trim(TRAILING trimStr FROM str) - Removes the trailing trimStr characters from str.
Examples:
> SELECT trim(' SparkSQL ');
SparkSQL
> SELECT trim(BOTH FROM ' SparkSQL ');
SparkSQL
> SELECT trim(LEADING FROM ' SparkSQL ');
SparkSQL
> SELECT trim(TRAILING FROM ' SparkSQL ');
SparkSQL
> SELECT trim('SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(BOTH 'SL' FROM 'SSparkSQLS');
parkSQ
> SELECT trim(LEADING 'SL' FROM 'SSparkSQLS');
parkSQLS
> SELECT trim(TRAILING 'SL' FROM 'SSparkSQLS');
SSparkSQ
Since: 1.5.0
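trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt.
Arguments:
date - date value or valid date string
fmt - the format representing the unit to be truncated to:
"YEAR", "YYYY", "YY" - truncate to the first date of the year that the date falls in
"QUARTER" - truncate to the first date of the quarter that the date falls in
"MONTH", "MM", "MON" - truncate to the first date of the month that the date falls in
"WEEK" - truncate to the Monday of the week that the date falls in
Examples: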
> SELECT trunc('2019-08-04', 'week');
2019-07-29
> SELECT trunc('2019-08-04', 'quarter');
2019-07-01
> SELECT trunc('2009-02-12', 'MM');
2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
2015-01-01
Since: 1.5.0
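The four trunc examples can be reproduced with plain date arithmetic. The Python sketch below covers the units used in the examples (week, quarter, MM, YEAR). The extra aliases it accepts, the case-insensitive matching, and returning None for an unknown unit are assumptions of this model.

```python
from datetime import date, timedelta
from typing import Optional


def trunc(d: date, fmt: str) -> Optional[date]:
    """Truncate a date to the start of its year, quarter, month or week."""
    unit = fmt.upper()  # the examples mix 'week' and 'YEAR', so ignore case
    if unit in ("YEAR", "YYYY", "YY"):
        return d.replace(month=1, day=1)
    if unit == "QUARTER":
        first_month = 3 * ((d.month - 1) // 3) + 1  # 1, 4, 7 or 10
        return d.replace(month=first_month, day=1)
    if unit in ("MONTH", "MM", "MON"):
        return d.replace(day=1)
    if unit == "WEEK":
        return d - timedelta(days=d.weekday())  # weekday() is 0 on Monday
    return None  # assumption: unknown units yield no result


print(trunc(date(2019, 8, 4), "week"))     # 2019-07-29
print(trunc(date(2019, 8, 4), "quarter"))  # 2019-07-01
```

2019-08-04 is a Sunday, so the week truncation moves back six days to the preceding Monday.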
try_add(expr1, expr2) - Returns the sum of expr1 and expr2; the result is null on overflow. The acceptable input types are the same as for the + operator.
Examples:
> SELECT try_add(1, 2);
3
> SELECT try_add(2147483647, 1);
NULL
> SELECT try_add(date'2021-01-01', 1);
2021-01-02
> SELECT try_add(date'2021-01-01', interval 1 year);
2022-01-01
> SELECT try_add(timestamp'2021-01-01 00:00:00', interval 1 day);
2021-01-02 00:00:00
> SELECT try_add(interval 1 year, interval 2 year);
3-0
Since: 3.2.0
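The second try_add example returns NULL because 2147483647 is the largest 32-bit signed integer, so adding 1 leaves the representable range. The Python sketch below models only that integer case. It assumes both operands are 32-bit integers, and the helper name is hypothetical; dates and intervals are not modelled.

```python
from typing import Optional

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # bounds of a 32-bit signed integer


def try_add_int(a: int, b: int) -> Optional[int]:
    """Return a + b, or None when the sum does not fit in 32 bits."""
    result = a + b  # Python integers never overflow, so check the range explicitly
    return result if INT_MIN <= result <= INT_MAX else None


print(try_add_int(1, 2))           # 3
print(try_add_int(2147483647, 1))  # None
```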
try_divide(dividend, divisor) - Returns dividend/divisor. It always performs floating point division. Its result is always null if divisor is 0. dividend must be a numeric or an interval. divisor must be a numeric.
Examples:
> SELECT try_divide(3, 2);
1.5
> SELECT try_divide(2L, 2L);
1.0
> SELECT try_divide(1, 0);
NULL
> SELECT try_divide(interval 2 month, 2);
0-1
> SELECT try_divide(interval 2 month, 0);
NULL
Since: 3.2.0
typeof(expr) - Returns the DDL-formatted type string for the data type of the input.
Examples:
> SELECT typeof(1);
int
> SELECT typeof(array(1));
array<int>
Since: 3.0.0
ucase(str) - Returns str with all characters changed to uppercase.
Examples:
> SELECT ucase('SparkSql');
SPARKSQL
Since: 1.0.1
unbase64(str) - Converts the argument from a base 64 string str to a binary.
Examples:
> SELECT unbase64('U3BhcmsgU1FM');
Spark SQL
Since: 1.5.0
unhex(expr) - Converts hexadecimal expr to binary.
Examples:
> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
Spark SQL
Since: 1.5.0
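Both unbase64 and unhex produce binary data, which is why the unhex example wraps the call in decode(..., 'UTF-8') to get readable text. The same two decodings can be checked with the Python standard library:

```python
import base64

# Base64 text -> raw bytes -> UTF-8 string
raw_from_b64 = base64.b64decode("U3BhcmsgU1FM")
print(raw_from_b64.decode("utf-8"))  # Spark SQL

# Hexadecimal text -> raw bytes -> UTF-8 string
raw_from_hex = bytes.fromhex("537061726B2053514C")
print(raw_from_hex.decode("utf-8"))  # Spark SQL
```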
unix_date(date) - Returns the number of days since 1970-01-01.
Examples:
> SELECT unix_date(DATE("1970-01-02"));
1
Since: 3.1.0
unix_micros(timestamp) - Returns the number of microseconds since 1970-01-01 00:00:00 UTC.
Examples:
> SELECT unix_micros(TIMESTAMP('1970-01-01 00:00:01Z'));
1000000
Since: 3.1.0
unix_millis(timestamp) - Returns the number of milliseconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
Examples:
> SELECT unix_millis(TIMESTAMP('1970-01-01 00:00:01Z'));
1000
Since: 3.1.0
unix_seconds(timestamp) - Returns the number of seconds since 1970-01-01 00:00:00 UTC. Truncates higher levels of precision.
Examples:
> SELECT unix_seconds(TIMESTAMP('1970-01-01 00:00:01Z'));
1
Since: 3.1.0
unix_timestamp([timeExp[, fmt]]) - Returns the UNIX timestamp of current or specified time.
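Arguments:
timeExp - A date/timestamp or string. If not provided, this defaults to current time.
fmt - Date/time format pattern to follow. Ignored if timeExp is not a string. Default value is “yyyy-MM-dd HH:mm:ss”. See Datetime Patterns for valid date and time format patterns.
Examples: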
> SELECT unix_timestamp();
1476884637
> SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460041200
Since: 1.5.0
upper(str) - Returns str with all characters changed to uppercase.
Examples:
> SELECT upper('SparkSql');
SPARKSQL
Since: 1.0.1
uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
Examples:
> SELECT uuid();
46707d92-02f4-4817-8116-a4c3b23e6266
Note:
The function is non-deterministic.
Since: 2.3.0
var_pop(expr) - Returns the population variance calculated from values of a group.
Examples:
> SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
0.6666666666666666
Since: 1.6.0
var_samp(expr) - Returns the sample variance calculated from values of a group.
Examples:
> SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
variance(expr) - Returns the sample variance calculated from values of a group.
Examples:
> SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col);
1.0
Since: 1.6.0
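var_pop divides the summed squared deviations by n, while var_samp (and its alias variance) divides by n - 1, which is why the same three values give 0.666... and 1.0. A Python sketch of the two formulas follows. Skipping None inputs and returning None when there are too few values are assumptions of this model.

```python
from typing import Optional, Sequence


def _squared_deviations(xs: Sequence[float]) -> float:
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def var_pop(values: Sequence[Optional[float]]) -> Optional[float]:
    """Population variance: squared deviations divided by n."""
    xs = [v for v in values if v is not None]
    return _squared_deviations(xs) / len(xs) if xs else None


def var_samp(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sample variance: squared deviations divided by n - 1."""
    xs = [v for v in values if v is not None]
    # Assumption: fewer than two values yields None in this sketch.
    return _squared_deviations(xs) / (len(xs) - 1) if len(xs) > 1 else None


print(var_pop([1, 2, 3]))   # 0.6666666666666666
print(var_samp([1, 2, 3]))  # 1.0
```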
version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.
Examples:
> SELECT version();
3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40
Since: 3.0.0
weekday(date) - Returns the day of the week for date/timestamp (0 = Monday, 1 = Tuesday, …, 6 = Sunday).
Examples:
> SELECT weekday('2009-07-30');
3
Since: 2.4.0
weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.
Examples:
> SELECT weekofyear('2008-02-20');
8
Since: 1.5.0
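The numbering used by weekday (0 = Monday) and the week rule stated for weekofyear (weeks start on Monday, and week 1 is the first week with more than 3 days) coincide with Python's date.weekday() and the ISO 8601 week number. That makes the two examples easy to cross-check. This is a comparison with Python's calendar functions, not a description of Spark internals.

```python
from datetime import date

d1 = date(2009, 7, 30)
print(d1.weekday())  # 3, a Thursday when Monday is 0

d2 = date(2008, 2, 20)
iso_week = d2.isocalendar()[1]  # ISO 8601 week number
print(iso_week)  # 8
```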
CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2; else when expr3 = true, returns expr4; else returns expr5.
Examples:
> SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
1.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
2.0
> SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 END;
NULL
Since: 1.0.1
width_bucket(value, min_value, max_value, num_bucket) - Returns the bucket number to which value would be assigned in an equiwidth histogram with num_bucket buckets, in the range min_value to max_value.
Examples:
> SELECT width_bucket(5.3, 0.2, 10.6, 5);
3
> SELECT width_bucket(-2.1, 1.3, 3.4, 3);
0
> SELECT width_bucket(8.1, 0.0, 5.7, 4);
5
> SELECT width_bucket(-0.9, 5.2, 0.5, 2);
3
Since: 3.1.0
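The four width_bucket examples include an underflow (bucket 0), an overflow (bucket num_bucket + 1) and a descending range where min_value is greater than max_value. The Python sketch below reproduces them with the usual equi-width formula. It is a model built from the examples, and returning None for a non-positive bucket count or an empty range is an assumption.

```python
from typing import Optional


def width_bucket(value: float, min_value: float, max_value: float,
                 num_bucket: int) -> Optional[int]:
    """Bucket number of `value` in an equi-width histogram; illustrative only."""
    if num_bucket <= 0 or min_value == max_value:
        return None  # assumption: invalid input yields no result
    if min_value < max_value:  # ascending range
        if value < min_value:
            return 0
        if value >= max_value:
            return num_bucket + 1
        return int(num_bucket * (value - min_value) / (max_value - min_value)) + 1
    # Descending range: buckets are counted from min_value down to max_value.
    if value > min_value:
        return 0
    if value <= max_value:
        return num_bucket + 1
    return int(num_bucket * (min_value - value) / (min_value - max_value)) + 1


print(width_bucket(5.3, 0.2, 10.6, 5))  # 3
print(width_bucket(-0.9, 5.2, 0.5, 2))  # 3
```

For the first example, 5 * (5.3 - 0.2) / (10.6 - 0.2) is about 2.45, which truncates to 2, so the value falls in bucket 3.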
window(time_column, window_duration[, slide_duration[, start_time]]) - Bucketize rows into one or more time windows given a timestamp specifying column. Window starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in the order of months are not supported. See ‘Window Operations on Event Time’ in Structured Streaming guide doc for detailed explanation and examples.
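Arguments:
time_column - The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.
window_duration - A string specifying the width of the window represented as "interval value". Note that the duration is a fixed length of time, and does not vary over time according to a calendar.
slide_duration - A string specifying the sliding interval of the window represented as "interval value". A new window will be generated every slide_duration. Must be less than or equal to the window_duration. This duration is likewise absolute, and does not vary according to a calendar.
start_time - The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide start_time as 15 minutes.
Examples: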
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '5 minutes') ORDER BY a, start;
A1 2021-01-01 00:00:00 2021-01-01 00:05:00 2
A1 2021-01-01 00:05:00 2021-01-01 00:10:00 1
A2 2021-01-01 00:00:00 2021-01-01 00:05:00 1
> SELECT a, window.start, window.end, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a, window(b, '10 minutes', '5 minutes') ORDER BY a, start;
A1 2020-12-31 23:55:00 2021-01-01 00:05:00 2
A1 2021-01-01 00:00:00 2021-01-01 00:10:00 3
A1 2021-01-01 00:05:00 2021-01-01 00:15:00 1
A2 2020-12-31 23:55:00 2021-01-01 00:05:00 1
A2 2021-01-01 00:00:00 2021-01-01 00:10:00 1
Since: 2.0.0
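With a slide shorter than the window, a single row belongs to several windows, which is why the second example reports more windows than the first. The Python sketch below lists the half-open windows [start, end) that contain a timestamp. It assumes window starts are aligned to the epoch plus the start offset, and it uses naive datetimes for simplicity; the helper name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

EPOCH = datetime(1970, 1, 1)


def windows_for(ts: datetime, window: timedelta,
                slide: Optional[timedelta] = None,
                start: timedelta = timedelta(0)) -> List[Tuple[datetime, datetime]]:
    """All [start, end) windows containing ts, earliest first."""
    if slide is None:
        slide = window  # tumbling windows: the slide equals the width
    # Latest window start that is not after ts.
    last_start = ts - ((ts - EPOCH - start) % slide)
    result = []
    begin = last_start
    while begin > ts - window:  # the end is exclusive, so begin + window must exceed ts
        result.append((begin, begin + window))
        begin -= slide
    return sorted(result)


for w in windows_for(datetime(2021, 1, 1, 0, 6),
                     timedelta(minutes=10), timedelta(minutes=5)):
    print(w[0], w[1])
# 2021-01-01 00:00:00 2021-01-01 00:10:00
# 2021-01-01 00:05:00 2021-01-01 00:15:00
```

The row at 00:06:00 lands in the 00:00 and 00:05 windows, matching the counts for A1 in the second example above.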
xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.
Examples:
> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
["b1","b2","b3"]
Since: 2.0.0
xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.
Examples:
> SELECT xpath_boolean('<a><b>1</b></a>','a/b');
true
Since: 2.0.0
xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
Since: 2.0.0
xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
Since: 2.0.0
xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
Since: 2.0.0
xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
Since: 2.0.0
xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3.0
Since: 2.0.0
xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
3
Since: 2.0.0
xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.
Examples:
> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
cc
Since: 2.0.0
xxhash64(expr1, expr2, …) - Returns a 64-bit hash value of the arguments.
Examples:
> SELECT xxhash64('Spark', array(123), 2);
5602566077635097486
Since: 3.0.0
year(date) - Returns the year component of the date/timestamp.
Examples:
> SELECT year('2016-07-30');
2016
Since: 1.5.0
zip_with(left, right, func) - Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.
Examples:
> SELECT zip_with(array(1, 2, 3), array('a', 'b', 'c'), (x, y) -> (y, x));
[{"y":"a","x":1},{"y":"b","x":2},{"y":"c","x":3}]
> SELECT zip_with(array(1, 2), array(3, 4), (x, y) -> x + y);
[4,6]
> SELECT zip_with(array('a', 'b', 'c'), array('d', 'e', 'f'), (x, y) -> concat(x, y));
["ad","be","cf"]
Since: 2.4.0
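The padding rule for zip_with (the shorter array is extended with nulls before the function is applied) corresponds to itertools.zip_longest in Python. The sketch below uses None for NULL; the helper name mirrors the SQL function and is otherwise hypothetical.

```python
from itertools import zip_longest
from typing import Callable, List, Sequence


def zip_with(left: Sequence, right: Sequence, func: Callable) -> List:
    """Apply func pairwise, padding the shorter input with None first."""
    return [func(x, y) for x, y in zip_longest(left, right)]


print(zip_with([1, 2], [3, 4], lambda x, y: x + y))     # [4, 6]
print(zip_with([1, 2, 3], [10], lambda x, y: (x, y)))   # [(1, 10), (2, None), (3, None)]
```

Because the padding happens before the function runs, the function has to cope with null arguments whenever the lengths differ.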
expr1 | expr2 - Returns the result of bitwise OR of expr1 and expr2.
Examples:
> SELECT 3 | 5;
7
Since: 1.4.0
expr1 || expr2 - Returns the concatenation of expr1 and expr2.
Examples:
> SELECT 'Spark' || 'SQL';
SparkSQL
> SELECT array(1, 2, 3) || array(4, 5) || array(6);
[1,2,3,4,5,6]
Note:
|| for arrays is available since 2.4.0.
Since: 2.3.0
~ expr - Returns the result of bitwise NOT of expr.
Examples:
> SELECT ~ 0;
-1
Since: 1.4.0