! expr - Logical not.
expr1 % expr2 - Returns the remainder after expr1
/expr2
.
Examples:
> SELECT 2 % 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
expr1 & expr2 - Returns the result of bitwise AND of expr1
and expr2
.
Examples:
> SELECT 3 & 5;
1
expr1 * expr2 - Returns expr1
*expr2
.
Examples:
> SELECT 2 * 3;
6
expr1 + expr2 - Returns expr1
+expr2
.
Examples:
> SELECT 1 + 2;
3
expr1 - expr2 - Returns expr1
-expr2
.
Examples:
> SELECT 2 - 1;
1
expr1 / expr2 - Returns expr1
/expr2
. It always performs floating point division.
Examples:
> SELECT 3 / 2;
1.5
> SELECT 2L / 2L;
1.0
expr1 < expr2 - Returns true if expr1
is less than expr2
.
expr1 <= expr2 - Returns true if expr1
is less than or equal to expr2
.
expr1 <=> expr2 - Returns same result as the EQUAL(=) operator for non-null operands, but returns true if both are null, false if one of the them is null.
expr1 = expr2 - Returns true if expr1
equals expr2
, or false otherwise.
expr1 == expr2 - Returns true if expr1
equals expr2
, or false otherwise.
expr1 > expr2 - Returns true if expr1
is greater than expr2
.
expr1 >= expr2 - Returns true if expr1
is greater than or equal to expr2
.
expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1
and expr2
.
Examples:
> SELECT 3 ^ 5;
2
abs(expr) - Returns the absolute value of the numeric value.
Examples:
> SELECT abs(-1);
1
acos(expr) - Returns the inverse cosine (a.k.a. arccosine) of expr
if -1<=expr
<=1 or NaN otherwise.
Examples:
> SELECT acos(1);
0.0
> SELECT acos(2);
NaN
add_months(start_date, num_months) - Returns the date that is num_months
after start_date
.
Examples:
> SELECT add_months('2016-08-31', 1);
2016-09-30
expr1 and expr2 - Logical AND.
approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD
defines the maximum estimation error allowed.
approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column col
at the given percentage. The value of percentage must be between 0.0 and 1.0. The accuracy
parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy
yields better accuracy, 1.0/accuracy
is the relative error of the approximation. When percentage
is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col
at the given percentage array.
Examples:
> SELECT approx_percentile(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]
> SELECT approx_percentile(10.0, 0.5, 100);
10.0
array(expr, ...) - Returns an array with the given elements.
Examples:
> SELECT array(1, 2, 3);
[1,2,3]
array_contains(array, value) - Returns true if the array contains the value.
Examples:
> SELECT array_contains(array(1, 2, 3), 2);
true
ascii(str) - Returns the numeric value of the first character of str
.
Examples:
> SELECT ascii('222');
50
> SELECT ascii(2);
50
asin(expr) - Returns the inverse sine (a.k.a. arcsine) the arc sin of expr
if -1<=expr
<=1 or NaN otherwise.
Examples:
> SELECT asin(0);
0.0
> SELECT asin(2);
NaN
assert_true(expr) - Throws an exception if expr
is not true.
Examples:
> SELECT assert_true(0 < 1);
NULL
atan(expr) - Returns the inverse tangent (a.k.a. arctangent).
Examples:
> SELECT atan(0);
0.0
atan2(expr1, expr2) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (expr1
, expr2
).
Examples:
> SELECT atan2(0, 0);
0.0
avg(expr) - Returns the mean calculated from values of a group.
base64(bin) - Converts the argument from a binary bin
to a base 64 string.
Examples:
> SELECT base64('Spark SQL');
U3BhcmsgU1FM
bigint(expr) - Casts the value expr
to the target data type bigint
.
bin(expr) - Returns the string representation of the long value expr
represented in binary.
Examples:
> SELECT bin(13);
1101
> SELECT bin(-13);
1111111111111111111111111111111111111111111111111111111111110011
> SELECT bin(13.3);
1101
binary(expr) - Casts the value expr
to the target data type binary
.
bit_length(expr) - Returns the bit length of expr
or number of bits in binary data.
Examples:
> SELECT bit_length('Spark SQL');
72
boolean(expr) - Casts the value expr
to the target data type boolean
.
bround(expr, d) - Returns expr
rounded to d
decimal places using HALF_EVEN rounding mode.
Examples:
> SELECT bround(2.5, 0);
2.0
cast(expr AS type) - Casts the value expr
to the target data type type
.
Examples:
> SELECT cast('10' as int);
10
cbrt(expr) - Returns the cube root of expr
.
Examples:
> SELECT cbrt(27.0);
3.0
ceil(expr) - Returns the smallest integer not smaller than expr
.
Examples:
> SELECT ceil(-0.1);
0
> SELECT ceil(5);
5
ceiling(expr) - Returns the smallest integer not smaller than expr
.
Examples:
> SELECT ceiling(-0.1);
0
> SELECT ceiling(5);
5
char(expr) - Returns the ASCII character having the binary equivalent to expr
. If n is larger than 256 the result is equivalent to chr(n % 256)
Examples:
> SELECT char(65);
A
char_length(expr) - Returns the character length of expr
or number of bytes in binary data.
Examples:
> SELECT char_length('Spark SQL');
9
> SELECT CHAR_LENGTH('Spark SQL');
9
> SELECT CHARACTER_LENGTH('Spark SQL');
9
character_length(expr) - Returns the character length of expr
or number of bytes in binary data.
Examples:
> SELECT character_length('Spark SQL');
9
> SELECT CHAR_LENGTH('Spark SQL');
9
> SELECT CHARACTER_LENGTH('Spark SQL');
9
chr(expr) - Returns the ASCII character having the binary equivalent to expr
. If n is larger than 256 the result is equivalent to chr(n % 256)
Examples:
> SELECT chr(65);
A
coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null.
Examples:
> SELECT coalesce(NULL, 1, NULL);
1
collect_list(expr) - Collects and returns a list of non-unique elements.
collect_set(expr) - Collects and returns a set of unique elements.
concat(str1, str2, ..., strN) - Returns the concatenation of str1, str2, ..., strN.
Examples:
> SELECT concat('Spark', 'SQL');
SparkSQL
concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by sep
.
Examples:
> SELECT concat_ws(' ', 'Spark', 'SQL');
Spark SQL
conv(num, from_base, to_base) - Convert num
from from_base
to to_base
.
Examples:
> SELECT conv('100', 2, 10);
4
> SELECT conv(-10, 16, -10);
-16
corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.
cos(expr) - Returns the cosine of expr
.
Examples:
> SELECT cos(0);
1.0
cosh(expr) - Returns the hyperbolic cosine of expr
.
Examples:
> SELECT cosh(0);
1.0
cot(expr) - Returns the cotangent of expr
.
Examples:
> SELECT cot(1);
0.6420926159343306
count(*) - Returns the total number of retrieved rows, including rows containing null.
count(expr) - Returns the number of rows for which the supplied expression is non-null.
count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.
count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given esp, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch
before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.
covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.
crc32(expr) - Returns a cyclic redundancy check value of the expr
as a bigint.
Examples:
> SELECT crc32('Spark');
1557323817
cume_dist() - Computes the position of a value relative to all values in the partition.
current_database() - Returns the current database.
Examples:
> SELECT current_database();
default
current_date() - Returns the current date at the start of query evaluation.
current_timestamp() - Returns the current timestamp at the start of query evaluation.
date(expr) - Casts the value expr
to the target data type date
.
date_add(start_date, num_days) - Returns the date that is num_days
after start_date
.
Examples:
> SELECT date_add('2016-07-30', 1);
2016-07-31
date_format(timestamp, fmt) - Converts timestamp
to a value of string in the format specified by the date format fmt
.
Examples:
> SELECT date_format('2016-04-08', 'y');
2016
date_sub(start_date, num_days) - Returns the date that is num_days
before start_date
.
Examples:
> SELECT date_sub('2016-07-30', 1);
2016-07-29
datediff(endDate, startDate) - Returns the number of days from startDate
to endDate
.
Examples:
> SELECT datediff('2009-07-31', '2009-07-30');
1
> SELECT datediff('2009-07-30', '2009-07-31');
-1
day(date) - Returns the day of month of the date/timestamp.
Examples:
> SELECT day('2009-07-30');
30
dayofmonth(date) - Returns the day of month of the date/timestamp.
Examples:
> SELECT dayofmonth('2009-07-30');
30
dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).
Examples:
> SELECT dayofweek('2009-07-30');
5
dayofyear(date) - Returns the day of year of the date/timestamp.
Examples:
> SELECT dayofyear('2016-04-09');
100
decimal(expr) - Casts the value expr
to the target data type decimal
.
decode(bin, charset) - Decodes the first argument using the second argument character set.
Examples:
> SELECT decode(encode('abc', 'utf-8'), 'utf-8');
abc
degrees(expr) - Converts radians to degrees.
Examples:
> SELECT degrees(3.141592653589793);
180.0
dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.
double(expr) - Casts the value expr
to the target data type double
.
e() - Returns Euler's number, e.
Examples:
> SELECT e();
2.718281828459045
elt(n, str1, str2, ...) - Returns the n
-th string, e.g., returns str2
when n
is 2.
Examples:
> SELECT elt(1, 'scala', 'java');
scala
encode(str, charset) - Encodes the first argument using the second argument character set.
Examples:
> SELECT encode('abc', 'utf-8');
abc
exp(expr) - Returns e to the power of expr
.
Examples:
> SELECT exp(0);
1.0
explode(expr) - Separates the elements of array expr
into multiple rows, or the elements of map expr
into multiple rows and columns.
Examples:
> SELECT explode(array(10, 20));
10
20
explode_outer(expr) - Separates the elements of array expr
into multiple rows, or the elements of map expr
into multiple rows and columns.
Examples:
> SELECT explode_outer(array(10, 20));
10
20
expm1(expr) - Returns exp(expr
) - 1.
Examples:
> SELECT expm1(0);
0.0
factorial(expr) - Returns the factorial of expr
. expr
is [0..20]. Otherwise, null.
Examples:
> SELECT factorial(5);
120
find_in_set(str, str_array) - Returns the index (1-based) of the given string (str
) in the comma-delimited list (str_array
). Returns 0, if the string was not found or if the given string (str
) contains a comma.
Examples:
> SELECT find_in_set('ab','abc,b,ab,c,def');
3
first(expr[, isIgnoreNull]) - Returns the first value of expr
for a group of rows. If isIgnoreNull
is true, returns only non-null values.
first_value(expr[, isIgnoreNull]) - Returns the first value of expr
for a group of rows. If isIgnoreNull
is true, returns only non-null values.
float(expr) - Casts the value expr
to the target data type float
.
floor(expr) - Returns the largest integer not greater than expr
.
Examples:
> SELECT floor(-0.1);
-1
> SELECT floor(5);
5
format_number(expr1, expr2) - Formats the number expr1
like '#,###,###.##', rounded to expr2
decimal places. If expr2
is 0, the result has no decimal point or fractional part. This is supposed to function like MySQL's FORMAT.
Examples:
> SELECT format_number(12332.123456, 4);
12,332.1235
format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.
Examples:
> SELECT format_string("Hello World %d %s", 100, "days");
Hello World 100 days
from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr
and schema
.
Examples:
> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
{"a":1, "b":0.8}
> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
{"time":"2015-08-26 00:00:00.0"}
Since: 2.2.0
from_unixtime(unix_time, format) - Returns unix_time
in the specified format
.
Examples:
> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
1970-01-01 00:00:00
from_utc_timestamp(timestamp, timezone) - Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.
Examples:
> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-31 09:00:00
get_json_object(json_txt, path) - Extracts a json object from path
.
Examples:
> SELECT get_json_object('{"a":"b"}', '$.a');
b
greatest(expr, ...) - Returns the greatest value of all parameters, skipping null values.
Examples:
> SELECT greatest(10, 9, 2, 4, 3);
10
hash(expr1, expr2, ...) - Returns a hash value of the arguments.
Examples:
> SELECT hash('Spark', array(123), 2);
-1321691492
hex(expr) - Converts expr
to hexadecimal.
Examples:
> SELECT hex(17);
11
> SELECT hex('Spark SQL');
537061726B2053514C
hour(timestamp) - Returns the hour component of the string/timestamp.
Examples:
> SELECT hour('2009-07-30 12:58:59');
12
hypot(expr1, expr2) - Returns sqrt(expr1
2 + expr2
2).
Examples:
> SELECT hypot(3, 4);
5.0
if(expr1, expr2, expr3) - If expr1
evaluates to true, then returns expr2
; otherwise returns expr3
.
Examples:
> SELECT if(1 < 2, 'a', 'b');
a
ifnull(expr1, expr2) - Returns expr2
if expr1
is null, or expr1
otherwise.
Examples:
> SELECT ifnull(NULL, array('2'));
["2"]
expr1 in(expr2, expr3, ...) - Returns true if expr
equals to any valN.
initcap(str) - Returns str
with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.
Examples:
> SELECT initcap('sPark sql');
Spark Sql
inline(expr) - Explodes an array of structs into a table.
Examples:
> SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
inline_outer(expr) - Explodes an array of structs into a table.
Examples:
> SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
input_file_block_length() - Returns the length of the block being read, or -1 if not available.
input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.
input_file_name() - Returns the name of the file being read, or empty string if not available.
instr(str, substr) - Returns the (1-based) index of the first occurrence of substr
in str
.
Examples:
> SELECT instr('SparkSQL', 'SQL');
6
int(expr) - Casts the value expr
to the target data type int
.
isnan(expr) - Returns true if expr
is NaN, or false otherwise.
Examples:
> SELECT isnan(cast('NaN' as double));
true
isnotnull(expr) - Returns true if expr
is not null, or false otherwise.
Examples:
> SELECT isnotnull(1);
true
isnull(expr) - Returns true if expr
is null, or false otherwise.
Examples:
> SELECT isnull(1);
false
java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Examples:
> SELECT java_method('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
json_tuple(jsonStr, p1, p2, ..., pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.
Examples:
> SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
1 2
kurtosis(expr) - Returns the kurtosis value calculated from values of a group.
lag(input[, offset[, default]]) - Returns the value of input
at the offset
th row before the current row in the window. The default value of offset
is 1 and the default value of default
is null. If the value of input
at the offset
th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default
is returned.
last(expr[, isIgnoreNull]) - Returns the last value of expr
for a group of rows. If isIgnoreNull
is true, returns only non-null values.
last_day(date) - Returns the last day of the month which the date belongs to.
Examples:
> SELECT last_day('2009-01-12');
2009-01-31
last_value(expr[, isIgnoreNull]) - Returns the last value of expr
for a group of rows. If isIgnoreNull
is true, returns only non-null values.
lcase(str) - Returns str
with all characters changed to lowercase.
Examples:
> SELECT lcase('SparkSql');
sparksql
lead(input[, offset[, default]]) - Returns the value of input
at the offset
th row after the current row in the window. The default value of offset
is 1 and the default value of default
is null. If the value of input
at the offset
th row is null, null is returned. If there is no such an offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default
is returned.
least(expr, ...) - Returns the least value of all parameters, skipping null values.
Examples:
> SELECT least(10, 9, 2, 4, 3);
2
left(str, len) - Returns the leftmost len
(len
can be string type) characters from the string str
,if len
is less or equal than 0 the result is an empty string.
Examples:
> SELECT left('Spark SQL', 3);
Spa
length(expr) - Returns the character length of expr
or number of bytes in binary data.
Examples:
> SELECT length('Spark SQL');
9
> SELECT CHAR_LENGTH('Spark SQL');
9
> SELECT CHARACTER_LENGTH('Spark SQL');
9
levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.
Examples:
> SELECT levenshtein('kitten', 'sitting');
3
str like pattern - Returns true if str matches pattern, null if any arguments are null, false otherwise.
Arguments:
pattern - a string expression. The pattern is a string which is matched literally, with exception to the following special symbols:
_ matches any one character in the input (similar to . in posix regular expressions)
% matches zero or more characters in the input (similar to .* in posix regular expressions)
The escape character is '\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.
Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\abc".
When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it fallbacks to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc".
Examples:
> SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\\Users%'
true
Note:
Use RLIKE to match with standard regular expressions.
ln(expr) - Returns the natural logarithm (base e) of expr
.
Examples:
> SELECT ln(1);
0.0
locate(substr, str[, pos]) - Returns the position of the first occurrence of substr
in str
after position pos
. The given pos
and return value are 1-based.
Examples:
> SELECT locate('bar', 'foobarbar');
4
> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
log(base, expr) - Returns the logarithm of expr
with base
.
Examples:
> SELECT log(10, 100);
2.0
log10(expr) - Returns the logarithm of expr
with base 10.
Examples:
> SELECT log10(10);
1.0
log1p(expr) - Returns log(1 + expr
).
Examples:
> SELECT log1p(0);
0.0
log2(expr) - Returns the logarithm of expr
with base 2.
Examples:
> SELECT log2(2);
1.0
lower(str) - Returns str
with all characters changed to lowercase.
Examples:
> SELECT lower('SparkSql');
sparksql
lpad(str, len, pad) - Returns str
, left-padded with pad
to a length of len
. If str
is longer than len
, the return value is shortened to len
characters.
Examples:
> SELECT lpad('hi', 5, '??');
???hi
> SELECT lpad('hi', 1, '??');
h
ltrim(str) - Removes the leading and trailing space characters from str
.
Examples:
> SELECT ltrim(' SparkSQL');
SparkSQL
map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs.
Examples:
> SELECT map(1.0, '2', 3.0, '4');
{1.0:"2",3.0:"4"}
map_keys(map) - Returns an unordered array containing the keys of the map.
Examples:
> SELECT map_keys(map(1, 'a', 2, 'b'));
[1,2]
map_values(map) - Returns an unordered array containing the values of the map.
Examples:
> SELECT map_values(map(1, 'a', 2, 'b'));
["a","b"]
max(expr) - Returns the maximum value of expr
.
md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr
.
Examples:
> SELECT md5('Spark');
8cde774d6f7333752ed72cacddb05126
mean(expr) - Returns the mean calculated from values of a group.
min(expr) - Returns the minimum value of expr
.
minute(timestamp) - Returns the minute component of the string/timestamp.
Examples:
> SELECT minute('2009-07-30 12:58:59');
58
expr1 mod expr2 - Returns the remainder after expr1
/expr2
.
Examples:
> SELECT 2 mod 1.8;
0.2
> SELECT MOD(2, 1.8);
0.2
monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
month(date) - Returns the month component of the date/timestamp.
Examples:
> SELECT month('2016-07-30');
7
months_between(timestamp1, timestamp2) - Returns number of months between timestamp1
and timestamp2
.
Examples:
> SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
3.94959677
named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values.
Examples:
> SELECT named_struct("a", 1, "b", 2, "c", 3);
{"a":1,"b":2,"c":3}
nanvl(expr1, expr2) - Returns expr1
if it's not NaN, or expr2
otherwise.
Examples:
> SELECT nanvl(cast('NaN' as double), 123);
123.0
negative(expr) - Returns the negated value of expr
.
Examples:
> SELECT negative(1);
-1
next_day(start_date, day_of_week) - Returns the first date which is later than start_date
and named as indicated.
Examples:
> SELECT next_day('2015-01-14', 'TU');
2015-01-20
not expr - Logical not.
now() - Returns the current timestamp at the start of query evaluation.
ntile(n) - Divides the rows for each window partition into n
buckets ranging from 1 to at most n
.
nullif(expr1, expr2) - Returns null if expr1
equals to expr2
, or expr1
otherwise.
Examples:
> SELECT nullif(2, 2);
NULL
nvl(expr1, expr2) - Returns expr2
if expr1
is null, or expr1
otherwise.
Examples:
> SELECT nvl(NULL, array('2'));
["2"]
nvl2(expr1, expr2, expr3) - Returns expr2
if expr1
is not null, or expr3
otherwise.
Examples:
> SELECT nvl2(NULL, 2, 1);
1
octet_length(expr) - Returns the byte length of expr
or number of bytes in binary data.
Examples:
> SELECT octet_length('Spark SQL');
9
expr1 or expr2 - Logical OR.
parse_url(url, partToExtract[, key]) - Extracts a part from a URL.
Examples:
> SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')
spark.apache.org
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY')
query=1
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query')
1
percent_rank() - Computes the percentage ranking of a value in a group of values.
percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric column col
at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be positive integral
percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col
at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be positive integral
percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column col
at the given percentage. The value of percentage must be between 0.0 and 1.0. The accuracy
parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy
yields better accuracy, 1.0/accuracy
is the relative error of the approximation. When percentage
is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col
at the given percentage array.
Examples:
> SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]
> SELECT percentile_approx(10.0, 0.5, 100);
10.0
pi() - Returns pi.
Examples:
> SELECT pi();
3.141592653589793
pmod(expr1, expr2) - Returns the positive value of expr1
mod expr2
.
Examples:
> SELECT pmod(10, 3);
1
> SELECT pmod(-10, 3);
2
posexplode(expr) - Separates the elements of array expr
into multiple rows with positions, or the elements of map expr
into multiple rows and columns with positions.
Examples:
> SELECT posexplode(array(10,20));
0 10
1 20
posexplode_outer(expr) - Separates the elements of array expr
into multiple rows with positions, or the elements of map expr
into multiple rows and columns with positions.
Examples:
> SELECT posexplode_outer(array(10,20));
0 10
1 20
position(substr, str[, pos]) - Returns the position of the first occurrence of substr
in str
after position pos
. The given pos
and return value are 1-based.
Examples:
> SELECT position('bar', 'foobarbar');
4
> SELECT position('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
positive(expr) - Returns the value of expr
.
pow(expr1, expr2) - Raises expr1
to the power of expr2
.
Examples:
> SELECT pow(2, 3);
8.0
power(expr1, expr2) - Raises expr1
to the power of expr2
.
Examples:
> SELECT power(2, 3);
8.0
printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.
Examples:
> SELECT printf("Hello World %d %s", 100, "days");
Hello World 100 days
quarter(date) - Returns the quarter of the year for date, in the range 1 to 4.
Examples:
> SELECT quarter('2016-08-31');
3
radians(expr) - Converts degrees to radians.
Examples:
> SELECT radians(180);
3.141592653589793
rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).
Examples:
> SELECT rand();
0.9629742951434543
> SELECT rand(0);
0.8446490682263027
> SELECT rand(null);
0.8446490682263027
randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.
Examples:
> SELECT randn();
-0.3254147983080288
> SELECT randn(0);
1.1164209726833079
> SELECT randn(null);
1.1164209726833079
rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Examples:
> SELECT reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
regexp_extract(str, regexp[, idx]) - Extracts a group that matches regexp
.
Examples:
> SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1);
100
regexp_replace(str, regexp, rep) - Replaces all substrings of str
that match regexp
with rep
.
Examples:
> SELECT regexp_replace('100-200', '(\d+)', 'num');
num-num
repeat(str, n) - Returns the string which repeats the given string value n times.
Examples:
> SELECT repeat('123', 2);
123123
replace(str, search[, replace]) - Replaces all occurrences of search
with replace
.
Arguments:
search
is not found in str
, str
is returned unchanged.replace
is not specified or is an empty string, nothing replaces the string that is removed from str
.Examples:
> SELECT replace('ABCabc', 'abc', 'DEF');
ABCDEF
reverse(str) - Returns the reversed given string.
Examples:
> SELECT reverse('Spark SQL');
LQS krapS
right(str, len) - Returns the rightmost len
(len
can be string type) characters from the string str
,if len
is less or equal than 0 the result is an empty string.
Examples:
> SELECT right('Spark SQL', 3);
SQL
rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.
Examples:
> SELECT rint(12.3456);
12.0
str rlike regexp - Returns true if str
matches regexp
, or false otherwise.
Arguments:
regexp - a string expression. The pattern string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match "\abc", a regular expression for regexp
can be "^\abc$".
There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fallback to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp
that can match "\abc" is "^\abc$".
Examples:
When spark.sql.parser.escapedStringLiterals is disabled (default).
> SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*'
true
When spark.sql.parser.escapedStringLiterals is enabled.
> SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\Users.*'
true
Note:
Use LIKE to match with simple string pattern.
round(expr, d) - Returns expr
rounded to d
decimal places using HALF_UP rounding mode.
Examples:
> SELECT round(2.5, 0);
3.0
row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
rpad(str, len, pad) - Returns str
, right-padded with pad
to a length of len
. If str
is longer than len
, the return value is shortened to len
characters.
Examples:
> SELECT rpad('hi', 5, '??');
hi???
> SELECT rpad('hi', 1, '??');
h
rtrim(str) - Removes the trailing space characters from str
.
Examples:
> SELECT rtrim(' SparkSQL ');
SparkSQL
second(timestamp) - Returns the second component of the string/timestamp.
Examples:
> SELECT second('2009-07-30 12:58:59');
59
sentences(str[, lang, country]) - Splits str
into an array of array of words.
Examples:
> SELECT sentences('Hi there! Good morning.');
[["Hi","there"],["Good","morning"]]
sha(expr) - Returns a sha1 hash value as a hex string of the expr
.
Examples:
> SELECT sha('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
sha1(expr) - Returns a sha1 hash value as a hex string of the expr
.
Examples:
> SELECT sha1('Spark');
85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c
sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr
. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.
Examples:
> SELECT sha2('Spark', 256);
529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
shiftleft(base, expr) - Bitwise left shift.
Examples:
> SELECT shiftleft(2, 1);
4
shiftright(base, expr) - Bitwise (signed) right shift.
Examples:
> SELECT shiftright(4, 1);
2
shiftrightunsigned(base, expr) - Bitwise unsigned right shift.
Examples:
> SELECT shiftrightunsigned(4, 1);
2
sign(expr) - Returns -1.0, 0.0 or 1.0 as expr
is negative, 0 or positive.
Examples:
> SELECT sign(40);
1.0
signum(expr) - Returns -1.0, 0.0 or 1.0 as expr
is negative, 0 or positive.
Examples:
> SELECT signum(40);
1.0
sin(expr) - Returns the sine of expr
.
Examples:
> SELECT sin(0);
0.0
sinh(expr) - Returns the hyperbolic sine of expr
.
Examples:
> SELECT sinh(0);
0.0
size(expr) - Returns the size of an array or a map. Returns -1 if null.
Examples:
> SELECT size(array('b', 'd', 'c', 'a'));
4
skewness(expr) - Returns the skewness value calculated from values of a group.
smallint(expr) - Casts the value expr
to the target data type smallint
.
sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements.
Examples:
> SELECT sort_array(array('b', 'd', 'c', 'a'), true);
["a","b","c","d"]
soundex(str) - Returns Soundex code of the string.
Examples:
> SELECT soundex('Miller');
M460
space(n) - Returns a string consisting of n
spaces.
Examples:
> SELECT concat(space(2), '1');
1
spark_partition_id() - Returns the current partition id.
split(str, regex) - Splits str
around occurrences that match regex
.
Examples:
> SELECT split('oneAtwoBthreeC', '[ABC]');
["one","two","three",""]
sqrt(expr) - Returns the square root of expr
.
Examples:
> SELECT sqrt(4);
2.0
stack(n, expr1, ..., exprk) - Separates expr1
, ..., exprk
into n
rows.
Examples:
> SELECT stack(2, 1, 2, 3);
1 2
3 NULL
std(expr) - Returns the sample standard deviation calculated from values of a group.
stddev(expr) - Returns the sample standard deviation calculated from values of a group.
stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.
stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim
and ':' for keyValueDelim
.
Examples:
> SELECT str_to_map('a:1,b:2,c:3', ',', ':');
map("a":"1","b":"2","c":"3")
> SELECT str_to_map('a');
map("a":null)
string(expr) - Casts the value expr
to the target data type string
.
struct(col1, col2, col3, ...) - Creates a struct with the given field values.
substr(str, pos[, len]) - Returns the substring of str
that starts at pos
and is of length len
, or the slice of byte array that starts at pos
and is of length len
.
Examples:
> SELECT substr('Spark SQL', 5);
k SQL
> SELECT substr('Spark SQL', -3);
SQL
> SELECT substr('Spark SQL', 5, 1);
k
substring(str, pos[, len]) - Returns the substring of str
that starts at pos
and is of length len
, or the slice of byte array that starts at pos
and is of length len
.
Examples:
> SELECT substring('Spark SQL', 5);
k SQL
> SELECT substring('Spark SQL', -3);
SQL
> SELECT substring('Spark SQL', 5, 1);
k
substring_index(str, delim, count) - Returns the substring from str
before count
occurrences of the delimiter delim
. If count
is positive, everything to the left of the final delimiter (counting from the left) is returned. If count
is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim
.
Examples:
> SELECT substring_index('www.apache.org', '.', 2);
www.apache
sum(expr) - Returns the sum calculated from values of a group.
tan(expr) - Returns the tangent of expr
.
Examples:
> SELECT tan(0);
0.0
tanh(expr) - Returns the hyperbolic tangent of expr
.
Examples:
> SELECT tanh(0);
0.0
timestamp(expr) - Casts the value expr
to the target data type timestamp
.
tinyint(expr) - Casts the value expr
to the target data type tinyint
.
to_date(date_str[, fmt]) - Parses the date_str
expression with the fmt
expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt
is omitted.
Examples:
> SELECT to_date('2009-07-30 04:17:52');
2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
2016-12-31
to_json(expr[, options]) - Returns a json string with a given struct value
Examples:
> SELECT to_json(named_struct('a', 1, 'b', 2));
{"a":1,"b":2}
> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
{"time":"26/08/2015"}
> SELECT to_json(array(named_struct('a', 1, 'b', 2));
[{"a":1,"b":2}]
Since: 2.2.0
to_timestamp(timestamp[, fmt]) - Parses the timestamp
expression with the fmt
expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt
is omitted.
Examples:
> SELECT to_timestamp('2016-12-31 00:12:00');
2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
2016-12-31 00:00:00
to_unix_timestamp(expr[, pattern]) - Returns the UNIX timestamp of the given time.
Examples:
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460041200
to_utc_timestamp(timestamp, timezone) - Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.
Examples:
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
2016-08-30 15:00:00
translate(input, from, to) - Translates the input
string by replacing the characters present in the from
string with the corresponding characters in the to
string.
Examples:
> SELECT translate('AaBbCc', 'abc', '123');
A1B2C3
trim(str) - Removes the leading and trailing space characters from str
.
Examples:
> SELECT trim(' SparkSQL ');
SparkSQL
trunc(date, fmt) - Returns date
with the time portion of the day truncated to the unit specified by the format model fmt
.
Examples:
> SELECT trunc('2009-02-12', 'MM');
2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
2015-01-01
ucase(str) - Returns str
with all characters changed to uppercase.
Examples:
> SELECT ucase('SparkSql');
SPARKSQL
unbase64(str) - Converts the argument from a base 64 string str
to a binary.
Examples:
> SELECT unbase64('U3BhcmsgU1FM');
Spark SQL
unhex(expr) - Converts hexadecimal expr
to binary.
Examples:
> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
Spark SQL
unix_timestamp([expr[, pattern]]) - Returns the UNIX timestamp of current or specified time.
Examples:
> SELECT unix_timestamp();
1476884637
> SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
1460041200
upper(str) - Returns str
with all characters changed to uppercase.
Examples:
> SELECT upper('SparkSql');
SPARKSQL
uuid() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
Examples:
> SELECT uuid();
46707d92-02f4-4817-8116-a4c3b23e6266
var_pop(expr) - Returns the population variance calculated from values of a group.
var_samp(expr) - Returns the sample variance calculated from values of a group.
variance(expr) - Returns the sample variance calculated from values of a group.
weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.
Examples:
> SELECT weekofyear('2008-02-20');
8
CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1
= true, returns expr2
; when expr3
= true, return expr4
; else return expr5
.
xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.
Examples:
> SELECT xpath('b1b2b3 c1 c2','a/b/text()');
['b1','b2','b3']
xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.
Examples:
> SELECT xpath_boolean('1','a/b');
true
xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_double('12', 'sum(a/b)');
3.0
xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_float('12', 'sum(a/b)');
3.0
xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_int('12', 'sum(a/b)');
3
xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_long('12', 'sum(a/b)');
3
xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
Examples:
> SELECT xpath_number('12', 'sum(a/b)');
3.0
xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.
Examples:
> SELECT xpath_short('12', 'sum(a/b)');
3
xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.
Examples:
> SELECT xpath_string('b cc','a/c');
cc
year(date) - Returns the year component of the date/timestamp.
Examples:
> SELECT year('2016-07-30');
2016
expr1 | expr2 - Returns the result of bitwise OR of expr1
and expr2
.
Examples:
> SELECT 3 | 5;
7
~ expr - Returns the result of bitwise NOT of expr
.
Examples:
> SELECT ~ 0;
-1
原文:https://spark-test.github.io/sparksqldoc/