highfei2011

[Spark SQL] -- Examples of all functions (Spark 2.x)

!

! expr - Logical not.

%

expr1 % expr2 - Returns the remainder after expr1/expr2.

Examples:

> SELECT 2 % 1.8;
 0.2
> SELECT MOD(2, 1.8);
 0.2

&

expr1 & expr2 - Returns the result of bitwise AND of expr1 and expr2.

Examples:

> SELECT 3 & 5;
 1

*

expr1 * expr2 - Returns expr1*expr2.

Examples:

> SELECT 2 * 3;
 6

+

expr1 + expr2 - Returns expr1+expr2.

Examples:

> SELECT 1 + 2;
 3

-

expr1 - expr2 - Returns expr1-expr2.

Examples:

> SELECT 2 - 1;
 1

/

expr1 / expr2 - Returns expr1/expr2. It always performs floating point division.

Examples:

> SELECT 3 / 2;
 1.5
> SELECT 2L / 2L;
 1.0

<

expr1 < expr2 - Returns true if expr1 is less than expr2.

<=

expr1 <= expr2 - Returns true if expr1 is less than or equal to expr2.

<=>

expr1 <=> expr2 - Returns the same result as the EQUAL(=) operator for non-null operands, but returns true if both are null and false if only one of them is null.
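The null handling described above can be modeled outside Spark. The following is a minimal Python sketch, assuming SQL NULL is represented by None and that the plain = operator yields NULL when either operand is NULL (standard SQL behavior, not stated in this list). The helper names eq and null_safe_eq are hypothetical and not part of any Spark API.

```python
def eq(a, b):
    """Model of `=`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of `<=>`: always returns True or False, never NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


print(eq(None, None), null_safe_eq(None, None))  # None True
print(eq(1, None), null_safe_eq(1, None))        # None False
print(eq(1, 1), null_safe_eq(1, 1))              # True True
```

The practical difference shows up in join conditions and filters, where a NULL result from = drops the row while <=> still gives a definite answer.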

=

expr1 = expr2 - Returns true if expr1 equals expr2, or false otherwise.

==

expr1 == expr2 - Returns true if expr1 equals expr2, or false otherwise.

>

expr1 > expr2 - Returns true if expr1 is greater than expr2.

>=

expr1 >= expr2 - Returns true if expr1 is greater than or equal to expr2.

^

expr1 ^ expr2 - Returns the result of bitwise exclusive OR of expr1 and expr2.

Examples:

> SELECT 3 ^ 5;
 2

abs

abs(expr) - Returns the absolute value of the numeric value.

Examples:

> SELECT abs(-1);
 1

acos

acos(expr) - Returns the inverse cosine (a.k.a. arccosine) of expr if -1<=expr<=1 or NaN otherwise.

Examples:

> SELECT acos(1);
 0.0
> SELECT acos(2);
 NaN

add_months

add_months(start_date, num_months) - Returns the date that is num_months after start_date.

Examples:

> SELECT add_months('2016-08-31', 1);
 2016-09-30

and

expr1 and expr2 - Logical AND.

approx_count_distinct

approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD defines the maximum estimation error allowed.

approx_percentile

approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT approx_percentile(10.0, array(0.5, 0.4, 0.1), 100);
 [10.0,10.0,10.0]
> SELECT approx_percentile(10.0, 0.5, 100);
 10.0

array

array(expr, ...) - Returns an array with the given elements.

Examples:

> SELECT array(1, 2, 3);
 [1,2,3]

array_contains

array_contains(array, value) - Returns true if the array contains the value.

Examples:

> SELECT array_contains(array(1, 2, 3), 2);
 true

ascii

ascii(str) - Returns the numeric value of the first character of str.

Examples:

> SELECT ascii('222');
 50
> SELECT ascii(2);
 50

asin

asin(expr) - Returns the inverse sine (a.k.a. arcsine) of expr if -1<=expr<=1 or NaN otherwise.

Examples:

> SELECT asin(0);
 0.0
> SELECT asin(2);
 NaN

assert_true

assert_true(expr) - Throws an exception if expr is not true.

Examples:

> SELECT assert_true(0 < 1);
 NULL

atan

atan(expr) - Returns the inverse tangent (a.k.a. arctangent).

Examples:

> SELECT atan(0);
 0.0

atan2

atan2(expr1, expr2) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (expr1, expr2).

Examples:

> SELECT atan2(0, 0);
 0.0

avg

avg(expr) - Returns the mean calculated from values of a group.

base64

base64(bin) - Converts the argument from a binary bin to a base 64 string.

Examples:

> SELECT base64('Spark SQL');
 U3BhcmsgU1FM

bigint

bigint(expr) - Casts the value expr to the target data type bigint.

bin

bin(expr) - Returns the string representation of the long value expr represented in binary.

Examples:

> SELECT bin(13);
 1101
> SELECT bin(-13);
 1111111111111111111111111111111111111111111111111111111111110011
> SELECT bin(13.3);
 1101
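The three outputs above are consistent with treating the argument as a 64-bit two's-complement long after dropping any fractional part. Here is a small Python sketch of that reading; the function name bin_as_long is hypothetical, and the 64-bit width is an assumption taken from the length of the bin(-13) output.

```python
def bin_as_long(value):
    """Binary digits of `value` viewed as a 64-bit two's-complement long."""
    n = int(value)                      # drops the fraction, as in bin(13.3)
    return format(n & (2**64 - 1), "b")  # masking maps negatives to 2**64 + n


print(bin_as_long(13))         # 1101
print(len(bin_as_long(-13)))   # 64
print(bin_as_long(13.3))       # 1101
```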

binary

binary(expr) - Casts the value expr to the target data type binary.

bit_length

bit_length(expr) - Returns the bit length of expr or number of bits in binary data.

Examples:

> SELECT bit_length('Spark SQL');
 72

boolean

boolean(expr) - Casts the value expr to the target data type boolean.

bround

bround(expr, d) - Returns expr rounded to d decimal places using HALF_EVEN rounding mode.

Examples:

> SELECT bround(2.5, 0);
 2.0
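HALF_EVEN rounding sends a tie to the nearest even digit, whereas HALF_UP (used by round in this list) sends a tie away from zero. The sketch below illustrates the two modes with Python's decimal module; it is a model of the rounding modes only, not Spark code, and round_with_mode is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_with_mode(value, places, mode):
    """Round a decimal string to `places` fractional digits using `mode`."""
    quantum = Decimal(1).scaleb(-places)  # places=0 -> 1, places=2 -> 0.01
    return Decimal(value).quantize(quantum, rounding=mode)


for text in ("0.5", "1.5", "2.5", "3.5"):
    even = round_with_mode(text, 0, ROUND_HALF_EVEN)
    up = round_with_mode(text, 0, ROUND_HALF_UP)
    print(text, even, up)
# 0.5 0 1
# 1.5 2 2
# 2.5 2 3
# 3.5 4 4
```

Values are passed as strings so that the ties are exact decimal ties rather than binary floating-point approximations.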

cast

cast(expr AS type) - Casts the value expr to the target data type type.

Examples:

> SELECT cast('10' as int);
 10

cbrt

cbrt(expr) - Returns the cube root of expr.

Examples:

> SELECT cbrt(27.0);
 3.0

ceil

ceil(expr) - Returns the smallest integer not smaller than expr.

Examples:

> SELECT ceil(-0.1);
 0
> SELECT ceil(5);
 5

ceiling

ceiling(expr) - Returns the smallest integer not smaller than expr.

Examples:

> SELECT ceiling(-0.1);
 0
> SELECT ceiling(5);
 5

char

char(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).

Examples:

> SELECT char(65);
 A

char_length

char_length(expr) - Returns the character length of expr or number of bytes in binary data.

Examples:

> SELECT char_length('Spark SQL');
 9
> SELECT CHAR_LENGTH('Spark SQL');
 9
> SELECT CHARACTER_LENGTH('Spark SQL');
 9

character_length

character_length(expr) - Returns the character length of expr or number of bytes in binary data.

Examples:

> SELECT character_length('Spark SQL');
 9
> SELECT CHAR_LENGTH('Spark SQL');
 9
> SELECT CHARACTER_LENGTH('Spark SQL');
 9

chr

chr(expr) - Returns the ASCII character having the binary equivalent to expr. If expr is larger than 256, the result is equivalent to chr(expr % 256).

Examples:

> SELECT chr(65);
 A

coalesce

coalesce(expr1, expr2, ...) - Returns the first non-null argument if exists. Otherwise, null.

Examples:

> SELECT coalesce(NULL, 1, NULL);
 1

collect_list

collect_list(expr) - Collects and returns a list of non-unique elements.

collect_set

collect_set(expr) - Collects and returns a set of unique elements.

concat

concat(str1, str2, ..., strN) - Returns the concatenation of str1, str2, ..., strN.

Examples:

> SELECT concat('Spark', 'SQL');
 SparkSQL

concat_ws

concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by sep.

Examples:

> SELECT concat_ws(' ', 'Spark', 'SQL');
  Spark SQL

conv

conv(num, from_base, to_base) - Convert num from from_base to to_base.

Examples:

> SELECT conv('100', 2, 10);
 4
> SELECT conv(-10, 16, -10);
 -16
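The first example above is an ordinary base conversion: the digits 100 read in base 2 are the number 4 in base 10. The Python sketch below covers only that non-negative case for bases 2 to 36; the negative to_base form in the second example is deliberately not modeled, and conv_unsigned is a hypothetical name.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv_unsigned(num, from_base, to_base):
    """Convert a non-negative number string from one base to another."""
    value = int(str(num), from_base)   # parse the digits in the source base
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, digit = divmod(value, to_base)
        out.append(DIGITS[digit])      # least significant digit first
    return "".join(reversed(out))


print(conv_unsigned("100", 2, 10))  # 4
print(conv_unsigned("ff", 16, 2))   # 11111111
```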

corr

corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.

cos

cos(expr) - Returns the cosine of expr.

Examples:

> SELECT cos(0);
 1.0

cosh

cosh(expr) - Returns the hyperbolic cosine of expr.

Examples:

> SELECT cosh(0);
 1.0

cot

cot(expr) - Returns the cotangent of expr.

Examples:

> SELECT cot(1);
 0.6420926159343306

count

count(*) - Returns the total number of retrieved rows, including rows containing null.

count(expr) - Returns the number of rows for which the supplied expression is non-null.

count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.
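The three forms differ only in how they treat nulls and duplicates. A minimal Python model for a single column, assuming NULL is represented by None (the sample data is made up for illustration):

```python
rows = [1, 2, 2, None, 3, None]   # one column with two NULLs and one duplicate

count_star = len(rows)                                    # count(*)
count_expr = sum(1 for v in rows if v is not None)        # count(expr)
count_distinct = len({v for v in rows if v is not None})  # count(DISTINCT expr)

print(count_star, count_expr, count_distinct)  # 6 4 3
```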

count_min_sketch

count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given eps, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

covar_pop

covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.

covar_samp

covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.

crc32

crc32(expr) - Returns a cyclic redundancy check value of the expr as a bigint.

Examples:

> SELECT crc32('Spark');
 1557323817

cube

cume_dist

cume_dist() - Computes the position of a value relative to all values in the partition.

current_database

current_database() - Returns the current database.

Examples:

> SELECT current_database();
 default

current_date

current_date() - Returns the current date at the start of query evaluation.

current_timestamp

current_timestamp() - Returns the current timestamp at the start of query evaluation.

date

date(expr) - Casts the value expr to the target data type date.

date_add

date_add(start_date, num_days) - Returns the date that is num_days after start_date.

Examples:

> SELECT date_add('2016-07-30', 1);
 2016-07-31

date_format

date_format(timestamp, fmt) - Converts timestamp to a value of string in the format specified by the date format fmt.

Examples:

> SELECT date_format('2016-04-08', 'y');
 2016

date_sub

date_sub(start_date, num_days) - Returns the date that is num_days before start_date.

Examples:

> SELECT date_sub('2016-07-30', 1);
 2016-07-29

datediff

datediff(endDate, startDate) - Returns the number of days from startDate to endDate.

Examples:

> SELECT datediff('2009-07-31', '2009-07-30');
 1

> SELECT datediff('2009-07-30', '2009-07-31');
 -1

day

day(date) - Returns the day of month of the date/timestamp.

Examples:

> SELECT day('2009-07-30');
 30

dayofmonth

dayofmonth(date) - Returns the day of month of the date/timestamp.

Examples:

> SELECT dayofmonth('2009-07-30');
 30

dayofweek

dayofweek(date) - Returns the day of the week for date/timestamp (1 = Sunday, 2 = Monday, ..., 7 = Saturday).

Examples:

> SELECT dayofweek('2009-07-30');
 5
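The numbering (1 = Sunday through 7 = Saturday) differs from ISO numbering (1 = Monday through 7 = Sunday). A short Python sketch of the mapping, using a hypothetical helper named dayofweek that mirrors the description above:

```python
from datetime import date


def dayofweek(day):
    """Map an ISO date string to 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    iso = date.fromisoformat(day).isoweekday()  # 1 = Monday ... 7 = Sunday
    return iso % 7 + 1


print(dayofweek("2009-07-30"))  # 5 (a Thursday)
print(dayofweek("2009-07-26"))  # 1 (a Sunday)
```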

dayofyear

dayofyear(date) - Returns the day of year of the date/timestamp.

Examples:

> SELECT dayofyear('2016-04-09');
 100

decimal

decimal(expr) - Casts the value expr to the target data type decimal.

decode

decode(bin, charset) - Decodes the first argument using the second argument character set.

Examples:

> SELECT decode(encode('abc', 'utf-8'), 'utf-8');
 abc

degrees

degrees(expr) - Converts radians to degrees.

Examples:

> SELECT degrees(3.141592653589793);
 180.0

dense_rank

dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.

double

double(expr) - Casts the value expr to the target data type double.

e

e() - Returns Euler's number, e.

Examples:

> SELECT e();
 2.718281828459045

elt

elt(n, str1, str2, ...) - Returns the n-th string, e.g., returns str2 when n is 2.

Examples:

> SELECT elt(1, 'scala', 'java');
 scala

encode

encode(str, charset) - Encodes the first argument using the second argument character set.

Examples:

> SELECT encode('abc', 'utf-8');
 abc

exp

exp(expr) - Returns e to the power of expr.

Examples:

> SELECT exp(0);
 1.0

explode

explode(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.

Examples:

> SELECT explode(array(10, 20));
 10
 20

explode_outer

explode_outer(expr) - Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.

Examples:

> SELECT explode_outer(array(10, 20));
 10
 20

expm1

expm1(expr) - Returns exp(expr) - 1.

Examples:

> SELECT expm1(0);
 0.0

factorial

factorial(expr) - Returns the factorial of expr. expr is [0..20]. Otherwise, null.

Examples:

> SELECT factorial(5);
 120
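One plausible reason for the [0..20] range, stated here as an assumption rather than something this list confirms, is that 20! is the largest factorial that fits in a signed 64-bit integer. The Python sketch below checks that arithmetic and models the null result with None; factorial_or_none is a hypothetical name.

```python
import math

INT64_MAX = 2**63 - 1


def factorial_or_none(n):
    """Factorial for 0 <= n <= 20, otherwise None (modeling SQL NULL)."""
    if not 0 <= n <= 20:
        return None
    return math.factorial(n)


print(factorial_or_none(5))              # 120
print(factorial_or_none(21))             # None
print(math.factorial(20) <= INT64_MAX)   # True
print(math.factorial(21) <= INT64_MAX)   # False
```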

find_in_set

find_in_set(str, str_array) - Returns the index (1-based) of the given string (str) in the comma-delimited list (str_array). Returns 0, if the string was not found or if the given string (str) contains a comma.

Examples:

> SELECT find_in_set('ab','abc,b,ab,c,def');
 3
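The rules above (1-based index, 0 when the string is missing or itself contains a comma) can be sketched in a few lines of Python. This is an illustrative model, not Spark's implementation.

```python
def find_in_set(needle, str_array):
    """1-based position of `needle` in a comma-delimited list, or 0."""
    if "," in needle:
        return 0                     # a needle containing a comma never matches
    items = str_array.split(",")
    return items.index(needle) + 1 if needle in items else 0


print(find_in_set("ab", "abc,b,ab,c,def"))   # 3
print(find_in_set("x", "abc,b,ab,c,def"))    # 0
print(find_in_set("b,ab", "abc,b,ab,c,def")) # 0
```

Matching is on whole list items, which is why 'ab' is found at position 3 and not inside 'abc' at position 1.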

first

first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

first_value

first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

float

float(expr) - Casts the value expr to the target data type float.

floor

floor(expr) - Returns the largest integer not greater than expr.

Examples:

> SELECT floor(-0.1);
 -1
> SELECT floor(5);
 5

format_number

format_number(expr1, expr2) - Formats the number expr1 like '#,###,###.##', rounded to expr2 decimal places. If expr2 is 0, the result has no decimal point or fractional part. This is supposed to function like MySQL's FORMAT.

Examples:

> SELECT format_number(12332.123456, 4);
 12,332.1235

format_string

format_string(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.

Examples:

> SELECT format_string("Hello World %d %s", 100, "days");
 Hello World 100 days

from_json

from_json(jsonStr, schema[, options]) - Returns a struct value with the given jsonStr and schema.

Examples:

> SELECT from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE');
 {"a":1, "b":0.8}
> SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));
 {"time":"2015-08-26 00:00:00.0"}

Since: 2.2.0

from_unixtime

from_unixtime(unix_time, format) - Returns unix_time in the specified format.

Examples:

> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');
 1970-01-01 00:00:00

from_utc_timestamp

from_utc_timestamp(timestamp, timezone) - Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.

Examples:

> SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');
 2016-08-31 09:00:00

get_json_object

get_json_object(json_txt, path) - Extracts a json object from path.

Examples:

> SELECT get_json_object('{"a":"b"}', '$.a');
 b

greatest

greatest(expr, ...) - Returns the greatest value of all parameters, skipping null values.

Examples:

> SELECT greatest(10, 9, 2, 4, 3);
 10

grouping

grouping_id

hash

hash(expr1, expr2, ...) - Returns a hash value of the arguments.

Examples:

> SELECT hash('Spark', array(123), 2);
 -1321691492

hex

hex(expr) - Converts expr to hexadecimal.

Examples:

> SELECT hex(17);
 11
> SELECT hex('Spark SQL');
 537061726B2053514C
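Both examples can be reproduced in Python: an integer is written in base 16, and a string is written as the hex digits of its bytes. The sketch assumes UTF-8 encoding for strings and does not model negative integers; hex_of is a hypothetical name.

```python
def hex_of(value):
    """Uppercase hexadecimal form of a non-negative int or of a string's bytes."""
    if isinstance(value, str):
        return value.encode("utf-8").hex().upper()  # two hex digits per byte
    return format(value, "X")


print(hex_of(17))           # 11
print(hex_of("Spark SQL"))  # 537061726B2053514C
```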

hour

hour(timestamp) - Returns the hour component of the string/timestamp.

Examples:

> SELECT hour('2009-07-30 12:58:59');
 12

hypot

hypot(expr1, expr2) - Returns sqrt(expr1**2 + expr2**2).

Examples:

> SELECT hypot(3, 4);
 5.0

if

if(expr1, expr2, expr3) - If expr1 evaluates to true, then returns expr2; otherwise returns expr3.

Examples:

> SELECT if(1 < 2, 'a', 'b');
 a

ifnull

ifnull(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.

Examples:

> SELECT ifnull(NULL, array('2'));
 ["2"]

in

expr1 in(expr2, expr3, ...) - Returns true if expr1 equals any of the values expr2, expr3, and so on.

initcap

initcap(str) - Returns str with the first letter of each word in uppercase. All other letters are in lowercase. Words are delimited by white space.

Examples:

> SELECT initcap('sPark sql');
 Spark Sql

inline

inline(expr) - Explodes an array of structs into a table.

Examples:

> SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
 1  a
 2  b

inline_outer

inline_outer(expr) - Explodes an array of structs into a table.

Examples:

> SELECT inline_outer(array(struct(1, 'a'), struct(2, 'b')));
 1  a
 2  b

input_file_block_length

input_file_block_length() - Returns the length of the block being read, or -1 if not available.

input_file_block_start

input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.

input_file_name

input_file_name() - Returns the name of the file being read, or empty string if not available.

instr

instr(str, substr) - Returns the (1-based) index of the first occurrence of substr in str.

Examples:

> SELECT instr('SparkSQL', 'SQL');
 6

int

int(expr) - Casts the value expr to the target data type int.

isnan

isnan(expr) - Returns true if expr is NaN, or false otherwise.

Examples:

> SELECT isnan(cast('NaN' as double));
 true

isnotnull

isnotnull(expr) - Returns true if expr is not null, or false otherwise.

Examples:

> SELECT isnotnull(1);
 true

isnull

isnull(expr) - Returns true if expr is null, or false otherwise.

Examples:

> SELECT isnull(1);
 false

java_method

java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Examples:

> SELECT java_method('java.util.UUID', 'randomUUID');
 c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
 a5cf6c42-0c85-418f-af6c-3e4e5b1328f2

json_tuple

json_tuple(jsonStr, p1, p2, ..., pn) - Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string.

Examples:

> SELECT json_tuple('{"a":1, "b":2}', 'a', 'b');
 1  2

kurtosis

kurtosis(expr) - Returns the kurtosis value calculated from values of a group.

lag

lag(input[, offset[, default]]) - Returns the value of input at the offset-th row before the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset-th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned.

last

last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

last_day

last_day(date) - Returns the last day of the month which the date belongs to.

Examples:

> SELECT last_day('2009-01-12');
 2009-01-31

last_value

last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

lcase

lcase(str) - Returns str with all characters changed to lowercase.

Examples:

> SELECT lcase('SparkSql');
 sparksql

lead

lead(input[, offset[, default]]) - Returns the value of input at the offset-th row after the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offset-th row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default is returned.
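The distinction between "the row exists but holds null" and "there is no such row" applies to both lead and lag. The Python sketch below models a single, already ordered window as a list, with None standing for NULL; the function signatures are hypothetical and take an explicit row index.

```python
def lag(values, index, offset=1, default=None):
    """Value `offset` rows before `index`, or `default` if no such row exists."""
    j = index - offset
    return values[j] if 0 <= j < len(values) else default


def lead(values, index, offset=1, default=None):
    """Value `offset` rows after `index`, or `default` if no such row exists."""
    j = index + offset
    return values[j] if 0 <= j < len(values) else default


window = [10, None, 30]
print([lag(window, i, 1, -1) for i in range(len(window))])   # [-1, 10, None]
print([lead(window, i, 1, -1) for i in range(len(window))])  # [None, 30, -1]
```

In the output, -1 marks rows that fall outside the window, while None is a null value that was actually read from a neighboring row.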

least

least(expr, ...) - Returns the least value of all parameters, skipping null values.

Examples:

> SELECT least(10, 9, 2, 4, 3);
 2

left

left(str, len) - Returns the leftmost len characters from the string str (len can be of string type). If len is less than or equal to 0, the result is an empty string.

Examples:

> SELECT left('Spark SQL', 3);
 Spa

length

length(expr) - Returns the character length of expr or number of bytes in binary data.

Examples:

> SELECT length('Spark SQL');
 9
> SELECT CHAR_LENGTH('Spark SQL');
 9
> SELECT CHARACTER_LENGTH('Spark SQL');
 9

levenshtein

levenshtein(str1, str2) - Returns the Levenshtein distance between the two given strings.

Examples:

> SELECT levenshtein('kitten', 'sitting');
 3
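The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A compact Python sketch of the usual dynamic-programming computation follows; it illustrates the definition and makes no claim about how Spark implements it.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b using two rolling rows."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```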

like

str like pattern - Returns true if str matches pattern, null if any arguments are null, false otherwise.

Arguments:

str - a string expression
pattern - a string expression. The pattern is a string which is matched literally, with the exception of the following special symbols:
_ matches any one character in the input (similar to . in posix regular expressions)
% matches zero or more characters in the input (similar to .* in posix regular expressions)
The escape character is '\'. If an escape character precedes a special symbol or another escape character, the following character is matched literally. It is invalid to escape any other character.
Since Spark 2.0, string literals are unescaped in our SQL parser. For example, in order to match "\abc", the pattern should be "\\abc".
When SQL config 'spark.sql.parser.escapedStringLiterals' is enabled, it falls back to Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the pattern to match "\abc" should be "\abc".

Examples:

> SELECT '%SystemDrive%\Users\John' like '\%SystemDrive\%\\Users%'
true

Note:

Use RLIKE to match with standard regular expressions.
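The LIKE rules listed under Arguments can be expressed as a translation to a regular expression. The Python sketch below is one way to model them. It works on the pattern as the matcher receives it, that is, after any unescaping done by the SQL parser, and the names like_to_regex and like are hypothetical. Letting _ and % match newlines is an assumption of this sketch.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a LIKE pattern into an equivalent regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape:
            # Only _, % and the escape character itself may be escaped.
            if i + 1 >= len(pattern) or pattern[i + 1] not in ("_", "%", escape):
                raise ValueError("invalid escape in LIKE pattern")
            out.append(re.escape(pattern[i + 1]))  # next character is literal
            i += 2
            continue
        if ch == "_":
            out.append(".")       # exactly one character
        elif ch == "%":
            out.append(".*")      # zero or more characters
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def like(value, pattern):
    """True if the whole of `value` matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


print(like(r"%SystemDrive%\Users\John", r"\%SystemDrive\%\\Users%"))  # True
print(like("abc", "a_c"), like("abc", "a_"))                          # True False
```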

ln

ln(expr) - Returns the natural logarithm (base e) of expr.

Examples:

> SELECT ln(1);
 0.0

locate

locate(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.

Examples:

> SELECT locate('bar', 'foobarbar');
 4
> SELECT locate('bar', 'foobarbar', 5);
 7
> SELECT POSITION('bar' IN 'foobarbar');
 4

log

log(base, expr) - Returns the logarithm of expr with base.

Examples:

> SELECT log(10, 100);
 2.0

log10

log10(expr) - Returns the logarithm of expr with base 10.

Examples:

> SELECT log10(10);
 1.0

log1p

log1p(expr) - Returns log(1 + expr).

Examples:

> SELECT log1p(0);
 0.0

log2

log2(expr) - Returns the logarithm of expr with base 2.

Examples:

> SELECT log2(2);
 1.0

lower

lower(str) - Returns str with all characters changed to lowercase.

Examples:

> SELECT lower('SparkSql');
 sparksql

lpad

lpad(str, len, pad) - Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters.

Examples:

> SELECT lpad('hi', 5, '??');
 ???hi
> SELECT lpad('hi', 1, '??');
 h
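Both behaviors above, padding on the left and truncating when str is longer than len, fit in a few lines of Python. This is a sketch of the documented behavior for a non-empty pad string, not Spark's implementation.

```python
def lpad(s, length, pad):
    """Left-pad `s` with repetitions of `pad` to exactly `length` characters."""
    if length <= len(s):
        return s[:length]                         # shorten to `length` characters
    fill = (pad * length)[: length - len(s)]      # repeat pad, then cut to fit
    return fill + s


print(lpad("hi", 5, "??"))  # ???hi
print(lpad("hi", 1, "??"))  # h
```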

ltrim

ltrim(str) - Removes the leading space characters from str.

Examples:

> SELECT ltrim('    SparkSQL');
 SparkSQL

map

map(key0, value0, key1, value1, ...) - Creates a map with the given key/value pairs.

Examples:

> SELECT map(1.0, '2', 3.0, '4');
 {1.0:"2",3.0:"4"}

map_keys

map_keys(map) - Returns an unordered array containing the keys of the map.

Examples:

> SELECT map_keys(map(1, 'a', 2, 'b'));
 [1,2]

map_values

map_values(map) - Returns an unordered array containing the values of the map.

Examples:

> SELECT map_values(map(1, 'a', 2, 'b'));
 ["a","b"]

max

max(expr) - Returns the maximum value of expr.

md5

md5(expr) - Returns an MD5 128-bit checksum as a hex string of expr.

Examples:

> SELECT md5('Spark');
 8cde774d6f7333752ed72cacddb05126

mean

mean(expr) - Returns the mean calculated from values of a group.

min

min(expr) - Returns the minimum value of expr.

minute

minute(timestamp) - Returns the minute component of the string/timestamp.

Examples:

> SELECT minute('2009-07-30 12:58:59');
 58

mod

expr1 mod expr2 - Returns the remainder after expr1/expr2.

Examples:

> SELECT 2 mod 1.8;
 0.2
> SELECT MOD(2, 1.8);
 0.2

monotonically_increasing_id

monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
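The bit layout described above explains why the IDs are unique and increasing but not consecutive. Below is a small Python sketch of that layout, with hypothetical helper names make_id and split_id; it reproduces the arithmetic only and does not call Spark.

```python
RECORD_BITS = 33  # lower 33 bits hold the record number within a partition


def make_id(partition_id, record_number):
    """Combine a partition ID (upper bits) and record number (lower 33 bits)."""
    return (partition_id << RECORD_BITS) + record_number


def split_id(generated_id):
    """Recover (partition_id, record_number) from a generated ID."""
    return generated_id >> RECORD_BITS, generated_id & ((1 << RECORD_BITS) - 1)


print(make_id(0, 0), make_id(0, 1))  # 0 1
print(make_id(1, 0), make_id(1, 1))  # 8589934592 8589934593
print(split_id(8589934593))          # (1, 1)
```

The jump from 1 to 8589934592 between partitions is the gap the description refers to: 2**33 is about 8.6 billion records per partition, and 2**31 is about 2.1 billion partitions.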

month

month(date) - Returns the month component of the date/timestamp.

Examples:

> SELECT month('2016-07-30');
 7

months_between

months_between(timestamp1, timestamp2) - Returns number of months between timestamp1 and timestamp2.

Examples:

> SELECT months_between('1997-02-28 10:30:00', '1996-10-30');
 3.94959677
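The fractional result above can be reproduced by hand under a 31-day-month convention: count whole months from the year and month fields, then add the remaining difference in days and time of day divided by 31 days. The Python sketch below follows that convention. Returning a whole number when both timestamps fall on the same day of month or are both month ends, and rounding to 8 digits, are assumptions chosen to agree with the example.

```python
from calendar import monthrange
from datetime import datetime

SECONDS_PER_DAY = 86400


def _is_last_day(ts):
    return ts.day == monthrange(ts.year, ts.month)[1]


def _seconds_into_month(ts):
    return ts.day * SECONDS_PER_DAY + ts.hour * 3600 + ts.minute * 60 + ts.second


def months_between(ts1, ts2):
    """Months from ts2 to ts1, with the fractional part based on 31-day months."""
    a = datetime.fromisoformat(ts1)
    b = datetime.fromisoformat(ts2)
    whole_months = (a.year - b.year) * 12 + (a.month - b.month)
    if a.day == b.day or (_is_last_day(a) and _is_last_day(b)):
        return float(whole_months)
    fraction = (_seconds_into_month(a) - _seconds_into_month(b)) / (31 * SECONDS_PER_DAY)
    return round(whole_months + fraction, 8)


print(months_between("1997-02-28 10:30:00", "1996-10-30"))  # 3.94959677
```

For the example, the whole-month part is 4 and the remainder is (28 days 10:30 minus 30 days) / 31 days = -1.5625 / 31, which gives 3.94959677 after rounding.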

named_struct

named_struct(name1, val1, name2, val2, ...) - Creates a struct with the given field names and values.

Examples:

> SELECT named_struct("a", 1, "b", 2, "c", 3);
 {"a":1,"b":2,"c":3}

nanvl

nanvl(expr1, expr2) - Returns expr1 if it's not NaN, or expr2 otherwise.

Examples:

> SELECT nanvl(cast('NaN' as double), 123);
 123.0

negative

negative(expr) - Returns the negated value of expr.

Examples:

> SELECT negative(1);
 -1

next_day

next_day(start_date, day_of_week) - Returns the first date which is later than start_date and named as indicated.

Examples:

> SELECT next_day('2015-01-14', 'TU');
 2015-01-20
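"Later than start_date" means the result is always 1 to 7 days ahead, even when start_date already falls on the requested weekday. A Python sketch of that rule follows; the two-letter abbreviation table is an assumption based on the 'TU' used in the example.

```python
from datetime import date, timedelta

# Python's date.weekday() numbering: Monday = 0 ... Sunday = 6
DAY_INDEX = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def next_day(start_date, day_of_week):
    """First date strictly after start_date that falls on day_of_week."""
    start = date.fromisoformat(start_date)
    target = DAY_INDEX[day_of_week[:2].upper()]
    delta = (target - start.weekday() - 1) % 7 + 1   # always in 1..7
    return (start + timedelta(days=delta)).isoformat()


print(next_day("2015-01-14", "TU"))  # 2015-01-20
print(next_day("2015-01-14", "WE"))  # 2015-01-21 (start date is a Wednesday)
```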

not

not expr - Logical not.

now

now() - Returns the current timestamp at the start of query evaluation.

ntile

ntile(n) - Divides the rows for each window partition into n buckets ranging from 1 to at most n.

nullif

nullif(expr1, expr2) - Returns null if expr1 equals expr2, or expr1 otherwise.

Examples:

> SELECT nullif(2, 2);
 NULL

nvl

nvl(expr1, expr2) - Returns expr2 if expr1 is null, or expr1 otherwise.

Examples:

> SELECT nvl(NULL, array('2'));
 ["2"]

nvl2

nvl2(expr1, expr2, expr3) - Returns expr2 if expr1 is not null, or expr3 otherwise.

Examples:

> SELECT nvl2(NULL, 2, 1);
 1

octet_length

octet_length(expr) - Returns the byte length of expr or number of bytes in binary data.

Examples:

> SELECT octet_length('Spark SQL');
 9

or

expr1 or expr2 - Logical OR.

parse_url

parse_url(url, partToExtract[, key]) - Extracts a part from a URL.

Examples:

> SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')
 spark.apache.org
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY')
 query=1
> SELECT parse_url('http://spark.apache.org/path?query=1', 'QUERY', 'query')
 1
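The three examples above have close equivalents in Python's urllib.parse, which can help when checking expected values outside Spark. The sketch covers only the HOST and QUERY parts shown in the examples; the function parse_url here is a hypothetical stand-in, not the Spark function.

```python
from urllib.parse import urlparse, parse_qs


def parse_url(url, part, key=None):
    """Extract HOST, QUERY, or a single QUERY parameter from a URL."""
    parsed = urlparse(url)
    if part == "HOST":
        return parsed.hostname
    if part == "QUERY":
        if key is None:
            return parsed.query
        values = parse_qs(parsed.query).get(key)
        return values[0] if values else None
    raise ValueError("part not covered by this sketch: " + part)


url = "http://spark.apache.org/path?query=1"
print(parse_url(url, "HOST"))            # spark.apache.org
print(parse_url(url, "QUERY"))           # query=1
print(parse_url(url, "QUERY", "query"))  # 1
```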

percent_rank

percent_rank() - Computes the percentage ranking of a value in a group of values.

percentile

percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be a positive integer.

percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be a positive integer.

percentile_approx

percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. A higher value of accuracy yields better accuracy; 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
 [10.0,10.0,10.0]
> SELECT percentile_approx(10.0, 0.5, 100);
 10.0

pi

pi() - Returns pi.

Examples:

> SELECT pi();
 3.141592653589793

pmod

pmod(expr1, expr2) - Returns the positive value of expr1 mod expr2.

Examples:

> SELECT pmod(10, 3);
 1
> SELECT pmod(-10, 3);
 2
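The second example is where pmod differs from a plain remainder. Assuming the % operator keeps the sign of the dividend for integers (as Java's % does, which is not stated in this list), -10 % 3 is -1, and pmod shifts that into the non-negative range. A Python sketch for integers with a positive divisor, using hypothetical helper names:

```python
import math


def truncated_remainder(a, b):
    """Remainder whose sign follows the dividend (Java-style % for integers)."""
    return int(math.fmod(a, b))


def pmod(a, b):
    """Positive modulus: move a negative remainder back into [0, b)."""
    r = truncated_remainder(a, b)
    return truncated_remainder(r + b, b) if r < 0 else r


print(truncated_remainder(-10, 3), pmod(-10, 3))  # -1 2
print(truncated_remainder(10, 3), pmod(10, 3))    # 1 1
```

Python's own % operator already returns 2 for -10 % 3, which is why the sketch goes through math.fmod to show the truncated form first.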

posexplode

posexplode(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions.

Examples:

> SELECT posexplode(array(10,20));
 0  10
 1  20

posexplode_outer

posexplode_outer(expr) - Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions.

Examples:

> SELECT posexplode_outer(array(10,20));
 0  10
 1  20

position

position(substr, str[, pos]) - Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.

Examples:

> SELECT position('bar', 'foobarbar');
 4
> SELECT position('bar', 'foobarbar', 5);
 7
> SELECT POSITION('bar' IN 'foobarbar');
 4

positive

positive(expr) - Returns the value of expr.

pow

pow(expr1, expr2) - Raises expr1 to the power of expr2.

Examples:

> SELECT pow(2, 3);
 8.0

power

power(expr1, expr2) - Raises expr1 to the power of expr2.

Examples:

> SELECT power(2, 3);
 8.0

printf

printf(strfmt, obj, ...) - Returns a formatted string from printf-style format strings.

Examples:

> SELECT printf("Hello World %d %s", 100, "days");
 Hello World 100 days

quarter

quarter(date) - Returns the quarter of the year for date, in the range 1 to 4.

Examples:

> SELECT quarter('2016-08-31');
 3

radians

radians(expr) - Converts degrees to radians.

Examples:

> SELECT radians(180);
 3.141592653589793

rand

rand([seed]) - Returns a random value with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1).

Examples:

> SELECT rand();
 0.9629742951434543
> SELECT rand(0);
 0.8446490682263027
> SELECT rand(null);
 0.8446490682263027

randn

randn([seed]) - Returns a random value with independent and identically distributed (i.i.d.) values drawn from the standard normal distribution.

Examples:

> SELECT randn();
 -0.3254147983080288
> SELECT randn(0);
 1.1164209726833079
> SELECT randn(null);
 1.1164209726833079

rank

rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
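The gaps mentioned above are easiest to see next to dense_rank, described earlier in this list. The Python sketch below ranks an already ordered list of values; it models one window partition and the function names simply mirror the SQL ones.

```python
def rank(sorted_values):
    """Ties share a rank; the next distinct value skips ahead (gaps)."""
    ranks = []
    for i, v in enumerate(sorted_values):
        if i > 0 and v == sorted_values[i - 1]:
            ranks.append(ranks[-1])   # same rank as the tied predecessor
        else:
            ranks.append(i + 1)       # one plus the number of preceding rows
    return ranks


def dense_rank(sorted_values):
    """Ties share a rank; the next distinct value gets the next rank (no gaps)."""
    ranks = []
    for i, v in enumerate(sorted_values):
        if i == 0:
            ranks.append(1)
        elif v == sorted_values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


values = [10, 10, 20, 30, 30, 40]
print(rank(values))        # [1, 1, 3, 4, 4, 6]
print(dense_rank(values))  # [1, 1, 2, 3, 3, 4]
```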

reflect

reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Examples:

> SELECT reflect('java.util.UUID', 'randomUUID');
 c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
 a5cf6c42-0c85-418f-af6c-3e4e5b1328f2

regexp_extract

regexp_extract(str, regexp[, idx]) - Extracts a group that matches regexp.

Examples:

> SELECT regexp_extract('100-200', '(\d+)-(\d+)', 1);
 100

regexp_replace

regexp_replace(str, regexp, rep) - Replaces all substrings of str that match regexp with rep.

Examples:

> SELECT regexp_replace('100-200', '(\d+)', 'num');
 num-num

repeat

repeat(str, n) - Returns the string which repeats the given string value n times.

Examples:

> SELECT repeat('123', 2);
 123123

replace

replace(str, search[, replace]) - Replaces all occurrences of search with replace.

Arguments:

str - a string expression
search - a string expression. If search is not found in str, str is returned unchanged.
replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str.

Examples:

> SELECT replace('ABCabc', 'abc', 'DEF');
 ABCDEF

reverse

reverse(str) - Returns the reversed given string.

Examples:

> SELECT reverse('Spark SQL');
 LQS krapS

right

right(str, len) - Returns the rightmost len characters from the string str (len can be of string type). If len is less than or equal to 0, the result is an empty string.

Examples:

> SELECT right('Spark SQL', 3);
 SQL

rint

rint(expr) - Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

Examples:

> SELECT rint(12.3456);
 12.0

rlike

str rlike regexp - Returns true if str matches regexp, or false otherwise.

Arguments:

str - a string expression
regexp - a string expression. The pattern string should be a Java regular expression.
Since Spark 2.0, string literals (including regex patterns) are unescaped in our SQL parser. For example, to match "\abc", a regular expression for regexp can be "^\\abc$".
There is a SQL config 'spark.sql.parser.escapedStringLiterals' that can be used to fall back to the Spark 1.6 behavior regarding string literal parsing. For example, if the config is enabled, the regexp that can match "\abc" is "^\abc$".

Examples:

When spark.sql.parser.escapedStringLiterals is disabled (default).
> SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\\Users.*'
true

When spark.sql.parser.escapedStringLiterals is enabled.
> SELECT '%SystemDrive%\Users\John' rlike '%SystemDrive%\Users.*'
true

Note:

Use LIKE to match with simple string pattern.

rollup

round

round(expr, d) - Returns expr rounded to d decimal places using HALF_UP rounding mode.

Examples:

> SELECT round(2.5, 0);
 3.0
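
To make the HALF_UP rule concrete, here is a minimal Python sketch using the standard decimal module. It models the rounding rule only and is not Spark's implementation; the helper name round_half_up is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, d: int = 0) -> Decimal:
    """Round a decimal string to d places; ties go away from zero."""
    quantum = Decimal(1).scaleb(-d)  # d=0 -> 1, d=2 -> 0.01
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("2.5"))       # 3
print(round_half_up("-2.5"))      # -3
print(round_half_up("1.005", 2))  # 1.01
```

For comparison, Python's built-in round(2.5) gives 2, because it rounds ties to the nearest even value rather than half up.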

row_number

row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.

rpad

rpad(str, len, pad) - Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters.

Examples:

> SELECT rpad('hi', 5, '??');
 hi???
> SELECT rpad('hi', 1, '??');
 h
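
The two cases above (padding and truncation) can be sketched in plain Python as follows. This is an illustrative model, not Spark's code; returning str unchanged when pad is empty is an assumption of the sketch.

```python
def rpad(s: str, length: int, pad: str) -> str:
    """Right-pad s with repetitions of pad, or truncate s, to exactly `length` characters."""
    if length <= len(s):
        return s[:max(length, 0)]      # s is long enough: cut it down
    if not pad:
        return s                       # assumption: nothing to pad with
    missing = length - len(s)
    repeats = missing // len(pad) + 1  # enough copies of pad to cover the gap
    return s + (pad * repeats)[:missing]

print(rpad("hi", 5, "??"))  # hi???
print(rpad("hi", 1, "??"))  # h
```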

rtrim

rtrim(str) - Removes the trailing space characters from str.

Examples:

> SELECT rtrim('    SparkSQL   ');
     SparkSQL

second

second(timestamp) - Returns the second component of the string/timestamp.

Examples:

> SELECT second('2009-07-30 12:58:59');
 59

sentences

sentences(str[, lang, country]) - Splits str into an array of arrays of words.

Examples:

> SELECT sentences('Hi there! Good morning.');
 [["Hi","there"],["Good","morning"]]

sha

sha(expr) - Returns a sha1 hash value as a hex string of the expr.

Examples:

> SELECT sha('Spark');
 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c

sha1

sha1(expr) - Returns a sha1 hash value as a hex string of the expr.

Examples:

> SELECT sha1('Spark');
 85f5955f4b27a9a4c2aab6ffe5d7189fc298b92c

sha2

sha2(expr, bitLength) - Returns a checksum of SHA-2 family as a hex string of expr. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.

Examples:

> SELECT sha2('Spark', 256);
 529bc3b07127ecb7e53a4dcf1991d9152c24537d919178022b2c42657f79a26b
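
The bitLength argument simply selects a member of the SHA-2 family. A rough Python equivalent with hashlib might look like this; the dictionary name and the choice to return None for any other bit length are assumptions of the sketch.

```python
import hashlib

# Bit length -> hash constructor; 0 is treated as 256, as described above.
_SHA2_BY_BITS = {
    0: hashlib.sha256,
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}

def sha2(text: str, bit_length: int):
    """Hex digest of the UTF-8 bytes of text, or None for an unsupported bit length."""
    algo = _SHA2_BY_BITS.get(bit_length)
    return None if algo is None else algo(text.encode("utf-8")).hexdigest()

print(sha2("Spark", 256))
print(sha2("Spark", 0) == sha2("Spark", 256))  # True
```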

shiftleft

shiftleft(base, expr) - Bitwise left shift.

Examples:

> SELECT shiftleft(2, 1);
 4

shiftright

shiftright(base, expr) - Bitwise (signed) right shift.

Examples:

> SELECT shiftright(4, 1);
 2

shiftrightunsigned

shiftrightunsigned(base, expr) - Bitwise unsigned right shift.

Examples:

> SELECT shiftrightunsigned(4, 1);
 2
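
The signed and unsigned shifts only differ for negative inputs, which the examples above do not show. The following Python sketch models both for 32-bit integers, assuming Java-style int semantics (the shift distance is taken modulo 32) and that base fits in a signed 32-bit value.

```python
def shiftright(base: int, n: int) -> int:
    """Signed right shift of a 32-bit int: the sign bit is copied in from the left."""
    return base >> (n & 31)

def shiftrightunsigned(base: int, n: int) -> int:
    """Unsigned right shift of a 32-bit int: zeros are shifted in from the left."""
    result = (base & 0xFFFFFFFF) >> (n & 31)
    # Map back into the signed 32-bit range (only matters when nothing was shifted).
    return result - (1 << 32) if result >= (1 << 31) else result

print(shiftright(-8, 1))          # -4
print(shiftrightunsigned(-8, 1))  # 2147483644
```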

sign

sign(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.

Examples:

> SELECT sign(40);
 1.0

signum

signum(expr) - Returns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.

Examples:

> SELECT signum(40);
 1.0

sin

sin(expr) - Returns the sine of expr.

Examples:

> SELECT sin(0);
 0.0

sinh

sinh(expr) - Returns the hyperbolic sine of expr.

Examples:

> SELECT sinh(0);
 0.0

size

size(expr) - Returns the size of an array or a map. Returns -1 if null.

Examples:

> SELECT size(array('b', 'd', 'c', 'a'));
 4

skewness

skewness(expr) - Returns the skewness value calculated from values of a group.

smallint

smallint(expr) - Casts the value expr to the target data type smallint.

sort_array

sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements.

Examples:

> SELECT sort_array(array('b', 'd', 'c', 'a'), true);
 ["a","b","c","d"]

soundex

soundex(str) - Returns Soundex code of the string.

Examples:

> SELECT soundex('Miller');
 M460

space

space(n) - Returns a string consisting of n spaces.

Examples:

> SELECT concat(space(2), '1');
   1

spark_partition_id

spark_partition_id() - Returns the current partition id.

split

split(str, regex) - Splits str around occurrences that match regex.

Examples:

> SELECT split('oneAtwoBthreeC', '[ABC]');
 ["one","two","three",""]
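
Note the trailing empty string in the result: the input ends with a delimiter, so an empty piece follows it. Python's re.split behaves the same way for this input, which makes for a quick sketch. Java and Python regex dialects differ in places, so this is only an approximation of the Spark function.

```python
import re

def split(s: str, regex: str) -> list:
    """Split s around every match of regex, keeping empty pieces."""
    return re.split(regex, s)

print(split("oneAtwoBthreeC", "[ABC]"))  # ['one', 'two', 'three', '']
```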

sqrt

sqrt(expr) - Returns the square root of expr.

Examples:

> SELECT sqrt(4);
 2.0

stack

stack(n, expr1, ..., exprk) - Separates expr1, ..., exprk into n rows.

Examples:

> SELECT stack(2, 1, 2, 3);
 1  2
 3  NULL
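
The layout rule behind the example (values fill the rows left to right, and the last row is padded with NULL) can be modelled in a few lines of Python. This is a sketch that assumes n is at least 1 and at least one value is given; None stands in for NULL.

```python
import math

def stack(n: int, *exprs):
    """Lay exprs out row by row into n rows, padding the last row with None."""
    width = math.ceil(len(exprs) / n)  # number of columns per row
    padded = list(exprs) + [None] * (n * width - len(exprs))
    return [tuple(padded[r * width:(r + 1) * width]) for r in range(n)]

print(stack(2, 1, 2, 3))  # [(1, 2), (3, None)]
```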

std

std(expr) - Returns the sample standard deviation calculated from values of a group.

stddev

stddev(expr) - Returns the sample standard deviation calculated from values of a group.

stddev_pop

stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.

stddev_samp

stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.
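
The difference between the sample and population variants is the divisor: n - 1 for the sample, n for the population. Python's statistics module exposes the same pair, which gives a quick way to see the effect on a small made-up data set.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean = 5

population_sd = statistics.pstdev(values)  # divides by n, like stddev_pop
sample_sd = statistics.stdev(values)       # divides by n - 1, like std / stddev / stddev_samp

print(population_sd)              # 2.0
print(sample_sd > population_sd)  # True
```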

str_to_map

str_to_map(text[, pairDelim[, keyValueDelim]]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and ':' for keyValueDelim.

Examples:

> SELECT str_to_map('a:1,b:2,c:3', ',', ':');
 map("a":"1","b":"2","c":"3")
> SELECT str_to_map('a');
 map("a":null)
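
As a rough model of the two examples above, the Python sketch below splits on literal delimiters and maps a key without a value to None. Treating the delimiters as plain strings is a simplification made by this sketch.

```python
def str_to_map(text: str, pair_delim: str = ",", key_value_delim: str = ":") -> dict:
    """Split text into pairs, then split each pair into key and value at the first delimiter."""
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(key_value_delim)
        result[key] = value if sep else None  # no delimiter found: value is None
    return result

print(str_to_map("a:1,b:2,c:3"))  # {'a': '1', 'b': '2', 'c': '3'}
print(str_to_map("a"))            # {'a': None}
```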

string

string(expr) - Casts the value expr to the target data type string.

struct

struct(col1, col2, col3, ...) - Creates a struct with the given field values.

substr

substr(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.

Examples:

> SELECT substr('Spark SQL', 5);
 k SQL
> SELECT substr('Spark SQL', -3);
 SQL
> SELECT substr('Spark SQL', 5, 1);
 k
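
The position argument is 1-based, and a negative position counts back from the end of the string. A small Python sketch of that indexing, assuming pos is non-zero and falls inside the string (other edge cases are deliberately not modelled):

```python
def substr(s: str, pos: int, length=None) -> str:
    """1-based substring; a negative pos counts back from the end of s."""
    start = pos - 1 if pos > 0 else len(s) + pos
    end = len(s) if length is None else start + max(length, 0)
    return s[start:end]

print(substr("Spark SQL", 5))     # k SQL
print(substr("Spark SQL", -3))    # SQL
print(substr("Spark SQL", 5, 1))  # k
```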

substring

substring(str, pos[, len]) - Returns the substring of str that starts at pos and is of length len, or the slice of byte array that starts at pos and is of length len.

Examples:

> SELECT substring('Spark SQL', 5);
 k SQL
> SELECT substring('Spark SQL', -3);
 SQL
> SELECT substring('Spark SQL', 5, 1);
 k

substring_index

substring_index(str, delim, count) - Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim.

Examples:

> SELECT substring_index('www.apache.org', '.', 2);
 www.apache
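
The positive and negative count cases can be pictured as keeping the first or last pieces of the split string. Here is a Python sketch along those lines. It is split-based, so it is only an approximation for delimiters that can overlap themselves (for example 'aa' in 'aaa').

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Keep the first `count` pieces (count > 0) or the last `-count` pieces (count < 0)."""
    if count == 0 or not delim:
        return ""
    parts = s.split(delim)
    kept = parts[:count] if count > 0 else parts[count:]
    return delim.join(kept)

print(substring_index("www.apache.org", ".", 2))   # www.apache
print(substring_index("www.apache.org", ".", -1))  # org
```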

sum

sum(expr) - Returns the sum calculated from values of a group.

tan

tan(expr) - Returns the tangent of expr.

Examples:

> SELECT tan(0);
 0.0

tanh

tanh(expr) - Returns the hyperbolic tangent of expr.

Examples:

> SELECT tanh(0);
 0.0

timestamp

timestamp(expr) - Casts the value expr to the target data type timestamp.

tinyint

tinyint(expr) - Casts the value expr to the target data type tinyint.

to_date

to_date(date_str[, fmt]) - Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.

Examples:

> SELECT to_date('2009-07-30 04:17:52');
 2009-07-30
> SELECT to_date('2016-12-31', 'yyyy-MM-dd');
 2016-12-31

to_json

to_json(expr[, options]) - Returns a JSON string with a given struct value.

Examples:

> SELECT to_json(named_struct('a', 1, 'b', 2));
 {"a":1,"b":2}
> SELECT to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy'));
 {"time":"26/08/2015"}
> SELECT to_json(array(named_struct('a', 1, 'b', 2)));
 [{"a":1,"b":2}]

Since: 2.2.0

to_timestamp

to_timestamp(timestamp[, fmt]) - Parses the timestamp expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.

Examples:

> SELECT to_timestamp('2016-12-31 00:12:00');
 2016-12-31 00:12:00
> SELECT to_timestamp('2016-12-31', 'yyyy-MM-dd');
 2016-12-31 00:00:00

to_unix_timestamp

to_unix_timestamp(expr[, pattern]) - Returns the UNIX timestamp of the given time.

Examples:

> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');
 1460041200

to_utc_timestamp

to_utc_timestamp(timestamp, timezone) - Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.

Examples:

> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');
 2016-08-30 15:00:00

translate

translate(input, from, to) - Translates the input string by replacing the characters present in the from string with the corresponding characters in the to string.

Examples:

> SELECT translate('AaBbCc', 'abc', '123');
 A1B2C3
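
The mapping is positional: the i-th character of from is replaced by the i-th character of to. A minimal Python sketch, assuming from and to have the same length (the unequal-length case is not modelled here):

```python
def translate(text: str, from_chars: str, to_chars: str) -> str:
    """Replace each character found in from_chars with the one at the same index in to_chars."""
    table = {}
    for src, dst in zip(from_chars, to_chars):
        table.setdefault(ord(src), dst)  # sketch's choice: the first mapping for a character wins
    return text.translate(table)

print(translate("AaBbCc", "abc", "123"))  # A1B2C3
```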

trim

trim(str) - Removes the leading and trailing space characters from str.

Examples:

> SELECT trim('    SparkSQL   ');
 SparkSQL

trunc

trunc(date, fmt) - Returns date with the time portion of the day truncated to the unit specified by the format model fmt.

Examples:

> SELECT trunc('2009-02-12', 'MM');
 2009-02-01
> SELECT trunc('2015-10-27', 'YEAR');
 2015-01-01
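
Truncation here means resetting every field below the chosen unit to its first value. A Python sketch for the year and month units, with the accepted format names and the None result for anything else stated as assumptions of the sketch:

```python
from datetime import date

def trunc(d: date, fmt: str):
    """Reset d to the first day of its year or month; None for an unrecognised fmt."""
    fmt = fmt.upper()
    if fmt in ("YEAR", "YYYY", "YY"):
        return d.replace(month=1, day=1)
    if fmt in ("MONTH", "MON", "MM"):
        return d.replace(day=1)
    return None

print(trunc(date(2009, 2, 12), "MM"))     # 2009-02-01
print(trunc(date(2015, 10, 27), "YEAR"))  # 2015-01-01
```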

ucase

ucase(str) - Returns str with all characters changed to uppercase.

Examples:

> SELECT ucase('SparkSql');
 SPARKSQL

unbase64

unbase64(str) - Converts the argument from a base 64 string str to a binary.

Examples:

> SELECT unbase64('U3BhcmsgU1FM');
 Spark SQL

unhex

unhex(expr) - Converts hexadecimal expr to binary.

Examples:

> SELECT decode(unhex('537061726B2053514C'), 'UTF-8');
 Spark SQL
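
Both unbase64 and unhex return binary data, which is why the unhex example wraps the call in decode. The same two conversions in Python, as a sanity check of the values shown above:

```python
import base64

raw_from_base64 = base64.b64decode("U3BhcmsgU1FM")  # bytes, like unbase64
raw_from_hex = bytes.fromhex("537061726B2053514C")  # bytes, like unhex

print(raw_from_base64)               # b'Spark SQL'
print(raw_from_hex.decode("utf-8"))  # Spark SQL
```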

unix_timestamp

unix_timestamp([expr[, pattern]]) - Returns the UNIX timestamp of current or specified time.

Examples:

> SELECT unix_timestamp();
 1476884637
> SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');
 1460041200
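
A date string carries no time zone, so the resulting number depends on the time zone in which it is interpreted. The value 1460041200 shown above corresponds to midnight on 2016-04-08 at UTC+09:00. The Python sketch below reproduces it under that assumption. It uses a fixed UTC offset (no daylight saving) and Python's %Y-%m-%d in place of the Java pattern yyyy-MM-dd; the utc_offset_hours parameter is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def unix_timestamp(text: str, utc_offset_hours: int) -> int:
    """Seconds since the epoch for midnight of the given date at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local_midnight = datetime.strptime(text, "%Y-%m-%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())

print(unix_timestamp("2016-04-08", 9))  # 1460041200
print(unix_timestamp("2016-04-08", 0))  # 1460073600
```

Running the same query in a session with a different time zone would therefore give a different number.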

upper

upper(str) - Returns str with all characters changed to uppercase.

Examples:

> SELECT upper('SparkSql');
 SPARKSQL

uuid

uuid() - Returns a universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.

Examples:

> SELECT uuid();
 46707d92-02f4-4817-8116-a4c3b23e6266

var_pop

var_pop(expr) - Returns the population variance calculated from values of a group.

var_samp

var_samp(expr) - Returns the sample variance calculated from values of a group.

variance

variance(expr) - Returns the sample variance calculated from values of a group.

weekofyear

weekofyear(date) - Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.

Examples:

> SELECT weekofyear('2008-02-20');
 8
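
The rule described above (weeks start on Monday, and week 1 is the first week with more than 3 days) is the ISO 8601 week numbering, which Python exposes through date.isocalendar(). A short sketch, including a date early in January that still belongs to the last week of the previous year:

```python
from datetime import date

def weekofyear(d: date) -> int:
    """ISO 8601 week number of d."""
    return d.isocalendar()[1]

print(weekofyear(date(2008, 2, 20)))  # 8
print(weekofyear(date(2016, 1, 1)))   # 53
```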

when

CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE expr5] END - When expr1 = true, returns expr2; when expr3 = true, returns expr4; else returns expr5.

window

xpath

xpath(xml, xpath) - Returns a string array of values within the nodes of xml that match the XPath expression.

Examples:

> SELECT xpath('<a><b>b1</b><b>b2</b><b>b3</b><c>c1</c><c>c2</c></a>','a/b/text()');
 ['b1','b2','b3']

xpath_boolean

xpath_boolean(xml, xpath) - Returns true if the XPath expression evaluates to true, or if a matching node is found.

Examples:

> SELECT xpath_boolean('<a><b>1</b></a>','a/b');
 true

xpath_double

xpath_double(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_double('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

xpath_float

xpath_float(xml, xpath) - Returns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_float('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

xpath_int

xpath_int(xml, xpath) - Returns an integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_int('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

xpath_long

xpath_long(xml, xpath) - Returns a long integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_long('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

xpath_number

xpath_number(xml, xpath) - Returns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.

Examples:

> SELECT xpath_number('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3.0

xpath_short

xpath_short(xml, xpath) - Returns a short integer value, or the value zero if no match is found, or a match is found but the value is non-numeric.

Examples:

> SELECT xpath_short('<a><b>1</b><b>2</b></a>', 'sum(a/b)');
 3

xpath_string

xpath_string(xml, xpath) - Returns the text contents of the first xml node that matches the XPath expression.

Examples:

> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');
 cc

year

year(date) - Returns the year component of the date/timestamp.

Examples:

> SELECT year('2016-07-30');
 2016

|

expr1 | expr2 - Returns the result of bitwise OR of expr1 and expr2.

Examples:

> SELECT 3 | 5;
 7

~

~ expr - Returns the result of bitwise NOT of expr.

Examples:

> SELECT ~ 0;
 -1

Original source: https://spark-test.github.io/sparksqldoc/
