Spark SQL functions.scala Source Code Analysis (1): Sort functions (based on Spark 3.3.0)

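Preface

This article belongs to the column "1000 Questions to Master the Big Data Technology Stack" (《1000个问题搞定大数据技术体系》), which is the author's original work. Please credit the source when quoting, and point out any omissions or errors in the comments. Thank you!

For the column's table of contents and references, see "1000 Questions to Master the Big Data Technology Stack".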

Contents

Spark SQL functions.scala Source Code Analysis (1): Sort functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (2): Aggregate functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (3): Window functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (4): Non-aggregate functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (5): Math Functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (6): Misc functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (7): String functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (8): DateTime functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (9): Collection functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (10): Partition transform functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (11): Scala UDF functions (based on Spark 3.3.0)

Spark SQL functions.scala Source Code Analysis (12): Java UDF functions (based on Spark 3.3.0)

Main Text

asc

  /**
   * Returns a sort expression based on ascending order of the column.
   *
   * {{{
   *   df.sort(asc("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 1.3.0
   */
  def asc(columnName: String): Column = Column(columnName).asc

  /**
   * Returns a sort expression based on ascending order of the column,
   * and null values return before non-null values.
   *
   * {{{
   *   df.sort(asc_nulls_first("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def asc_nulls_first(columnName: String): Column = Column(columnName).asc_nulls_first

  /**
   * Returns a sort expression based on ascending order of the column,
   * and null values appear after non-null values.
   *
   * {{{
   *   df.sort(asc_nulls_last("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def asc_nulls_last(columnName: String): Column = Column(columnName).asc_nulls_last
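
Each of the three overloads wraps the column name in a `Column` and calls the method of the same name on it. For ascending order Spark places nulls first by default, which is why `asc` and `asc_nulls_first` return the same ordering. The sketch below illustrates this on a few in-memory rows. It is not taken from the original article: the object `AscNullOrderingSketch`, the helpers `sample` and `namesSortedBy`, and the rows themselves are assumptions made for illustration, and a local Spark 3.3.x on the classpath is assumed.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{asc, asc_nulls_first, asc_nulls_last, col}

// Hypothetical sketch: compares the three ascending variants on in-memory rows.
object AscNullOrderingSketch {

  // Hypothetical sample data: four employees, one of them without a department.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[String])](
      ("Alan", Some("PD")),
      ("David", None),
      ("Charles", Some("ED")),
      ("Bob", Some("HR"))
    ).toDF("name", "dept")
  }

  // Hypothetical helper: sorts by one expression and returns the names in order.
  def namesSortedBy(df: DataFrame, order: Column): List[String] =
    df.sort(order).collect().map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AscNullOrderingSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val df = sample(spark)

      // asc and asc_nulls_first both put the null department first.
      println(namesSortedBy(df, asc("dept")))             // List(David, Charles, Bob, Alan)
      println(namesSortedBy(df, asc_nulls_first("dept"))) // List(David, Charles, Bob, Alan)

      // asc_nulls_last moves the null department to the end.
      println(namesSortedBy(df, asc_nulls_last("dept")))  // List(Charles, Bob, Alan, David)

      // The string overload and the Column method build the same ordering.
      println(namesSortedBy(df, col("dept").asc_nulls_last) ==
        namesSortedBy(df, asc_nulls_last("dept")))        // true
    } finally {
      spark.stop()
    }
  }
}
```

The department values are kept unique here so that the ordering is fully determined by the single sort key.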

Usage

========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc_nulls_first("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc_nulls_last("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
|  David|null|  35|
+-------+----+----+

desc

  /**
   * Returns a sort expression based on the descending order of the column.
   *
   * {{{
   *   df.sort(asc("dept"), desc("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 1.3.0
   */
  def desc(columnName: String): Column = Column(columnName).desc

  /**
   * Returns a sort expression based on the descending order of the column,
   * and null values appear before non-null values.
   *
   * {{{
   *   df.sort(asc("dept"), desc_nulls_first("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def desc_nulls_first(columnName: String): Column = Column(columnName).desc_nulls_first

  /**
   * Returns a sort expression based on the descending order of the column,
   * and null values appear after non-null values.
   *
   * {{{
   *   df.sort(asc("dept"), desc_nulls_last("age"))
   * }}}
   *
   * @group sort_funcs
   * @since 2.1.0
   */
  def desc_nulls_last(columnName: String): Column = Column(columnName).desc_nulls_last
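
For descending order the default is the reverse of ascending: nulls come last, so `desc` and `desc_nulls_last` agree. The same orderings can be written in Spark SQL with `ORDER BY ... DESC NULLS FIRST` and `ORDER BY ... DESC NULLS LAST`. The following is a minimal sketch of that correspondence, not code from the original article: the object `DescNullOrderingSketch`, the view name `employees_sketch`, the helpers, and the sample rows are all assumptions made for illustration, and a local Spark 3.3.x is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{desc, desc_nulls_first, desc_nulls_last}

// Hypothetical sketch: compares the descending variants with their SQL forms.
object DescNullOrderingSketch {

  // Hypothetical sample data: three employees, one of them without an age.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Int])](
      ("Alan", Some(30)),
      ("Bruce", Some(24)),
      ("Joan", None)
    ).toDF("name", "age")
  }

  // Hypothetical helper: returns the first column (name) in row order.
  def names(df: DataFrame): List[String] =
    df.collect().map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DescNullOrderingSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      val df = sample(spark)
      df.createOrReplaceTempView("employees_sketch")

      // DataFrame API
      println(names(df.sort(desc("age"))))             // List(Alan, Bruce, Joan)
      println(names(df.sort(desc_nulls_first("age")))) // List(Joan, Alan, Bruce)
      println(names(df.sort(desc_nulls_last("age"))))  // List(Alan, Bruce, Joan)

      // Equivalent SQL
      val sqlFirst = spark.sql(
        "SELECT name, age FROM employees_sketch ORDER BY age DESC NULLS FIRST")
      val sqlLast = spark.sql(
        "SELECT name, age FROM employees_sketch ORDER BY age DESC NULLS LAST")
      println(names(sqlFirst)) // List(Joan, Alan, Bruce)
      println(names(sqlLast))  // List(Alan, Bruce, Joan)
    } finally {
      spark.stop()
    }
  }
}
```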

Usage

========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc("dept"), desc_nulls_first("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Joan|  PD|null|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
+-------+----+----+

========== [ df.sort(asc("dept"), desc_nulls_last("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

Practice

Data

employees.csv

Alan,PD,30
Bob,HR,28
Bruce,PD,24
Charles,ED,34
David,,35
Joan,PD,

Code

package com.shockang.study.spark.sql.functions

import com.shockang.study.spark.util.Utils.formatPrint
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/**
 * Demonstrates the sort functions defined in org.apache.spark.sql.functions.
 *
 * @author Shockang
 */
object SortFunctionsExample {

  val DATA_PATH = "/Users/shockang/code/spark-examples/data/simple/read/employees.csv"

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.OFF)
    val spark = SparkSession.builder().appName("SortFunctionsExample").master("local[*]").getOrCreate()

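    // No header or schema is supplied, so all three columns are read as strings
    // and empty fields become null.
    val df = spark.read.csv(DATA_PATH).toDF("name", "dept", "age").cache()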

    // asc: vary the null ordering of "dept" while "age" stays descending
    formatPrint("""df.sort(asc("dept"), desc("age"))""")
    df.sort(asc("dept"), desc("age")).show()
    formatPrint("""df.sort(asc_nulls_first("dept"), desc("age"))""")
    df.sort(asc_nulls_first("dept"), desc("age")).show()
    formatPrint("""df.sort(asc_nulls_last("dept"), desc("age"))""")
    df.sort(asc_nulls_last("dept"), desc("age")).show()

    // desc: vary the null ordering of "age" while "dept" stays ascending
    formatPrint("""df.sort(asc("dept"), desc("age"))""")
    df.sort(asc("dept"), desc("age")).show()
    formatPrint("""df.sort(asc("dept"), desc_nulls_first("age"))""")
    df.sort(asc("dept"), desc_nulls_first("age")).show()
    formatPrint("""df.sort(asc("dept"), desc_nulls_last("age"))""")
    df.sort(asc("dept"), desc_nulls_last("age")).show()

    // Release the local Spark resources once the examples have run.
    spark.stop()
  }
}
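
The code above imports `formatPrint` from `com.shockang.study.spark.util.Utils`, which the article does not show. Judging only from the banner lines in the printed output, it probably wraps the given string in `========== [ ... ] ==========`. The sketch below is a hypothetical reconstruction under that assumption. The names `UtilsSketch` and `formatBanner` are invented here, and the real helper may differ.

```scala
// Hypothetical reconstruction of the formatPrint helper; the real
// com.shockang.study.spark.util.Utils is not shown in the article.
object UtilsSketch {

  // Builds the banner line in the shape seen in the printed output.
  def formatBanner(expression: String): String =
    s"========== [ $expression ] =========="

  // Prints the banner to standard output.
  def formatPrint(expression: String): Unit =
    println(formatBanner(expression))
}

object UtilsSketchDemo {
  def main(args: Array[String]): Unit = {
    UtilsSketch.formatPrint("""df.sort(asc("dept"), desc("age"))""")
    // prints: ========== [ df.sort(asc("dept"), desc("age")) ] ==========
  }
}
```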

Output

========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc_nulls_first("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc_nulls_last("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
|  David|null|  35|
+-------+----+----+

========== [ df.sort(asc("dept"), desc("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+

========== [ df.sort(asc("dept"), desc_nulls_first("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Joan|  PD|null|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
+-------+----+----+

========== [ df.sort(asc("dept"), desc_nulls_last("age")) ] ==========
+-------+----+----+
|   name|dept| age|
+-------+----+----+
|  David|null|  35|
|Charles|  ED|  34|
|    Bob|  HR|  28|
|   Alan|  PD|  30|
|  Bruce|  PD|  24|
|   Joan|  PD|null|
+-------+----+----+
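
One detail of the code is worth noting. `spark.read.csv` is called without a schema, so `age` is a string column and `desc("age")` compares text rather than numbers. The output above matches numeric order because every age in employees.csv has two digits. The sketch below shows one possible way to read `age` as an integer by supplying an explicit schema. It is an illustration rather than the author's code: the object `TypedAgeSortSketch`, its helpers, and the choice to feed the CSV lines from memory instead of a file are assumptions, and a local Spark 3.3.x is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical sketch: reads the same six CSV lines with and without a schema.
object TypedAgeSortSketch {

  // The contents of employees.csv, held in memory to keep the sketch self-contained.
  val lines: Seq[String] = Seq(
    "Alan,PD,30", "Bob,HR,28", "Bruce,PD,24", "Charles,ED,34", "David,,35", "Joan,PD,")

  // Explicit schema so that age is parsed as an integer.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("dept", StringType),
    StructField("age", IntegerType)))

  // Same approach as the article: no schema, every column is a string.
  def untyped(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.csv(lines.toDS()).toDF("name", "dept", "age")
  }

  // With the explicit schema, age becomes an integer column.
  def typed(spark: SparkSession): DataFrame = {
    import spark.implicits._
    spark.read.schema(schema).csv(lines.toDS())
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TypedAgeSortSketch")
      .master("local[*]")
      .getOrCreate()
    try {
      println(untyped(spark).schema("age").dataType) // StringType
      println(typed(spark).schema("age").dataType)   // IntegerType

      // Sorting now compares ages numerically; the null age still comes last.
      typed(spark).sort(desc("age")).show()
    } finally {
      spark.stop()
    }
  }
}
```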
