Spark 3.0 Main Features

SPARK-11215 Multiple columns support added to various Transformers: StringIndexer

SPARK-11150 Implement Dynamic Partition Pruning

SPARK-13677 Support Tree-Based Feature Transformation

SPARK-16692 Add MultilabelClassificationEvaluator

SPARK-19591 Add sample weights to decision trees

SPARK-19712 Pushing Left Semi and Left Anti joins through Project, Aggregate, Window, Union etc.

SPARK-19827 R API for Power Iteration Clustering

SPARK-20286 Improve logic for timing out executors in dynamic allocation

SPARK-20636 Eliminate unnecessary shuffle with adjacent Window expressions

SPARK-22148 Acquire new executors to avoid hang because of blacklisting

SPARK-22796 Multiple columns support added to various Transformers: PySpark QuantileDiscretizer

SPARK-23128 A new approach to do adaptive execution in Spark SQL

SPARK-23155 Apply custom log URL pattern for executor log URLs in SHS

SPARK-23539 Add support for Kafka headers

SPARK-23674 Add Spark ML Listener for Tracking ML Pipeline Status

SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2

SPARK-24333 Add fit with validation set to Gradient Boosted Trees: Python API

SPARK-24417 Build and Run Spark on JDK11

SPARK-24615 Accelerator-aware task scheduling for Spark

SPARK-24920 Allow sharing Netty's memory pool allocators

SPARK-25250 Fix race condition in which tasks still running when a new attempt for the same stage is created cause tasks on the same partition ID in the next attempt to retry multiple times

SPARK-25341 Support rolling back a shuffle map stage and re-generate the shuffle files

SPARK-25348 Data source for binary files

SPARK-25390 Data source V2 API refactoring

SPARK-25501 Add Kafka delegation token support

SPARK-25603 Generalize Nested Column Pruning

SPARK-26132 Remove support for Scala 2.11 in Spark 3.0.0

SPARK-26215 Define reserved keywords according to the SQL standard

SPARK-26412 Allow Pandas UDF to take an iterator of pd.DataFrames

SPARK-26651 Use Proleptic Gregorian calendar

SPARK-26759 Arrow optimization in SparkR's interoperability

SPARK-26848 Introduce new option to Kafka source: offset by timestamp (starting/ending)

SPARK-27064 Create StreamingWrite at the beginning of streaming execution

SPARK-27119 Do not infer schema when reading Hive serde table with native data source

SPARK-27225 Implement join strategy hints

SPARK-27240 Use pandas DataFrame for struct type argument in Scalar Pandas UDF

SPARK-27338 Fix deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator

SPARK-27396 Public APIs for extended Columnar Processing Support

SPARK-27463 Support DataFrame Cogroup via Pandas UDFs

SPARK-27589 Re-implement file sources with data source V2 API

SPARK-27677 Disk-persisted RDD blocks served by shuffle service, and ignored for Dynamic Allocation

SPARK-27699 Partially push down disjunctive predicates in Parquet/ORC

SPARK-27763 Port test cases from PostgreSQL to Spark SQL

SPARK-27884 Deprecate Python 2 support

SPARK-27921 Convert applicable *.sql tests into UDF integrated test base

SPARK-27963 Allow dynamic allocation without an external shuffle service

SPARK-28177 Adjust post shuffle partition number in adaptive execution

SPARK-28199 Move Trigger implementations to Triggers.scala and avoid exposing these to the end users

SPARK-28372 Document Spark Web UI

SPARK-28399 RobustScaler feature transformer

SPARK-28426 Metadata Handling in Thrift Server

SPARK-28588 Build a SQL reference doc

SPARK-28608 Improve test coverage of ThriftServer

SPARK-28753 Dynamically reuse subqueries in AQE

SPARK-28855 Remove outdated Experimental, Evolving annotations

SPARK-29345 Add an API that allows a user to define and observe arbitrary metrics on streaming queries

SPARK-25908, SPARK-28980 Remove items deprecated since 2.2.0 or earlier
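
One item in the list above, SPARK-26651 (Use Proleptic Gregorian calendar), is easier to grasp with a small example. Spark 2.x relied on the hybrid Julian/Gregorian calendar, in which 1582-10-04 is followed directly by 1582-10-15; the proleptic Gregorian calendar extends the Gregorian rules backwards, so the days in between are ordinary dates. The sketch below does not use Spark at all. It assumes only that Python's standard `datetime` module, which is documented as proleptic Gregorian, is a fair stand-in for the calendar semantics Spark 3.0 adopts; the variable names are illustrative.

```python
from datetime import date, timedelta

# Python's datetime.date follows the proleptic Gregorian calendar,
# so it serves here as a stand-in for the Spark 3.0 calendar semantics.
last_day_before_cutover = date(1582, 10, 4)

# In a hybrid Julian/Gregorian calendar the next day would be 1582-10-15.
# In the proleptic Gregorian calendar it is simply the next date.
next_day = last_day_before_cutover + timedelta(days=1)
print(next_day.isoformat())  # 1582-10-05

# The ten dates skipped by the hybrid calendar are all valid here,
# so the distance to 1582-10-15 is 11 days rather than 1.
gap = date(1582, 10, 15) - last_day_before_cutover
print(gap.days)  # 11
```

Dates written by Spark 2.x before the 1582 cutover may therefore be read back shifted by some days under Spark 3.0 unless they are rebased, which is worth checking when migrating historical data.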
