/**
 * An execution strategy for rules that indicates the maximum number of executions. If the
 * execution reaches fix point (i.e. converge) before maxIterations, it will stop.
 */
abstract class Strategy { def maxIterations: Int }

/** A strategy that only runs once. */
case object Once extends Strategy { val maxIterations = 1 }

/** A strategy that runs until fix point or maxIterations times, whichever comes first. */
case class FixedPoint(maxIterations: Int) extends Strategy
/** A batch of rules. */
protected case class Batch(name: String, strategy: Strategy, rules: Rule[TreeType]*)

/** Defines a sequence of rule batches, to be overridden by the implementation. */
protected val batches: Seq[Batch]

A Batch groups multiple rules together.
/**
 * Executes the batches of rules defined by the subclass. The batches are executed serially
 * using the defined execution strategy. Within each batch, rules are also executed serially.
 */
def apply(plan: TreeType): TreeType = {
  var curPlan = plan

  batches.foreach { batch =>
    var iteration = 1
    var lastPlan = curPlan
    curPlan = batch.rules.foldLeft(curPlan) { case (plan, rule) => rule(plan) }

    // Run until fix point (or the max number of iterations as specified in the strategy).
    while (iteration < batch.strategy.maxIterations && !curPlan.fastEquals(lastPlan)) {
      lastPlan = curPlan
      curPlan = batch.rules.foldLeft(curPlan) {
        case (plan, rule) =>
          val result = rule(plan)
          if (!result.fastEquals(plan)) {
            logger.debug(...)
          }
          result
      }
      iteration += 1
    }
  }

  curPlan
}
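To see the loop's semantics in isolation, here is a minimal standalone sketch (all names are ours, not Spark's) where the "plan" is shrunk to an Int and each "rule" to an Int => Int; the batch reruns until the value stops changing or maxIterations is hit:

object MiniRuleExecutor {
  sealed trait Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  case class FixedPoint(maxIterations: Int) extends Strategy

  case class Batch(name: String, strategy: Strategy, rules: (Int => Int)*)

  def execute(plan: Int, batches: Seq[Batch]): Int =
    batches.foldLeft(plan) { (cur, batch) =>
      var iteration = 1
      var last = cur
      var now = batch.rules.foldLeft(cur)((p, r) => r(p))
      // Same loop shape as apply() above: stop at the fixed point or at maxIterations.
      while (iteration < batch.strategy.maxIterations && now != last) {
        last = now
        now = batch.rules.foldLeft(now)((p, r) => r(p))
        iteration += 1
      }
      now
    }

  def main(args: Array[String]): Unit = {
    // A "rule" that halves even numbers; it converges once the value is odd.
    val halve: Int => Int = n => if (n % 2 == 0) n / 2 else n
    println(execute(40, Seq(Batch("halving", FixedPoint(100), halve)))) // prints 5
  }
}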
The Analyzer turns the initial unresolved logical plan into a resolved logical plan. This discussion covers the whole analysis package.
Its job is to pin down unresolved attributes and relations using schema information (from the Catalog class) and the function registry. The process runs in three batches, and the third one may iterate many times.
/**
 * Provides a logical query plan analyzer, which translates [[UnresolvedAttribute]]s and
 * [[UnresolvedRelation]]s into fully typed objects using information in a schema [[Catalog]] and
 * a [[FunctionRegistry]].
 */
class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)
  extends RuleExecutor[LogicalPlan] with HiveTypeCoercion {
trait Catalog {
  def lookupRelation(
      databaseName: Option[String],
      tableName: String,
      alias: Option[String] = None): LogicalPlan

  def registerTable(databaseName: Option[String], tableName: String, plan: LogicalPlan): Unit
}
class SimpleCatalog extends Catalog {
  val tables = new mutable.HashMap[String, LogicalPlan]()

  def registerTable(databaseName: Option[String], tableName: String, plan: LogicalPlan): Unit = {
    tables += ((tableName, plan))
  }

  def dropTable(tableName: String) = tables -= tableName

  def lookupRelation(
      databaseName: Option[String],
      tableName: String,
      alias: Option[String] = None): LogicalPlan = {
    val table = tables.get(tableName).getOrElse(sys.error(s"Table Not Found: $tableName"))
    // If an alias was specified by the lookup, wrap the plan in a subquery so that attributes are
    // properly qualified with this alias.
    alias.map(a => Subquery(a.toLowerCase, table)).getOrElse(table)
  }
}
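As a hedged usage sketch against the API above (peoplePlan here is a hypothetical, already-built LogicalPlan):

val catalog = new SimpleCatalog
catalog.registerTable(None, "people", peoplePlan)          // peoplePlan: hypothetical LogicalPlan
catalog.lookupRelation(None, "people")                     // returns peoplePlan unchanged
catalog.lookupRelation(None, "people", alias = Some("P"))  // returns Subquery("p", peoplePlan)

The Subquery wrapper is what later allows attributes to be referenced through the alias.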
case class Subquery(alias: String, child: LogicalPlan) extends UnaryNode {
  def output = child.output.map(_.withQualifiers(alias :: Nil))
  def references = Set.empty
}
trait FunctionRegistry {
  def lookupFunction(name: String, children: Seq[Expression]): Expression
}
/**
 * A trivial catalog that returns an error when a function is requested. Used for testing when all
 * functions are already filled in and the analyser needs only to resolve attribute references.
 */
object EmptyFunctionRegistry extends FunctionRegistry {
  def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    throw new UnsupportedOperationException
  }
}
@transient
protected[sql] lazy val catalog: Catalog = new SimpleCatalog

protected[sql] lazy val analyzer: Analyzer =
  new Analyzer(catalog, EmptyFunctionRegistry, caseSensitive = true)
class Analyzer(catalog: Catalog, registry: FunctionRegistry, caseSensitive: Boolean)
  extends RuleExecutor[LogicalPlan] with HiveTypeCoercion {

  // TODO: pass this in as a parameter.
  val fixedPoint = FixedPoint(100)

  val batches: Seq[Batch] = Seq(
    Batch("MultiInstanceRelations", Once,
      NewRelationInstances),
    Batch("CaseInsensitiveAttributeReferences", Once,
      (if (caseSensitive) Nil else LowercaseAttributeReferences :: Nil) : _*),
    Batch("Resolution", fixedPoint,
      ResolveReferences ::
      ResolveRelations ::
      NewRelationInstances ::
      ImplicitGenerate ::
      StarExpansion ::
      ResolveFunctions ::
      GlobalAggregates ::
      typeCoercionRules :_*)
  )
First comes the NewRelationInstances rule in the first batch. Its job is to prevent the same relation instance from appearing more than once in a logical plan; whenever that happens, a fresh instance is generated so that every attribute's expression id stays unique.
/**
 * If any MultiInstanceRelation appears more than once in the query plan then the plan is updated so
 * that each instance has unique expression ids for the attributes produced.
 */
object NewRelationInstances extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = {
    // Step 1: collect all multi-instance relations.
    val localRelations = plan collect { case l: MultiInstanceRelation => l }

    // Step 2: keep only those that appear more than once.
    val multiAppearance = localRelations
      .groupBy(identity[MultiInstanceRelation])
      .filter { case (_, ls) => ls.size > 1 }
      .map(_._1)
      .toSet

    // Step 3: replace each repeated relation in the plan with a fresh instance.
    plan transform {
      case l: MultiInstanceRelation if multiAppearance contains l => l.newInstance
    }
  }
}
LogicalPlan is itself a subclass of TreeNode, and TreeNode supports collect and other Scala-collection-style operations; the collection step above is an example of that collect capability.
TreeNode is an important base class in Catalyst; a later section covers it in detail. The second batch deals with case sensitivity: if the analysis is case insensitive, the LowercaseAttributeReferences rule runs and lowercases all attribute names.
/**
 * Makes attribute naming case insensitive by turning all UnresolvedAttributes to lowercase.
 */
object LowercaseAttributeReferences extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case UnresolvedRelation(databaseName, name, alias) =>              // case 1: unresolved relations
      UnresolvedRelation(databaseName, name, alias.map(_.toLowerCase))
    case Subquery(alias, child) => Subquery(alias.toLowerCase, child)  // case 2: subqueries
    case q: LogicalPlan => q transformExpressions {                    // case 3: everything else
      case s: Star => s.copy(table = s.table.map(_.toLowerCase))       // the * expression
      case UnresolvedAttribute(name) => UnresolvedAttribute(name.toLowerCase) // unresolved attributes
      case Alias(c, name) => Alias(c, name.toLowerCase)()               // aliases
    }
  }
}
transform and transformExpressions are methods provided by TreeNode for pre-order traversal of the tree.
This rule gives a glimpse of the kinds of nodes a LogicalPlan can contain. Expressions are covered in more detail later.
A note from the Alias source:
/**
 * Used to assign a new name to a computation.
 * For example the SQL expression "1 + 1 AS a" could be represented as follows:
 *   Alias(Add(Literal(1), Literal(1)), "a")()
 */
Resolution is the third batch; its termination condition is a fixed point, capped at 100 iterations. Below I have annotated what each rule roughly does, and then pick out a few rules to walk through.
Batch("Resolution", fixedPoint, ResolveReferences :: // 确定属性 ResolveRelations :: // 确定关系(从catalog里) NewRelationInstances :: // 去掉同一个实例出现多次的情况 ImplicitGenerate :: // 把包含Generator且只有一条的表达式转化成Generate操作 StarExpansion :: // 扩张 * ResolveFunctions :: // 确定方法(从FunctionRegistry里) GlobalAggregates :: // 把包含Aggregate的表达式转化成Aggregate操作 typeCoercionRules :_*) // 来自于HiveTypeCoercion,主要针对Hive语法做强制转换,包含多种规则
ResolveReferences walks the tree in post-order, resolving unresolved attributes. If resolution fails, the attribute stays unresolved and is tried again in the next iteration.
/**
 * Replaces [[UnresolvedAttribute]]s with concrete
 * [[expressions.AttributeReference AttributeReferences]] from a logical plan node's children.
 */
object ResolveReferences extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case q: LogicalPlan if q.childrenResolved =>
      logger.trace(s"Attempting to resolve ${q.simpleString}")
      q transformExpressions {
        case u @ UnresolvedAttribute(name) =>
          // Leave unchanged if resolution fails. Hopefully will be resolved next round.
          val result = q.resolve(name).getOrElse(u)
          logger.debug(s"Resolving $u to $result")
          result
      }
  }
}
The actual lookup happens in LogicalPlan's resolve method, covered in the LogicalPlan section below; resolve is the one substantial method LogicalPlan defines itself, and an important one.
ResolveRelations looks relations up from the catalog:
/**
 * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
 */
object ResolveRelations extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case UnresolvedRelation(databaseName, name, alias) =>
      catalog.lookupRelation(databaseName, name, alias)
  }
}
A Generator is a kind of expression that produces zero or more output rows from a single input row.
/**
 * When a SELECT clause has only a single expression and that expression is a
 * [[catalyst.expressions.Generator Generator]] we convert the
 * [[catalyst.plans.logical.Project Project]] to a [[catalyst.plans.logical.Generate Generate]].
 */
object ImplicitGenerate extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Project(Seq(Alias(g: Generator, _)), child) =>
      Generate(g, join = false, outer = false, None, child)
  }
}
Resolving functions works much like resolving relations.
/**
 * Replaces [[UnresolvedFunction]]s with concrete [[expressions.Expression Expressions]].
 */
object ResolveFunctions extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan =>
      q transformExpressions {
        case u @ UnresolvedFunction(name, children) if u.childrenResolved =>
          registry.lookupFunction(name, children)
      }
  }
}
Finally, typeCoercionRules performs type coercion for Hive semantics. The rules are as follows:
trait HiveTypeCoercion {
  val typeCoercionRules =
    List(PropagateTypes, ConvertNaNs, WidenTypes, PromoteStrings, BooleanComparisons,
      BooleanCasts, StringToIntegralCasts, FunctionArgumentConversion)
}
A simple example of how expressions are inspected and replaced:
/** * Converts string "NaN"s that are in binary operators with a NaN-able types (Float / Double) * to the appropriate numeric equivalent. */ object ConvertNaNs extends Rule[LogicalPlan] { val stringNaN = Literal("NaN", StringType) def apply(plan: LogicalPlan): LogicalPlan = plan transform { case q: LogicalPlan => q transformExpressions { // Skip nodes who's children have not been resolved yet. case e if !e.childrenResolved => e /* Double Conversions */ case b: BinaryExpression if b.left == stringNaN && b.right.dataType == DoubleType => b.makeCopy(Array(b.right, Literal(Double.NaN))) case b: BinaryExpression if b.left.dataType == DoubleType && b.right == stringNaN => b.makeCopy(Array(Literal(Double.NaN), b.left)) case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN => b.makeCopy(Array(Literal(Double.NaN), b.left)) /* Float Conversions */ case b: BinaryExpression if b.left == stringNaN && b.right.dataType == FloatType => b.makeCopy(Array(b.right, Literal(Float.NaN))) case b: BinaryExpression if b.left.dataType == FloatType && b.right == stringNaN => b.makeCopy(Array(Literal(Float.NaN), b.left)) case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN => b.makeCopy(Array(Literal(Float.NaN), b.left)) } } }
The Optimizer turns the analyzed plan into an optimized plan. Catalyst's optimizer package currently contains just this one class, and SQLContext uses it directly.
Again, let's look at what processing it includes:
object Optimizer extends RuleExecutor[LogicalPlan] {
  val batches =
    Batch("Subqueries", Once,
      EliminateSubqueries) ::
    Batch("ConstantFolding", Once,
      ConstantFolding,
      BooleanSimplification,
      SimplifyCasts) ::
    Batch("Filter Pushdown", Once,
      EliminateSubqueries,
      CombineFilters,
      PushPredicateThroughProject,
      PushPredicateThroughInnerJoin) :: Nil
}
The first batch deals with subqueries and contains a single rule, EliminateSubqueries:
/**
 * Removes [[catalyst.plans.logical.Subquery Subquery]] operators from the plan. Subqueries are
 * only required to provide scoping information for attributes and can be removed once analysis is
 * complete.
 */
object EliminateSubqueries extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Subquery(_, child) => child // every Subquery node is replaced by its child
  }
}
As the comment notes, subqueries only carried scoping information, so once analysis is done they can safely be removed.
The second batch: constant folding.
Batch("ConstantFolding", Once, ConstantFolding, // 常量折叠 BooleanSimplification, // 提早短路掉布尔表达式 SimplifyCasts) // 去掉多余的Cast操作
/**
 * Replaces [[catalyst.expressions.Expression Expressions]] that can be statically evaluated with
 * equivalent [[catalyst.expressions.Literal Literal]] values.
 */
object ConstantFolding extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan => q transformExpressionsDown {
      // Skip redundant folding of literals.
      case l: Literal => l
      case e if e.foldable => Literal(e.apply(null), e.dataType)
    }
  }
}
This calls for a look at the definition of the foldable field in the Expression class:
/**
 * Returns true when an expression is a candidate for static evaluation before the query is
 * executed.
 *
 * The following conditions are used to determine suitability for constant folding:
 *  - A [[expressions.Coalesce Coalesce]] is foldable if all of its children are foldable
 *  - A [[expressions.BinaryExpression BinaryExpression]] is foldable if its both left and right
 *    child are foldable
 *  - A [[expressions.Not Not]], [[expressions.IsNull IsNull]], or
 *    [[expressions.IsNotNull IsNotNull]] is foldable if its child is foldable.
 *  - A [[expressions.Literal]] is foldable.
 *  - A [[expressions.Cast Cast]] or [[expressions.UnaryMinus UnaryMinus]] is foldable if its
 *    child is foldable.
 */
// TODO: Supporting more foldable expressions. For example, deterministic Hive UDFs.
def foldable: Boolean = false

Only Literal is foldable on its own; every other expression is foldable only if all of its children are.
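To make the contract concrete, here is a self-contained sketch on a toy expression ADT (Expr, Lit, Add, and Col are our names, not Catalyst's) that mirrors ConstantFolding: a subtree is replaced by a literal exactly when it is foldable all the way down:

sealed trait Expr {
  def foldable: Boolean
  def eval: Int
}
case class Lit(value: Int) extends Expr {
  val foldable = true
  def eval = value
}
case class Add(left: Expr, right: Expr) extends Expr {
  def foldable = left.foldable && right.foldable   // same shape as BinaryExpression
  def eval = left.eval + right.eval
}
case class Col(name: String) extends Expr {        // a column: value unknown until runtime
  val foldable = false
  def eval = sys.error(s"cannot statically evaluate $name")
}

def constantFold(e: Expr): Expr = e match {
  case l: Lit          => l                        // skip redundant folding of literals
  case f if f.foldable => Lit(f.eval)              // statically evaluate the whole subtree
  case Add(l, r)       => Add(constantFold(l), constantFold(r))
  case other           => other
}

// constantFold(Add(Add(Lit(1), Lit(2)), Col("x")))  ==>  Add(Lit(3), Col(x))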
The second rule is also easy to follow: it simplifies boolean expressions, short-circuiting them as early as possible.
/**
 * Simplifies boolean expressions where the answer can be determined without evaluating both sides.
 * Note that this rule can eliminate expressions that might otherwise have been evaluated and thus
 * is only safe when evaluations of expressions does not result in side effects.
 */
object BooleanSimplification extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case q: LogicalPlan => q transformExpressionsUp {
      case and @ And(left, right) =>
        (left, right) match {
          case (Literal(true, BooleanType), r) => r
          case (l, Literal(true, BooleanType)) => l
          case (Literal(false, BooleanType), _) => Literal(false)
          case (_, Literal(false, BooleanType)) => Literal(false)
          case (_, _) => and
        }
      case or @ Or(left, right) =>
        (left, right) match {
          case (Literal(true, BooleanType), _) => Literal(true)
          case (_, Literal(true, BooleanType)) => Literal(true)
          case (Literal(false, BooleanType), r) => r
          case (l, Literal(false, BooleanType)) => l
          case (_, _) => or
        }
    }
  }
}
The third rule removes casts whose input is already of the target type:
/**
 * Removes [[catalyst.expressions.Cast Casts]] that are unnecessary because the input is already
 * the correct type.
 */
object SimplifyCasts extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Cast(e, dataType) if e.dataType == dataType => e
  }
}
The third batch contains the filter-pushdown rules:
Batch("Filter Pushdown", Once, EliminateSubqueries, // 消除子查询 CombineFilters, // 过滤操作取合集 PushPredicateThroughProject, // 为映射操作下推谓词 PushPredicateThroughInnerJoin) // 为inner join下推谓词
/**
 * Prepares a planned SparkPlan for execution by binding references to specific ordinals, and
 * inserting shuffle operations as needed.
 */
@transient
protected[sql] val prepareForExecution = new RuleExecutor[SparkPlan] {
  val batches =
    Batch("Add exchange", Once, AddExchange) ::
    Batch("Prepare Expressions", Once, new BindReferences[SparkPlan]) :: Nil
}
The TreeNode library supports three features:
· Scala-collection-like methods (foreach, map, flatMap, collect, etc.)
· transform, which accepts a partial function used to generate a new tree.
· Debugging support: pretty printing, easy splicing of trees, etc.
Collection-style operations
Partial functions
Globally unique ids
object TreeNode {
  private val currentId = new java.util.concurrent.atomic.AtomicLong
  protected def nextId() = currentId.getAndIncrement()
}
Several kinds of nodes:
/**
 * A [[TreeNode]] that has two children, [[left]] and [[right]].
 */
trait BinaryNode[BaseType <: TreeNode[BaseType]] {
  def left: BaseType
  def right: BaseType
  def children = Seq(left, right)
}

/**
 * A [[TreeNode]] with no children.
 */
trait LeafNode[BaseType <: TreeNode[BaseType]] {
  def children = Nil
}

/**
 * A [[TreeNode]] with a single [[child]].
 */
trait UnaryNode[BaseType <: TreeNode[BaseType]] {
  def child: BaseType
  def children = child :: Nil
}
Because every node carries a unique id, two structurally identical nodes on different branches are still distinct instances. Comparison therefore works like this:
def sameInstance(other: TreeNode[_]): Boolean = {
  this.id == other.id
}

def fastEquals(other: TreeNode[_]): Boolean = {
  sameInstance(other) || this == other
}

foreach applies the function to this node first, then recurses into the children:

def foreach(f: BaseType => Unit): Unit = {
  f(this)
  children.foreach(_.foreach(f))
}
map visits every node in pre-order and collects the results:
def map[A](f: BaseType => A): Seq[A] = {
  val ret = new collection.mutable.ArrayBuffer[A]()
  foreach(ret += f(_))
  ret
}
Most of the other transformations are similar: they take a function or partial function and apply it to every matching node.
The full set of transformations, grouped by category:
map, flatMap, collect
mapChildren, withNewChildren
transform, transformDown, transformChildrenDown (pre-order)
transformUp, transformChildrenUp (post-order)
That is essentially all of it: ordered traversal of the tree and its children, with processing applied along the way. The sketch below makes the pre-order/post-order difference concrete.
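Here is a standalone miniature (our own Node type, not Catalyst's) showing why pre-order and post-order rewrites can differ: in transformDown a parent is rewritten before its children, so it sees their original values; in transformUp the children are rewritten first:

case class Node(value: Int, children: List[Node] = Nil) {
  def transformDown(rule: PartialFunction[Node, Node]): Node = {
    val afterSelf = rule.applyOrElse(this, identity[Node])
    afterSelf.copy(children = afterSelf.children.map(_.transformDown(rule)))
  }
  def transformUp(rule: PartialFunction[Node, Node]): Node = {
    val afterChildren = copy(children = children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, identity[Node])
  }
}

// A rule that adds the children's values to the parent's.
val sumChildren: PartialFunction[Node, Node] =
  { case Node(v, cs) => Node(v + cs.map(_.value).sum, cs) }

val chain = Node(1, List(Node(1, List(Node(1)))))
chain.transformDown(sumChildren) // Node(2, List(Node(2, List(Node(1))))): root saw the old child value
chain.transformUp(sumChildren)   // Node(3, List(Node(2, List(Node(1))))): root saw the rewritten child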
QueryPlan provides three things:
· First, it defines output, the sequence of attributes the plan exposes:
def output: Seq[Attribute]
· Second, borrowing TreeNode's transform machinery, it implements a family of transformExpressions methods that walk a partial function over the expressions of each node.
· Third, an expressions method returning Seq[Expression], used to collect all expressions in this query.
Catalyst's implementation of QueryPlan is LogicalPlan; the SQL component's implementation is SparkPlan. The former is what gets processed, analyzed, and optimized; the latter is what actually executes. Both are introduced briefly below.
LogicalPlan adds a few things on top of QueryPlan:
1. references, the set of attributes consulted when producing the output attribute list:
def references: Set[Attribute]
2. inputSet, the union of the children's output attributes:

lazy val inputSet: Set[Attribute] = children.flatMap(_.output).toSet
3. resolved flags indicating whether the node itself and its children are resolved.
4. The resolve method: important, though dense at first read.
def resolve(name: String): Option[NamedExpression] = {
  val parts = name.split("\\.")
  // Collect all attributes that are output by this nodes children where either the first part
  // matches the name or where the first part matches the scope and the second part matches the
  // name. Return these matches along with any remaining parts, which represent dotted access to
  // struct fields.
  val options = children.flatMap(_.output).flatMap { option =>
    // If the first part of the desired name matches a qualifier for this possible match, drop it.
    val remainingParts = if (option.qualifiers contains parts.head) parts.drop(1) else parts
    if (option.name == remainingParts.head) (option, remainingParts.tail.toList) :: Nil else Nil
  }

  options.distinct match {
    case (a, Nil) :: Nil => Some(a) // One match, no nested fields, use it.
    // One match, but we also need to extract the requested nested field.
    case (a, nestedFields) :: Nil =>
      a.dataType match {
        case StructType(fields) =>
          Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
        case _ => None // Don't know how to resolve these field references
      }
    case Nil => None // No matches.
    case ambiguousReferences =>
      throw new TreeNodeException(
        this, s"Ambiguous references to $name: ${ambiguousReferences.mkString(",")}")
  }
}
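Stripped of the nested struct-field handling, the qualifier-matching logic reduces to the following standalone sketch (Attr and this simplified resolve are our own types, not Catalyst's):

case class Attr(qualifiers: Seq[String], name: String)

def resolve(name: String, input: Seq[Attr]): Option[Attr] = {
  val parts = name.split("\\.")
  val matches = input.flatMap { a =>
    // If the first part matches a qualifier (e.g. "P" in "P.age"), drop it.
    val remaining = if (a.qualifiers contains parts.head) parts.drop(1) else parts
    if (remaining.nonEmpty && a.name == remaining.head) Some(a) else None
  }
  matches.distinct match {
    case Seq(one) => Some(one) // exactly one match
    case Seq()    => None      // no match: left for a later analyzer iteration
    case many     => sys.error(s"Ambiguous references to $name: $many")
  }
}

// resolve("P.age", Seq(Attr(Seq("P"), "age")))  ==>  Some(Attr(List(P), age))
// resolve("age",   Seq(Attr(Seq("P"), "age")))  ==>  Some(...) (unqualified access also works)

In the real method, any parts left over after the name match are treated as dotted access into struct fields and wrapped in GetField expressions.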
Its three abstract subclasses:
/**
 * A logical plan node with no children.
 */
abstract class LeafNode extends LogicalPlan with trees.LeafNode[LogicalPlan] {
  self: Product =>

  // Leaf nodes by definition cannot reference any input attributes.
  def references = Set.empty
}

/**
 * A logical plan node with single child.
 */
abstract class UnaryNode extends LogicalPlan with trees.UnaryNode[LogicalPlan] {
  self: Product =>
}

/**
 * A logical plan node with a left and right child.
 */
abstract class BinaryNode extends LogicalPlan with trees.BinaryNode[LogicalPlan] {
  self: Product =>
}
Let's look in turn at the implementation structure of LogicalPlan's three node types: LeafNode, UnaryNode, and BinaryNode.
/**
 * A logical node that represents a non-query command to be executed by the system. For example,
 * commands can be used by parsers to represent DDL operations.
 */
abstract class Command extends LeafNode {
  self: Product =>
  def output = Seq.empty
}

/**
 * Returned for commands supported by a given parser, but not catalyst. In general these are DDL
 * commands that are passed directly to another system.
 */
case class NativeCommand(cmd: String) extends Command

/**
 * Returned by a parser when the users only wants to see what query plan would be executed, without
 * actually performing the execution.
 */
case class ExplainCommand(plan: LogicalPlan) extends Command

case object NoRelation extends LeafNode {
  def output = Nil
}
The basicOperators file in the SQL module's execution package contains many SparkPlan implementations, including:
Project, Filter, Sample, Union, StopAfter, TopK, Sort, ExistingRdd
Many of these overlap with Catalyst's basicOperators. The difference is that SparkPlan is the QueryPlan implementation that, unlike a logical plan, is actually executed by Spark's strategies; so the case classes in the SQL module's basicOperators carry an execute() method that their Catalyst counterparts lack. The contrast looks roughly like the sketch below.
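The split can be caricatured in a standalone sketch (all types are ours, with Iterator standing in for RDD[Row]): the logical operator only describes the computation, while the physical operator can run it:

case class Row(values: Any*)

// Logical side: pure description, nothing executable.
sealed trait LogicalOp
case class LogicalScan(table: String) extends LogicalOp
case class LogicalFilter(predicate: Row => Boolean, child: LogicalOp) extends LogicalOp

// Physical side: same shape, plus execute().
sealed trait PhysicalOp { def execute(): Iterator[Row] }
case class PhysicalScan(data: Seq[Row]) extends PhysicalOp {
  def execute() = data.iterator
}
case class PhysicalFilter(predicate: Row => Boolean, child: PhysicalOp) extends PhysicalOp {
  def execute() = child.execute().filter(predicate)
}

// PhysicalFilter(_.values.head == 1, PhysicalScan(Seq(Row(1), Row(2)))).execute().toList
//   ==>  List(Row(1))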
The Spark strategy implementations are covered in the next subsection.
abstract class QueryPlanner[PhysicalPlan <: TreeNode[PhysicalPlan]] {
  /** A list of execution strategies that can be used by the planner */
  def strategies: Seq[Strategy]

  /**
   * Given a [[plans.logical.LogicalPlan LogicalPlan]], returns a list of `PhysicalPlan`s that can
   * be used for execution. If this strategy does not apply to the given logical operation then an
   * empty list should be returned.
   */
  abstract protected class Strategy extends Logging {
    def apply(plan: LogicalPlan): Seq[PhysicalPlan]
  }

  /**
   * Returns a placeholder for a physical plan that executes `plan`. This placeholder will be
   * filled in automatically by the QueryPlanner using the other execution strategies that are
   * available.
   */
  protected def planLater(plan: LogicalPlan) = apply(plan).next()

  def apply(plan: LogicalPlan): Iterator[PhysicalPlan] = {
    // Obviously a lot to do here still...
    val iter = strategies.view.flatMap(_(plan)).toIterator
    assert(iter.hasNext, s"No plan for $plan")
    iter
  }
}
The current implementation is SparkStrategies.
SQLContext uses it as SparkPlanner:
protected[sql] class SparkPlanner extends SparkStrategies {
  val sparkContext = self.sparkContext

  val strategies: Seq[Strategy] =
    TopK ::
    PartialAggregation ::
    SparkEquiInnerJoin ::
    BasicOperators ::
    CartesianProduct ::
    BroadcastNestedLoopJoin :: Nil
}
HiveContext uses a SparkPlanner augmented with Hive strategies:
val hivePlanner = new SparkPlanner with HiveStrategies {
  val hiveContext = self

  override val strategies: Seq[Strategy] = Seq(
    TopK,
    ColumnPrunings,
    PartitionPrunings,
    HiveTableScans,
    DataSinks,
    Scripts,
    PartialAggregation,
    SparkEquiInnerJoin,
    BasicOperators,
    CartesianProduct,
    BroadcastNestedLoopJoin
  )
}
A few properties of Expression:
1. It carries a DataType, plus some inline helper methods for DataType conversions.
2. It carries references, a Set[Attribute]; Attribute is a subclass of NamedExpression.
3. foldable, marking expressions that can be statically evaluated.
Among expressions, only Literal is foldable by itself. Literal is a LeafExpression; its companion object builds a typed literal according to the value's runtime type:
object Literal {
  def apply(v: Any): Literal = v match {
    case i: Int => Literal(i, IntegerType)
    case l: Long => Literal(l, LongType)
    case d: Double => Literal(d, DoubleType)
    case f: Float => Literal(f, FloatType)
    case b: Byte => Literal(b, ByteType)
    case s: Short => Literal(s, ShortType)
    case s: String => Literal(s, StringType)
    case b: Boolean => Literal(b, BooleanType)
    case null => Literal(null, NullType)
  }
}

case class Literal(value: Any, dataType: DataType) extends LeafExpression {
  override def foldable = true
  def nullable = value == null
  def references = Set.empty
  override def toString = if (value != null) value.toString else "null"

  type EvaluatedType = Any
  override def apply(input: Row): Any = value // evaluating this leaf expression just returns the value
}
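A quick usage sketch of the companion object above:

Literal(1)      // Literal(1, IntegerType)
Literal("NaN")  // Literal("NaN", StringType), the stringNaN that ConvertNaNs looks for
Literal(null)   // Literal(null, NullType)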
4. resolved, which mainly asks whether all children are resolved.
children is a TreeNode concept: in TreeNode it is a Seq[BaseType], where BaseType is the type parameter of TreeNode[T]. For Expression, i.e. TreeNode[Expression], BaseType is Expression.
The Expression inheritance structure:
The abstract subclasses are as follows:
abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression] {
  self: Product =>

  def symbol: String
  override def foldable = left.foldable && right.foldable
  def references = left.references ++ right.references
  override def toString = s"($left $symbol $right)"
}

abstract class LeafExpression extends Expression with trees.LeafNode[Expression] {
  self: Product =>
}

abstract class UnaryExpression extends Expression with trees.UnaryNode[Expression] {
  self: Product =>

  def references = child.references
}
A SchemaRDD is an RDD[Row]. In Catalyst, a Row corresponds to one row of a table; its definition is:
trait Row extends Seq[Any] with Serializable
SchemaRDD consists of two main parts, plus a few calls into SQLContext methods.
First, the implementations of the RDD functions:
// =========================================================================================
// RDD functions: Copy the internal row representation so we present immutable data to users.
// =========================================================================================

override def compute(split: Partition, context: TaskContext): Iterator[Row] =
  firstParent[Row].compute(split, context).map(_.copy())

override def getPartitions: Array[Partition] = firstParent[Row].partitions

override protected def getDependencies: Seq[Dependency[_]] =
  List(new OneToOneDependency(queryExecution.toRdd)) // this SchemaRDD has a narrow dependency on the optimized RDD
Second, the DSL function implementations, for example:
def select(exprs: NamedExpression*): SchemaRDD =
  new SchemaRDD(sqlContext, Project(exprs, logicalPlan))
Each DSL operation produces a new SchemaRDD.
SchemaRDD's DSL operations map directly onto the operations provided by the Catalyst component.
The DSL operators are all implemented on top of Catalyst's basicOperators. The operations in basicOperators are subclasses of LogicalPlan, falling mainly into two groups: unary (UnaryNode) and binary (BinaryNode) operations. UnaryNode and BinaryNode are both TreeNode implementations; TreeNode's third kind is LeafNode.
The various basicOperators implementations are case classes; they are all LogicalPlans and therefore cannot execute themselves.
HiveContext is one of Spark SQL's execution engines. It brings Hive data into the Spark environment, reading its configuration from hive-site.xml.
Its inheritance hierarchy:
The SQL parser inside HiveContext is HiveQl.
When executing HQL, the runHive method takes the command and caps the number of returned rows:
protected def runHive(cmd: String, maxRows: Int = 1000): Seq[String]
The methods it invokes are Hive classes, and results are collected into a Java ArrayList.
Error logs are captured in an outputBuffer for printing.
The logical execution plan goes through much the same steps as in SQLContext, since QueryExecution is inherited as well:
abstract class QueryExecution extends super.QueryExecution {
The differences are the instances in play and the logic of the toRdd operation.
Table information is stored in a HiveMetastoreCatalog.
Inside HiveMetastoreCatalog, a Hive client is created from HiveContext's hiveconf, enabling getTable, getPartition, and createTable operations.
HiveMetastoreCatalog contains MetastoreRelation, whose inheritance structure is as follows:
It builds Table, Partition, and TableDesc through Hive's interfaces, and carries an implicit helper, HiveMetastoreTypes: when converting the schema's fields into Attributes, HiveMetastoreTypes.toDataType parses each Hive type string into a DataType that Catalyst supports.
object HiveFunctionRegistry extends analysis.FunctionRegistry with HiveFunctionFactory with HiveInspectors {
It extends FunctionRegistry and implements the lookupFunction method.
HiveFunctionFactory mainly handles reflection, along with converting Hive types into Catalyst types.
It includes:
def getFunctionInfo(name: String) = FunctionRegistry.getFunctionInfo(name)
def getFunctionClass(name: String) = getFunctionInfo(name).getFunctionClass
def createFunction[UDFType](name: String) =
  getFunctionClass(name).newInstance.asInstanceOf[UDFType]
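createFunction boils down to plain Java reflection; a self-contained sketch of the same pattern (createInstance is our own helper name, not Spark's):

// Look a class up by name and instantiate it via its no-arg constructor,
// the same way createFunction builds a Hive UDF instance.
def createInstance[T](className: String): T =
  Class.forName(className).newInstance.asInstanceOf[T]

// e.g. createInstance[java.util.ArrayList[String]]("java.util.ArrayList")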
HiveInspectors handles conversion between Catalyst DataTypes and Hive ObjectInspectors.
Java classes are mapped to Catalyst DataTypes via:
def javaClassToDataType(clz: Class[_]): DataType = clz match