最笨的羊羊

Flink Series: User-Defined Functions

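1. User-Defined Functions
2. Overview
3. Implementation Guide
4. Function Class
5. Evaluation Methods
6. Type Inference
7. Automatic Type Inference
8. Custom Type Inference
9. Determinism
10. Determinism of Built-in Functions
11. Runtime Integration
12. Scalar Functions
13. Table Functions
14. Aggregate Functions
15. Table Aggregate Functions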

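1. User-Defined Functions

User-defined functions (UDFs) are an extension mechanism for calling frequently used logic, or custom logic that is hard to express otherwise, from within a query.

UDFs can be implemented in a JVM language (such as Java or Scala) or in Python, and an implementer can use arbitrary third-party libraries inside a UDF. This article focuses on developing UDFs in a JVM language.

2. Overview

Flink currently distinguishes the following kinds of functions:

Scalar functions map scalar values to a new scalar value;
Table functions map scalar values to new rows;
Aggregate functions map the scalar values of multiple rows to a new scalar value;
Table aggregate functions map the scalar values of multiple rows to new rows;
Async table functions are special functions that query external data systems asynchronously.

Note: scalar and table functions already use the new type system based on data types, while aggregate functions still use the old type system based on TypeInformation.

The following example shows how to create a basic scalar function and how to call it in both the Table API and SQL.

A function must be registered before it can be used in a SQL query. In the Table API, a function can either be registered first and then called, or be used "inline" without registration.

Java version: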

import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;
import static org.apache.flink.table.api.Expressions.*;

// define function logic
public static class SubstringFunction extends ScalarFunction {
  public String eval(String s, Integer begin, Integer end) {
    return s.substring(begin, end);
  }
}

TableEnvironment env = TableEnvironment.create(...);

// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(SubstringFunction.class, $("myField"), 5, 12));

// register the function
env.createTemporarySystemFunction("SubstringFunction", SubstringFunction.class);

// call the registered function in the Table API
env.from("MyTable").select(call("SubstringFunction", $("myField"), 5, 12));

// call the registered function in SQL
env.sqlQuery("SELECT SubstringFunction(myField, 5, 12) FROM MyTable");

Scala version:

import org.apache.flink.table.api._
import org.apache.flink.table.functions.ScalarFunction

// define function logic
class SubstringFunction extends ScalarFunction {
  def eval(s: String, begin: Integer, end: Integer): String = {
    s.substring(begin, end)
  }
}

val env = TableEnvironment.create(...)

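// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(classOf[SubstringFunction], $"myField", 5, 12))

// register the function
env.createTemporarySystemFunction("SubstringFunction", classOf[SubstringFunction])

// call the registered function in the Table API
env.from("MyTable").select(call("SubstringFunction", $"myField", 5, 12))

// call the registered function in SQL
env.sqlQuery("SELECT SubstringFunction(myField, 5, 12) FROM MyTable")

For interactive sessions, a function can also be parameterized before it is used or registered. In that case a function instance, rather than a function class, is used as the temporary function.

To make sure the function instance can be shipped to the cluster, its parameters must be serializable.

Java version: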

import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;
import static org.apache.flink.table.api.Expressions.*;

// define parameterizable function logic
public static class SubstringFunction extends ScalarFunction {

  private boolean endInclusive;

  public SubstringFunction(boolean endInclusive) {
    this.endInclusive = endInclusive;
  }

  public String eval(String s, Integer begin, Integer end) {
    return s.substring(begin, endInclusive ? end + 1 : end);
  }
}

TableEnvironment env = TableEnvironment.create(...);

// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(new SubstringFunction(true), $("myField"), 5, 12));

// register the function
env.createTemporarySystemFunction("SubstringFunction", new SubstringFunction(true));

Scala version:

import org.apache.flink.table.api._
import org.apache.flink.table.functions.ScalarFunction

// define parameterizable function logic
class SubstringFunction(val endInclusive: Boolean) extends ScalarFunction {
  def eval(s: String, begin: Integer, end: Integer): String = {
    // Scala has no ternary operator, so use an if/else expression
    val endIndex: Int = if (endInclusive) end + 1 else end
    s.substring(begin, endIndex)
  }
}

val env = TableEnvironment.create(...)

// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(new SubstringFunction(true), $"myField", 5, 12))

// register the function
env.createTemporarySystemFunction("SubstringFunction", new SubstringFunction(true))
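The serializability requirement mentioned above can be checked locally before a job is submitted. The following is a minimal plain-Java sketch with no Flink dependency: SubstringParams and SerializationCheck are hypothetical stand-ins, not Flink classes, and the sketch assumes that standard Java serialization is a reasonable proxy for what happens when an instance is shipped to another JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a parameterized function; not a Flink class.
final class SubstringParams implements Serializable {
  private static final long serialVersionUID = 1L;

  final boolean endInclusive;

  SubstringParams(boolean endInclusive) {
    this.endInclusive = endInclusive;
  }

  String eval(String s, int begin, int end) {
    return s.substring(begin, endInclusive ? end + 1 : end);
  }
}

final class SerializationCheck {

  // Writes the object to a byte array and reads it back, which is roughly
  // what happens when an instance travels to another JVM.
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Instance is not serializable", e);
    }
  }

  public static void main(String[] args) {
    SubstringParams copy = roundTrip(new SubstringParams(true));
    // The constructor parameter survives the round trip.
    System.out.println(copy.eval("Hello, Flink!", 7, 11)); // prints "Flink"
  }
}
```

If a field of the instance were not serializable (for example an open network connection), roundTrip would fail here instead of failing later on the cluster.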

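In the Table API you can pass the * expression as a function argument. It is expanded to all columns of the table, which are passed as arguments in the corresponding positions.

Java version: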

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;

import java.util.Arrays;
import java.util.stream.Collectors;

import static org.apache.flink.table.api.Expressions.*;

public static class MyConcatFunction extends ScalarFunction {
  public String eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object... fields) {
    return Arrays.stream(fields)
        .map(Object::toString)
        .collect(Collectors.joining(","));
  }
}

TableEnvironment env = TableEnvironment.create(...);

// use $("*") as the function argument; if MyTable has 3 columns (a, b, c),
// all of them are passed to MyConcatFunction
env.from("MyTable").select(call(MyConcatFunction.class, $("*")));

// this is equivalent to passing all columns to MyConcatFunction explicitly
env.from("MyTable").select(call(MyConcatFunction.class, $("a"), $("b"), $("c")));

Scala version:

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.InputGroup
import org.apache.flink.table.api._
import org.apache.flink.table.functions.ScalarFunction

import scala.annotation.varargs

class MyConcatFunction extends ScalarFunction {
  @varargs
  def eval(@DataTypeHint(inputGroup = InputGroup.ANY) row: AnyRef*): String = {
    row.map(f => f.toString).mkString(",")
  }
}

val env = TableEnvironment.create(...)

// use $"*" as the function argument; if MyTable has 3 columns (a, b, c),
// all of them are passed to MyConcatFunction
env.from("MyTable").select(call(classOf[MyConcatFunction], $"*"))

// this is equivalent to passing all columns to MyConcatFunction explicitly
env.from("MyTable").select(call(classOf[MyConcatFunction], $"a", $"b", $"c"))

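3. Implementation Guide

Note: until aggregate functions are ported to the new type system, this section applies only to scalar and table functions.

All user-defined functions follow a few basic implementation principles.

4. Function Class

An implementation class must extend one of the available base classes (for example org.apache.flink.table.functions.ScalarFunction).

The class must be declared public, must not be abstract, and must be globally accessible. Non-static inner classes and anonymous classes are therefore not allowed.

To store a user-defined function in a persistent catalog, the class must have a default constructor and must be instantiable at runtime.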

Anonymous functions in Table API can only be persisted if the function is not stateful (i.e. containing only transient and static fields).

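5. Evaluation Methods

The base class provides a set of methods that can be overridden, such as open(), close(), or isDeterministic().

In addition to those methods, however, the main logic that is applied to every incoming record must be implemented in dedicated evaluation methods.

Depending on the kind of function, the operators generated behind the scenes call evaluation methods such as eval(), accumulate(), or retract() at runtime.

These methods must be declared public and take a well-defined set of arguments.

Regular JVM method call semantics apply. It is therefore possible to:

implement overloaded methods, such as eval(Integer) and eval(LocalDateTime);
use var-args, such as eval(Integer...);
use object inheritance, such as eval(Object), which accepts both LocalDateTime and Integer;
and combine the above, such as eval(Object...), which accepts arguments of all kinds.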
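The list above relies on ordinary Java overload resolution. The following plain-Java sketch has no Flink dependency: EvalOverloads is a hypothetical class, and its static eval methods only return the name of the overload that was chosen. It shows how the compiler picks among overloaded, inherited, and var-args variants; how Flink itself matches a call to an evaluation method is not modeled here.

```java
import java.time.LocalDateTime;

// Hypothetical demo class; each overload reports which signature was chosen.
final class EvalOverloads {

  static String eval(Integer i) {
    return "eval(Integer)";
  }

  static String eval(LocalDateTime t) {
    return "eval(LocalDateTime)";
  }

  // Accepts any single object that has no more specific overload.
  static String eval(Object o) {
    return "eval(Object)";
  }

  // Accepts any number of arguments of any type.
  static String eval(Object... os) {
    return "eval(Object...) with " + os.length + " args";
  }

  public static void main(String[] args) {
    System.out.println(eval(Integer.valueOf(1)));               // eval(Integer)
    System.out.println(eval(LocalDateTime.of(2024, 1, 1, 0, 0))); // eval(LocalDateTime)
    System.out.println(eval("text"));                            // eval(Object)
    System.out.println(eval(1, "a"));                            // eval(Object...) with 2 args
    System.out.println(eval());                                  // eval(Object...) with 0 args
  }
}
```

The most specific applicable overload wins, and the var-args variant is only considered when no fixed-arity method matches.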

The following snippet shows an example of an overloaded function (Java first):

import org.apache.flink.table.functions.ScalarFunction;

// function with overloaded evaluation methods
public static class SumFunction extends ScalarFunction {

  public Integer eval(Integer a, Integer b) {
    return a + b;
  }

  public Integer eval(String a, String b) {
    return Integer.valueOf(a) + Integer.valueOf(b);
  }

  public Integer eval(Double... d) {
    double result = 0;
    for (double value : d)
      result += value;
    return (int) result;
  }
}

Scala code:

import org.apache.flink.table.functions.ScalarFunction
import scala.annotation.varargs

// function with overloaded evaluation methods
class SumFunction extends ScalarFunction {

  def eval(a: Integer, b: Integer): Integer = {
    a + b
  }

  def eval(a: String, b: String): Integer = {
    Integer.valueOf(a) + Integer.valueOf(b)
  }

  @varargs // generate var-args like Java
  def eval(d: Double*): Integer = {
    d.sum.toInt
  }
}

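6. Type Inference

The Table API, like the SQL standard, is strongly typed. Function parameters and return types must therefore be mapped to data types.

From a logical perspective, the planner needs to know data types, precision, and scale. From a JVM perspective, the planner needs to know how internal data structures are represented as JVM objects when a user-defined function is called.

The term type inference summarizes the logic that validates input arguments and derives the data types of a function's parameters and result.

User-defined functions in Flink come with automatic type inference extraction, which derives data types from the function class and its evaluation methods via reflection. If this implicit, reflection-based extraction does not succeed, it can be supported by annotating the affected parameters, classes, or methods with @DataTypeHint and @FunctionHint. Examples of how to annotate functions follow below.

If more advanced type inference logic is required, an implementer can explicitly override the getTypeInference() method in each user-defined function. The annotation approach is nevertheless recommended, because it keeps custom type inference logic close to the affected locations and falls back to the default behavior everywhere else.

7. Automatic Type Inference

Automatic type inference inspects the function class and its evaluation methods to derive the data types of the function's arguments and result. The @DataTypeHint and @FunctionHint annotations support this automatic extraction.

@DataTypeHint

In many cases it is necessary to support the automatic extraction inline, directly at the parameters and return type of a function.

The following example shows how to use @DataTypeHint.

Java code: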

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.types.Row;

import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.time.Instant;

// function with overloaded evaluation methods
public static class OverloadedFunction extends ScalarFunction {

  // no hint required
  public Long eval(long a, long b) {
    return a + b;
  }

  // define the precision and scale of a decimal
  public @DataTypeHint("DECIMAL(12, 3)") BigDecimal eval(double a, double b) {
    return BigDecimal.valueOf(a + b);
  }

  // define a nested data type
  @DataTypeHint("ROW<s STRING, t TIMESTAMP_LTZ(3)>")
  public Row eval(int i) {
    return Row.of(String.valueOf(i), Instant.ofEpochSecond(i));
  }

  // allow input of any type and emit a custom serialized value
  // (MyUtils is a user-provided helper that is not shown here)
  @DataTypeHint(value = "RAW", bridgedTo = ByteBuffer.class)
  public ByteBuffer eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
    return MyUtils.serializeToByteBuffer(o);
  }
}

Scala code:

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.InputGroup
import org.apache.flink.table.functions.ScalarFunction
import org.apache.flink.types.Row

// function with overloaded evaluation methods
class OverloadedFunction extends ScalarFunction {

  // no hint required
  def eval(a: Long, b: Long): Long = {
    a + b
  }

  // define the precision and scale of a decimal
  @DataTypeHint("DECIMAL(12, 3)")
  def eval(a: Double, b: Double): java.math.BigDecimal = {
    java.math.BigDecimal.valueOf(a + b)
  }

  // define a nested data type
  @DataTypeHint("ROW<s STRING, t TIMESTAMP_LTZ(3)>")
  def eval(i: Int): Row = {
    Row.of(java.lang.String.valueOf(i), java.time.Instant.ofEpochSecond(i))
  }

  // allow input of any type and emit a custom serialized value
  // (MyUtils is a user-provided helper that is not shown here)
  @DataTypeHint(value = "RAW", bridgedTo = classOf[java.nio.ByteBuffer])
  def eval(@DataTypeHint(inputGroup = InputGroup.ANY) o: AnyRef): java.nio.ByteBuffer = {
    MyUtils.serializeToByteBuffer(o)
  }
}

@FunctionHint

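Sometimes a single evaluation method should handle several data types at once, and sometimes several overloaded evaluation methods share a common result type that should be declared only once.

The @FunctionHint annotation provides a mapping from argument data types to a result data type. It can annotate the input, accumulator, and result data types either on the whole function class or on individual evaluation methods. One or more annotations can be declared on top of the class, or individually for each evaluation method. All hint parameters are optional; if a parameter is not defined, the default reflection-based extraction is used. Hint parameters defined on top of the function class are inherited by all evaluation methods.

The following example shows how to use @FunctionHint.

Java code: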

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

// declare one output type for all evaluation methods of the function class
@FunctionHint(output = @DataTypeHint("ROW<s STRING, i INT>"))
public static class OverloadedFunction extends TableFunction<Row> {

  public void eval(int a, int b) {
    collect(Row.of("Sum", a + b));
  }

  // overloading of arguments is still possible
  public void eval() {
    collect(Row.of("Empty args", -1));
  }
}

// decouple type inference from the evaluation methods;
// type inference is determined entirely by the function hints
@FunctionHint(
  input = {@DataTypeHint("INT"), @DataTypeHint("INT")},
  output = @DataTypeHint("INT")
)
@FunctionHint(
  input = {@DataTypeHint("BIGINT"), @DataTypeHint("BIGINT")},
  output = @DataTypeHint("BIGINT")
)
@FunctionHint(
  input = {},
  output = @DataTypeHint("BOOLEAN")
)
public static class OverloadedFunction extends TableFunction<Object> {

  // an implementer just needs to make sure that a method exists
  // that can be called by the JVM
  public void eval(Object... o) {
    if (o.length == 0) {
      collect(false);
    } else {
      // only access the first argument if there is one
      collect(o[0]);
    }
  }
}

Scala code:

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.FunctionHint
import org.apache.flink.table.functions.TableFunction
import org.apache.flink.types.Row

import scala.annotation.varargs

// declare one output type for all evaluation methods of the function class
@FunctionHint(output = new DataTypeHint("ROW<s STRING, i INT>"))
class OverloadedFunction extends TableFunction[Row] {

  def eval(a: Int, b: Int): Unit = {
    collect(Row.of("Sum", Int.box(a + b)))
  }

  // overloading of arguments is still possible
  def eval(): Unit = {
    collect(Row.of("Empty args", Int.box(-1)))
  }
}

// decouple type inference from the evaluation methods;
// type inference is determined entirely by @FunctionHint
@FunctionHint(
  input = Array(new DataTypeHint("INT"), new DataTypeHint("INT")),
  output = new DataTypeHint("INT")
)
@FunctionHint(
  input = Array(new DataTypeHint("BIGINT"), new DataTypeHint("BIGINT")),
  output = new DataTypeHint("BIGINT")
)
@FunctionHint(
  input = Array(),
  output = new DataTypeHint("BOOLEAN")
)
class OverloadedFunction extends TableFunction[AnyRef] {

  // an implementer just needs to make sure that a method exists
  // that can be called by the JVM
  @varargs
  def eval(o: AnyRef*): Unit = {
    if (o.length == 0) {
      collect(Boolean.box(false))
    } else {
      // only access the first argument if there is one
      collect(o(0))
    }
  }
}

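8. Custom Type Inference

In most cases @DataTypeHint and @FunctionHint are sufficient to build user-defined functions. By overriding getTypeInference() and thereby replacing the automatic type inference logic, however, an implementer can create functions that are as flexible as the built-in system functions.

The following Java example illustrates the potential of custom type inference: it determines the result type of the function from a string argument. The function takes two string arguments. The first is the string to be parsed, and the second is the name of the target type.

Java code: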

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.catalog.DataTypeFactory;
import org.apache.flink.table.functions.ScalarFunction;
import org.apache.flink.table.types.inference.TypeInference;

import java.util.Optional;

public static class LiteralFunction extends ScalarFunction {

  public Object eval(String s, String type) {
    switch (type) {
      case "INT":
        return Integer.valueOf(s);
      case "DOUBLE":
        return Double.valueOf(s);
      case "STRING":
      default:
        return s;
    }
  }

  // the automatic, reflection-based type inference is disabled and
  // replaced by the following logic
  @Override
  public TypeInference getTypeInference(DataTypeFactory typeFactory) {
    return TypeInference.newBuilder()
      // specify typed arguments;
      // parameters will be cast implicitly to those types if necessary
      .typedArguments(DataTypes.STRING(), DataTypes.STRING())
      // specify a strategy for the result data type of the function
      .outputTypeStrategy(callContext -> {
        if (!callContext.isArgumentLiteral(1) || callContext.isArgumentNull(1)) {
          throw callContext.newValidationError("Literal expected for second argument.");
        }
        // return a data type based on the literal string value
        final String literal = callContext.getArgumentValue(1, String.class).orElse("STRING");
        switch (literal) {
          case "INT":
            return Optional.of(DataTypes.INT().notNull());
          case "DOUBLE":
            return Optional.of(DataTypes.DOUBLE().notNull());
          case "STRING":
          default:
            return Optional.of(DataTypes.STRING());
        }
      })
      .build();
  }
}

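9. Determinism

Every user-defined function class can declare whether it produces deterministic results by overriding the isDeterministic() method. If the function is not purely functional (like random(), date(), or now()), the method must return false. By default, isDeterministic() returns true.

Overriding isDeterministic() can also affect runtime behavior. A runtime implementation may be called at two different stages:

During planning (while the execution plan is generated): if a function is called with constant expressions, or if constant expressions can be derived from the given statement, the function is pre-evaluated for constant expression reduction and might no longer be executed on the cluster, unless isDeterministic() is overridden to return false, which disables constant expression reduction in this case. For example, the following calls to ABS are executed during planning: SELECT ABS(-1) FROM t and SELECT ABS(field) FROM t WHERE field = -1, whereas SELECT ABS(field) FROM t is not.

At runtime (that is, during cluster execution): if a function is called with non-constant expressions, or if isDeterministic() returns false.

10. Determinism of Built-in Functions

The determinism of system (built-in) functions cannot be changed. There are two kinds of functions that are not deterministic, dynamic functions and non-deterministic functions, according to the definitions in Apache Calcite's SqlOperator: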

/**
 * Returns whether a call to this operator is guaranteed to always return
 * the same result given the same operands; true is assumed by default.
 */
public boolean isDeterministic() {
  return true;
}

/**
 * Returns whether it is unsafe to cache query plans referencing this
 * operator; false is assumed by default.
 */
public boolean isDynamicFunction() {
  return false;
}

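isDeterministic indicates the determinism of a function. If it returns false, the function is evaluated at runtime for every record. If isDynamicFunction returns true, the function may only be evaluated at query start. In batch mode it is executed only during planning. In streaming mode it is equivalent to a non-deterministic function, because the query is logically executed continuously (streaming mode is an abstraction of continuous queries over dynamic tables), so a dynamic function is re-evaluated on every query execution (which, in the current implementation, means for every record).

The following built-in functions are always non-deterministic (evaluated at runtime for every record, in both batch and streaming mode):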

UUID

RAND

RAND_INTEGER

CURRENT_DATABASE

UNIX_TIMESTAMP

CURRENT_ROW_TIMESTAMP

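The following built-in temporal functions are dynamic. In batch mode they are executed during planning (at query start); in streaming mode they are evaluated at runtime for every record: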

CURRENT_DATE

CURRENT_TIME

CURRENT_TIMESTAMP

NOW

LOCALTIME

LOCALTIMESTAMP

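Note: isDynamicFunction applies to built-in functions only.

11. Runtime Integration

Sometimes a user-defined function needs to obtain global information, or to perform setup and clean-up work around the actual evaluation. User-defined functions provide open() and close() methods for this purpose; overriding them gives functionality similar to RichFunction in the DataStream API.

The open() method is called once before the evaluation method is called. The close() method is called after the last call to the evaluation method.

The open() method receives a FunctionContext, which holds information about the context in which the function is executed, such as the metric group, the distributed cache files, or the global job parameters.

The following information can be obtained by calling the corresponding methods of FunctionContext:

Method: Description

getMetricGroup(): the metric group of the subtask that executes the function.

getCachedFile(name): the local temporary copy of a file from the distributed cache.

getJobParameter(name, defaultValue): the global job parameter value associated with the given key.

The following example shows how to use FunctionContext in a scalar function to access a global job parameter:

Java code: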

import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.ScalarFunction;

public static class HashCodeFunction extends ScalarFunction {

  private int factor = 0;

  @Override
  public void open(FunctionContext context) throws Exception {
    // read the parameter "hashcode_factor";
    // if it is not set, fall back to the default value "12"
    factor = Integer.parseInt(context.getJobParameter("hashcode_factor", "12"));
  }

  public int eval(String s) {
    return s.hashCode() * factor;
  }
}

TableEnvironment env = TableEnvironment.create(...);

// set the job parameter
env.getConfig().addJobParameter("hashcode_factor", "31");

// register the function
env.createTemporarySystemFunction("hashCode", HashCodeFunction.class);

// use the function
env.sqlQuery("SELECT myField, hashCode(myField) FROM MyTable");

Scala code:

import org.apache.flink.table.api._
import org.apache.flink.table.functions.FunctionContext
import org.apache.flink.table.functions.ScalarFunction

class HashCodeFunction extends ScalarFunction {

  private var factor: Int = 0

  override def open(context: FunctionContext): Unit = {
    // read the parameter "hashcode_factor";
    // if it is not set, fall back to the default value "12"
    factor = context.getJobParameter("hashcode_factor", "12").toInt
  }

  def eval(s: String): Int = {
    s.hashCode * factor
  }
}

val env = TableEnvironment.create(...)

// set the job parameter
env.getConfig.addJobParameter("hashcode_factor", "31")

// register the function
env.createTemporarySystemFunction("hashCode", classOf[HashCodeFunction])

// use the function
env.sqlQuery("SELECT myField, hashCode(myField) FROM MyTable")

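12. Scalar Functions

A user-defined scalar function maps zero, one, or multiple scalar values to one new scalar value. Any data type listed in the data types documentation can be used as a parameter or return type of an evaluation method.

To define a scalar function, extend ScalarFunction in org.apache.flink.table.functions and implement one or more evaluation methods. The behavior of a scalar function is determined by its evaluation methods. An evaluation method must be declared public and must be named eval.

The following example shows how to define a hash code function and call it in a query:

Java code: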

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.InputGroup;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.ScalarFunction;

import static org.apache.flink.table.api.Expressions.*;

public static class HashFunction extends ScalarFunction {

  // take input of any type and return INT
  public int eval(@DataTypeHint(inputGroup = InputGroup.ANY) Object o) {
    return o.hashCode();
  }
}

TableEnvironment env = TableEnvironment.create(...);

// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(HashFunction.class, $("myField")));

// register the function
env.createTemporarySystemFunction("HashFunction", HashFunction.class);

// call the registered function in the Table API
env.from("MyTable").select(call("HashFunction", $("myField")));

// call the registered function in SQL
env.sqlQuery("SELECT HashFunction(myField) FROM MyTable");

Scala code:

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.InputGroup
import org.apache.flink.table.api._
import org.apache.flink.table.functions.ScalarFunction

class HashFunction extends ScalarFunction {

  // take input of any type and return INT
  def eval(@DataTypeHint(inputGroup = InputGroup.ANY) o: AnyRef): Int = {
    o.hashCode()
  }
}

val env = TableEnvironment.create(...)

// call the function "inline" in the Table API without registration
env.from("MyTable").select(call(classOf[HashFunction], $"myField"))

// register the function
env.createTemporarySystemFunction("HashFunction", classOf[HashFunction])

// call the registered function in the Table API
env.from("MyTable").select(call("HashFunction", $"myField"))

// call the registered function in SQL
env.sqlQuery("SELECT HashFunction(myField) FROM MyTable")

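13. Table Functions

Like a user-defined scalar function, a user-defined table function takes zero, one, or multiple scalar values as input arguments. Unlike a scalar function, which returns a single value, it can return an arbitrary number of rows. Each returned row can consist of one or more columns. If an output row consists of only a single column, the structured type can be omitted and a scalar value emitted instead; that value is implicitly wrapped into a row at runtime.

To define a table function, extend TableFunction in org.apache.flink.table.functions. The evaluation method can be overloaded by implementing multiple methods named eval. As with other functions, input and output data types are extracted automatically via reflection. The type of the returned table is determined by the generic type parameter T of the TableFunction class. Unlike a scalar function, the evaluation method itself has no return type; instead, the collect(T) method is used to emit each output row.

In the Table API, a table function is used with .joinLateral(...) or .leftOuterJoinLateral(...). The joinLateral operator (cross) joins each row of the outer table (the table on the left of the operator) with all rows produced by the table function (on the right of the operator). The leftOuterJoinLateral operator does the same, but additionally preserves outer rows for which the table function returns no rows.

In SQL, use LATERAL TABLE() with JOIN, or with LEFT JOIN and an ON TRUE join condition.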
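The difference between the two lateral join variants can be simulated without Flink. The following plain-Java sketch is an illustration only: LateralJoinSketch, split, and joinLateral are hypothetical names, rows are rendered as strings for brevity, and the input values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LateralJoinSketch {

  // Hypothetical table function: emits one (word, length) row per non-empty word.
  static List<Object[]> split(String str) {
    List<Object[]> rows = new ArrayList<>();
    for (String word : str.split(" ")) {
      if (!word.isEmpty()) {
        rows.add(new Object[] {word, word.length()});
      }
    }
    return rows;
  }

  // Joins every outer row with all rows the table function returns for it.
  // With leftOuter = true, an outer row that produces no function rows is
  // kept and padded with nulls, mirroring leftOuterJoinLateral.
  static List<String> joinLateral(List<String> outerRows, boolean leftOuter) {
    List<String> result = new ArrayList<>();
    for (String myField : outerRows) {
      List<Object[]> functionRows = split(myField);
      if (functionRows.isEmpty() && leftOuter) {
        result.add("(" + myField + ", null, null)");
      }
      for (Object[] row : functionRows) {
        result.add("(" + myField + ", " + row[0] + ", " + row[1] + ")");
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up input: the second row is empty, so the function emits nothing for it.
    List<String> table = Arrays.asList("a bc", "");
    System.out.println(joinLateral(table, false)); // [(a bc, a, 1), (a bc, bc, 2)]
    System.out.println(joinLateral(table, true));  // [(a bc, a, 1), (a bc, bc, 2), (, null, null)]
  }
}
```

The inner variant drops the empty outer row entirely, while the left outer variant keeps it with null function columns.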

The following example shows how to define a split function and call it in a query:

Java code:

import org.apache.flink.table.annotation.DataTypeHint;
import org.apache.flink.table.annotation.FunctionHint;
import org.apache.flink.table.api.*;
import org.apache.flink.table.functions.TableFunction;
import org.apache.flink.types.Row;

import static org.apache.flink.table.api.Expressions.*;

@FunctionHint(output = @DataTypeHint("ROW<word STRING, length INT>"))
public static class SplitFunction extends TableFunction<Row> {

  public void eval(String str) {
    for (String s : str.split(" ")) {
      // use collect(...) to emit a row
      collect(Row.of(s, s.length()));
    }
  }
}

TableEnvironment env = TableEnvironment.create(...);

// call the function "inline" in the Table API without registration
env
  .from("MyTable")
  .joinLateral(call(SplitFunction.class, $("myField")))
  .select($("myField"), $("word"), $("length"));
env
  .from("MyTable")
  .leftOuterJoinLateral(call(SplitFunction.class, $("myField")))
  .select($("myField"), $("word"), $("length"));

// rename the fields of the function in the Table API
env
  .from("MyTable")
  .leftOuterJoinLateral(call(SplitFunction.class, $("myField")).as("newWord", "newLength"))
  .select($("myField"), $("newWord"), $("newLength"));

// register the function
env.createTemporarySystemFunction("SplitFunction", SplitFunction.class);

// call the registered function in the Table API
env
  .from("MyTable")
  .joinLateral(call("SplitFunction", $("myField")))
  .select($("myField"), $("word"), $("length"));
env
  .from("MyTable")
  .leftOuterJoinLateral(call("SplitFunction", $("myField")))
  .select($("myField"), $("word"), $("length"));

// call the registered function in SQL
env.sqlQuery(
  "SELECT myField, word, length " +
  "FROM MyTable, LATERAL TABLE(SplitFunction(myField))");
env.sqlQuery(
  "SELECT myField, word, length " +
  "FROM MyTable " +
  "LEFT JOIN LATERAL TABLE(SplitFunction(myField)) ON TRUE");

// rename the fields of the function in SQL
env.sqlQuery(
  "SELECT myField, newWord, newLength " +
  "FROM MyTable " +
  "LEFT JOIN LATERAL TABLE(SplitFunction(myField)) AS T(newWord, newLength) ON TRUE");

Scala code:

import org.apache.flink.table.annotation.DataTypeHint
import org.apache.flink.table.annotation.FunctionHint
import org.apache.flink.table.api._
import org.apache.flink.table.functions.TableFunction
import org.apache.flink.types.Row

@FunctionHint(output = new DataTypeHint("ROW<word STRING, length INT>"))
class SplitFunction extends TableFunction[Row] {

  def eval(str: String): Unit = {
    // use collect(...) to emit a row
    str.split(" ").foreach(s => collect(Row.of(s, Int.box(s.length))))
  }
}

val env = TableEnvironment.create(...)

// call the function "inline" in the Table API without registration
env
  .from("MyTable")
  .joinLateral(call(classOf[SplitFunction], $"myField"))
  .select($"myField", $"word", $"length")
env
  .from("MyTable")
  .leftOuterJoinLateral(call(classOf[SplitFunction], $"myField"))
  .select($"myField", $"word", $"length")

// rename the fields of the function in the Table API
env
  .from("MyTable")
  .leftOuterJoinLateral(call(classOf[SplitFunction], $"myField").as("newWord", "newLength"))
  .select($"myField", $"newWord", $"newLength")

// register the function
env.createTemporarySystemFunction("SplitFunction", classOf[SplitFunction])

// call the registered function in the Table API
env
  .from("MyTable")
  .joinLateral(call("SplitFunction", $"myField"))
  .select($"myField", $"word", $"length")
env
  .from("MyTable")
  .leftOuterJoinLateral(call("SplitFunction", $"myField"))
  .select($"myField", $"word", $"length")

// call the registered function in SQL
env.sqlQuery(
  "SELECT myField, word, length " +
  "FROM MyTable, LATERAL TABLE(SplitFunction(myField))")
env.sqlQuery(
  "SELECT myField, word, length " +
  "FROM MyTable " +
  "LEFT JOIN LATERAL TABLE(SplitFunction(myField)) ON TRUE")

// rename the fields of the function in SQL
env.sqlQuery(
  "SELECT myField, newWord, newLength " +
  "FROM MyTable " +
  "LEFT JOIN LATERAL TABLE(SplitFunction(myField)) AS T(newWord, newLength) ON TRUE")

If you use Scala, do not implement a table function as a Scala object. Scala objects are singletons and will cause concurrency issues.

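14. Aggregate Functions

A user-defined aggregate function (UDAGG) aggregates a table (one or more rows, each with one or more columns) into a single scalar value.

The figure above shows an example of an aggregation. Assume you have a table of beverages with three columns, id, name, and price, and five rows. Suppose you need to find the highest price among all beverages, that is, to perform a max() aggregation. You have to go through all five rows, and the result is a single numeric value.

A user-defined aggregate function is implemented by extending AggregateFunction, which works as follows. First, it needs an accumulator, a data structure that holds the intermediate result of the aggregation. An empty accumulator is created by calling the createAccumulator() method of the AggregateFunction. Next, the accumulate() method is called for every input row to update the accumulator. Once all rows have been processed, the getValue() method is called to compute and return the final result.

The following methods are mandatory for every AggregateFunction: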

createAccumulator()

accumulate()

getValue()
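The lifecycle of these three methods can be simulated in plain Java for the max-price example. This is a sketch under stated assumptions: it does not extend Flink's AggregateFunction, MaxPriceSketch and MaxAccum are hypothetical names, and the five prices are made-up sample values.

```java
final class MaxPriceSketch {

  // Accumulator: holds the intermediate result of the aggregation.
  static final class MaxAccum {
    Double max = null; // null until the first row has been seen
  }

  // Step 1: create an empty accumulator.
  static MaxAccum createAccumulator() {
    return new MaxAccum();
  }

  // Step 2: called once per input row to update the accumulator.
  static void accumulate(MaxAccum acc, double price) {
    if (acc.max == null || price > acc.max) {
      acc.max = price;
    }
  }

  // Step 3: compute the final result from the accumulator.
  static Double getValue(MaxAccum acc) {
    return acc.max;
  }

  // Drives the three steps in the order the runtime would call them.
  static Double aggregate(double[] prices) {
    MaxAccum acc = createAccumulator();
    for (double price : prices) {
      accumulate(acc, price);
    }
    return getValue(acc);
  }

  public static void main(String[] args) {
    // Made-up prices for the five beverage rows.
    double[] prices = {6.0, 3.0, 5.0, 4.0, 7.0};
    System.out.println(aggregate(prices)); // prints 7.0
  }
}
```

All state lives in the accumulator rather than in the function itself, which is what lets a runtime snapshot and restore the aggregation.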

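Flink's type inference can produce wrong results for complex types, for example types that are neither basic types nor simple POJOs. Therefore, like ScalarFunction and TableFunction, AggregateFunction provides AggregateFunction#getResultType() and AggregateFunction#getAccumulatorType() to specify the result type and the accumulator type, respectively. Both methods return TypeInformation.

Besides the methods above, there are a few optional methods. Some of them make queries more efficient, while others are mandatory in certain scenarios. For example, merge() is mandatory if the aggregate function is used in a session window, because the accumulators of two session windows have to be merged when the windows are merged.

The following methods of AggregateFunction are required in certain scenarios:

retract() is required for aggregations on bounded OVER windows.

merge() is required for many batch aggregations and for session and tumbling window aggregations. The method is also very helpful for optimizations. For example, the two-phase aggregation optimization requires every AggregateFunction to implement merge().

resetAccumulator() is required for many batch aggregations.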
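What the optional methods do to an accumulator can be traced with a small plain-Java sketch. It is only an illustration: WeightedAvgSketch and Accum are hypothetical names that do not extend any Flink class, the logic is a weighted average with made-up inputs, and when a runtime actually calls each method is not modeled.

```java
import java.util.Arrays;

final class WeightedAvgSketch {

  static final class Accum {
    long sum = 0;  // sum of value * weight
    int count = 0; // sum of weights
  }

  static void accumulate(Accum acc, long value, int weight) {
    acc.sum += value * weight;
    acc.count += weight;
  }

  // Undoes a previous accumulate call with the same arguments.
  static void retract(Accum acc, long value, int weight) {
    acc.sum -= value * weight;
    acc.count -= weight;
  }

  // Folds partial accumulators into acc without discarding what acc already holds.
  static void merge(Accum acc, Iterable<Accum> others) {
    for (Accum other : others) {
      acc.sum += other.sum;
      acc.count += other.count;
    }
  }

  // Brings the accumulator back to its initial, empty state.
  static void resetAccumulator(Accum acc) {
    acc.sum = 0;
    acc.count = 0;
  }

  static Long getValue(Accum acc) {
    if (acc.count == 0) {
      return null;
    }
    return acc.sum / acc.count;
  }

  public static void main(String[] args) {
    // Two partial accumulators, as two pre-aggregating tasks might produce them.
    Accum a = new Accum();
    accumulate(a, 10, 1);
    accumulate(a, 20, 3);
    Accum b = new Accum();
    accumulate(b, 30, 2);

    Accum merged = new Accum();
    merge(merged, Arrays.asList(a, b));
    System.out.println(getValue(merged)); // 130 / 6 -> prints 21

    retract(merged, 10, 1);
    System.out.println(getValue(merged)); // 120 / 5 -> prints 24

    resetAccumulator(merged);
    System.out.println(getValue(merged)); // prints null
  }
}
```

Because retract is the exact inverse of accumulate, and merge is a plain sum of the partial state, the result does not depend on how the rows were split across partial accumulators.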

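All methods of AggregateFunction must be declared public, must not be static, and must be named exactly as listed above. createAccumulator, getValue, getResultType, and getAccumulatorType are defined in the abstract class AggregateFunction, while the other methods are contract methods that are looked up by name. To define an aggregate function, extend org.apache.flink.table.functions.AggregateFunction and implement one or more accumulate methods. The accumulate method can be overloaded with different parameter types and supports variable arguments.

Detailed documentation for all methods of AggregateFunction is given below.

Java code: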

/**
 * Base class for user-defined aggregates and table aggregates.
 *
 * @param <T>   the type of the aggregation result.
 * @param <ACC> the type of the aggregation accumulator. The accumulator is used to keep the
 *              aggregated values which are needed to compute an aggregation result.
 */
public abstract class UserDefinedAggregateFunction<T, ACC> extends UserDefinedFunction {

  /**
   * Creates and init the Accumulator for this (table)aggregate function.
   *
   * @return the accumulator with the initial value
   */
  public ACC createAccumulator(); // MANDATORY

  /**
   * Returns the TypeInformation of the (table)aggregate function's result.
   *
   * @return The TypeInformation of the (table)aggregate function's result or null if the result
   *         type should be automatically inferred.
   */
  public TypeInformation<T> getResultType() { return null; } // PRE-DEFINED

  /**
   * Returns the TypeInformation of the (table)aggregate function's accumulator.
   *
   * @return The TypeInformation of the (table)aggregate function's accumulator or null if the
   *         accumulator type should be automatically inferred.
   */
  public TypeInformation<ACC> getAccumulatorType() { return null; } // PRE-DEFINED
}

/**
 * Base class for aggregation functions.
 *
 * @param <T>   the type of the aggregation result
 * @param <ACC> the type of the aggregation accumulator. The accumulator is used to keep the
 *              aggregated values which are needed to compute an aggregation result.
 *              AggregateFunction represents its state using accumulator, thereby the state of the
 *              AggregateFunction must be put into the accumulator.
 */
public abstract class AggregateFunction<T, ACC> extends UserDefinedAggregateFunction<T, ACC> {

  /**
   * Processes the input values and update the provided accumulator instance. The method
   * accumulate can be overloaded with different custom types and arguments. An AggregateFunction
   * requires at least one accumulate() method.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  public void accumulate(ACC accumulator, [user defined inputs]); // MANDATORY

  /**
   * Retracts the input values from the accumulator instance. The current design assumes the
   * inputs are the values that have been previously accumulated. The method retract can be
   * overloaded with different custom types and arguments. This function must be implemented for
   * datastream bounded over aggregate.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  public void retract(ACC accumulator, [user defined inputs]); // OPTIONAL

  /**
   * Merges a group of accumulator instances into one accumulator instance. This function must be
   * implemented for datastream session window grouping aggregate and bounded grouping aggregate.
   *
   * @param accumulator the accumulator which will keep the merged aggregate results. It should
   *                    be noted that the accumulator may contain the previous aggregated
   *                    results. Therefore user should not replace or clean this instance in the
   *                    custom merge method.
   * @param its         an {@link java.lang.Iterable} pointed to a group of accumulators that will be
   *                    merged.
   */
  public void merge(ACC accumulator, java.lang.Iterable<ACC> its); // OPTIONAL

  /**
   * Called every time when an aggregation result should be materialized.
   * The returned value could be either an early and incomplete result
   * (periodically emitted as data arrive) or the final result of the
   * aggregation.
   *
   * @param accumulator the accumulator which contains the current
   *                    aggregated results
   * @return the aggregation result
   */
  public T getValue(ACC accumulator); // MANDATORY

  /**
   * Resets the accumulator for this [[AggregateFunction]]. This function must be implemented for
   * bounded grouping aggregate.
   *
   * @param accumulator the accumulator which needs to be reset
   */
  public void resetAccumulator(ACC accumulator); // OPTIONAL

  /**
   * Returns true if this AggregateFunction can only be applied in an OVER window.
   *
   * @return true if the AggregateFunction requires an OVER window, false otherwise.
   */
  public Boolean requiresOver() { return false; } // PRE-DEFINED
}

Scala code:

/**
 * Base class for user-defined aggregates and table aggregates.
 *
 * @tparam T   the type of the aggregation result.
 * @tparam ACC the type of the aggregation accumulator. The accumulator is used to keep the
 *             aggregated values which are needed to compute an aggregation result.
 */
abstract class UserDefinedAggregateFunction[T, ACC] extends UserDefinedFunction {

  /**
   * Creates and init the Accumulator for this (table)aggregate function.
   *
   * @return the accumulator with the initial value
   */
  def createAccumulator(): ACC // MANDATORY

  /**
   * Returns the TypeInformation of the (table)aggregate function's result.
   *
   * @return The TypeInformation of the (table)aggregate function's result or null if the result
   *         type should be automatically inferred.
   */
  def getResultType: TypeInformation[T] = null // PRE-DEFINED

  /**
   * Returns the TypeInformation of the (table)aggregate function's accumulator.
   *
   * @return The TypeInformation of the (table)aggregate function's accumulator or null if the
   *         accumulator type should be automatically inferred.
   */
  def getAccumulatorType: TypeInformation[ACC] = null // PRE-DEFINED
}

/**
 * Base class for aggregation functions.
 *
 * @tparam T   the type of the aggregation result
 * @tparam ACC the type of the aggregation accumulator. The accumulator is used to keep the
 *             aggregated values which are needed to compute an aggregation result.
 *             AggregateFunction represents its state using accumulator, thereby the state of the
 *             AggregateFunction must be put into the accumulator.
 */
abstract class AggregateFunction[T, ACC] extends UserDefinedAggregateFunction[T, ACC] {

  /**
   * Processes the input values and update the provided accumulator instance. The method
   * accumulate can be overloaded with different custom types and arguments. An AggregateFunction
   * requires at least one accumulate() method.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  def accumulate(accumulator: ACC, [user defined inputs]): Unit // MANDATORY

  /**
   * Retracts the input values from the accumulator instance. The current design assumes the
   * inputs are the values that have been previously accumulated. The method retract can be
   * overloaded with different custom types and arguments. This function must be implemented for
   * datastream bounded over aggregate.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  def retract(accumulator: ACC, [user defined inputs]): Unit // OPTIONAL

  /**
   * Merges a group of accumulator instances into one accumulator instance. This function must be
   * implemented for datastream session window grouping aggregate and bounded grouping aggregate.
   *
   * @param accumulator the accumulator which will keep the merged aggregate results. It should
   *                    be noted that the accumulator may contain the previous aggregated
   *                    results. Therefore user should not replace or clean this instance in the
   *                    custom merge method.
   * @param its         an [[java.lang.Iterable]] pointed to a group of accumulators that will be
   *                    merged.
   */
  def merge(accumulator: ACC, its: java.lang.Iterable[ACC]): Unit // OPTIONAL

  /**
   * Called every time when an aggregation result should be materialized.
   * The returned value could be either an early and incomplete result
   * (periodically emitted as data arrive) or the final result of the
   * aggregation.
   *
   * @param accumulator the accumulator which contains the current
   *                    aggregated results
   * @return the aggregation result
   */
  def getValue(accumulator: ACC): T // MANDATORY

  /**
   * Resets the accumulator for this [[AggregateFunction]]. This function must be implemented for
   * bounded grouping aggregate.
   *
   * @param accumulator the accumulator which needs to be reset
   */
  def resetAccumulator(accumulator: ACC): Unit // OPTIONAL

  /**
   * Returns true if this AggregateFunction can only be applied in an OVER window.
   *
   * @return true if the AggregateFunction requires an OVER window, false otherwise.
   */
  def requiresOver: Boolean = false // PRE-DEFINED
}

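The following example shows how to:

define an aggregate function that computes the weighted average of a column,

register the function in the TableEnvironment,

use the function in a query.

To compute a weighted average, the accumulator has to store the weighted sum and the sum of the weights seen so far (the count field in the example). In our example we define the class WeightedAvgAccum as the accumulator. Flink's checkpointing mechanism saves the accumulator automatically and restores it on failure, which guarantees exactly-once semantics.

The accumulate() method of our WeightedAvg aggregate function takes three arguments. The first is the WeightedAvgAccum accumulator; the other two are user-defined inputs: the input value iValue and its weight iWeight. Although retract(), merge(), and resetAccumulator() are not mandatory for most aggregation types, the example implements them as well. Note that the Scala example also uses Java basic types and defines getResultType() and getAccumulatorType(), because Flink's type extraction does not work well with Scala types.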
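
The lifecycle described above can be traced without a Flink cluster. The following is a minimal plain-Java sketch, not Flink code: the class name WeightedAvgLifecycleSketch, the aggregate driver, and the sample rows are assumptions made for illustration, and the driver only approximates what the runtime does for a single group. It also shows that retract() undoes an earlier accumulate() call.

```java
/** Plain-Java sketch of the accumulator lifecycle; no Flink classes are involved. */
class WeightedAvgLifecycleSketch {

    /** Intermediate state: the weighted sum and the sum of the weights. */
    static class Accum {
        long sum = 0L;
        int count = 0;
    }

    static Accum createAccumulator() {
        return new Accum();
    }

    static void accumulate(Accum acc, long value, int weight) {
        acc.sum += value * weight;
        acc.count += weight;
    }

    /** Inverse of accumulate: removes a previously accumulated row. */
    static void retract(Accum acc, long value, int weight) {
        acc.sum -= value * weight;
        acc.count -= weight;
    }

    /** Returns null for an empty accumulator, otherwise the truncated weighted average. */
    static Long getValue(Accum acc) {
        if (acc.count == 0) {
            return null;
        }
        return acc.sum / acc.count;
    }

    /** Hypothetical driver for one group: each row is {value, weight}. */
    static Long aggregate(long[][] rows) {
        Accum acc = createAccumulator();
        for (long[] row : rows) {
            accumulate(acc, row[0], (int) row[1]);
        }
        return getValue(acc);
    }

    public static void main(String[] args) {
        // (10 * 1 + 20 * 3) / (1 + 3) = 70 / 4, truncated by long division
        System.out.println(aggregate(new long[][] {{10, 1}, {20, 3}})); // prints 17

        Accum acc = createAccumulator();
        accumulate(acc, 10, 1);
        accumulate(acc, 20, 3);
        retract(acc, 20, 3);               // undo the second row
        System.out.println(getValue(acc)); // prints 10
    }
}
```

Because the result type and the accumulator fields are integral, the average is truncated: 70 / 4 yields 17 rather than 17.5.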

Java code:

import java.util.Iterator;
import org.apache.flink.table.functions.AggregateFunction;

/**
 * Accumulator for WeightedAvg.
 */
public static class WeightedAvgAccum {
  public long sum = 0;
  public int count = 0;
}

/**
 * Weighted Average user-defined aggregate function.
 */
public static class WeightedAvg extends AggregateFunction<Long, WeightedAvgAccum> {

  @Override
  public WeightedAvgAccum createAccumulator() {
    return new WeightedAvgAccum();
  }

  @Override
  public Long getValue(WeightedAvgAccum acc) {
    if (acc.count == 0) {
      return null;
    } else {
      return acc.sum / acc.count;
    }
  }

  public void accumulate(WeightedAvgAccum acc, long iValue, int iWeight) {
    acc.sum += iValue * iWeight;
    acc.count += iWeight;
  }

  public void retract(WeightedAvgAccum acc, long iValue, int iWeight) {
    acc.sum -= iValue * iWeight;
    acc.count -= iWeight;
  }

  public void merge(WeightedAvgAccum acc, Iterable<WeightedAvgAccum> it) {
    Iterator<WeightedAvgAccum> iter = it.iterator();
    while (iter.hasNext()) {
      WeightedAvgAccum a = iter.next();
      acc.count += a.count;
      acc.sum += a.sum;
    }
  }

  public void resetAccumulator(WeightedAvgAccum acc) {
    acc.count = 0;
    acc.sum = 0L;
  }
}

// Register the function
StreamTableEnvironment tEnv = ...
tEnv.registerFunction("wAvg", new WeightedAvg());

// Use the function
tEnv.sqlQuery("SELECT user, wAvg(points, level) AS avgPoints FROM userScores GROUP BY user");

Scala code:

import java.lang.{Long => JLong, Integer => JInteger}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2}
import org.apache.flink.api.java.typeutils.TupleTypeInfo
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.AggregateFunction

/**
 * Accumulator for WeightedAvg.
 * f0 holds the weighted sum, f1 holds the sum of the weights.
 */
class WeightedAvgAccum extends JTuple2[JLong, JInteger] {
  f0 = 0L
  f1 = 0
}

/**
 * Weighted Average user-defined aggregate function.
 */
class WeightedAvg extends AggregateFunction[JLong, WeightedAvgAccum] {

  override def createAccumulator(): WeightedAvgAccum = {
    new WeightedAvgAccum
  }

  override def getValue(acc: WeightedAvgAccum): JLong = {
    if (acc.f1 == 0) {
      null
    } else {
      acc.f0 / acc.f1
    }
  }

  def accumulate(acc: WeightedAvgAccum, iValue: JLong, iWeight: JInteger): Unit = {
    acc.f0 += iValue * iWeight
    acc.f1 += iWeight
  }

  def retract(acc: WeightedAvgAccum, iValue: JLong, iWeight: JInteger): Unit = {
    acc.f0 -= iValue * iWeight
    acc.f1 -= iWeight
  }

  def merge(acc: WeightedAvgAccum, it: java.lang.Iterable[WeightedAvgAccum]): Unit = {
    val iter = it.iterator()
    while (iter.hasNext) {
      val a = iter.next()
      acc.f1 += a.f1
      acc.f0 += a.f0
    }
  }

  def resetAccumulator(acc: WeightedAvgAccum): Unit = {
    acc.f1 = 0
    acc.f0 = 0L
  }

  override def getAccumulatorType: TypeInformation[WeightedAvgAccum] = {
    new TupleTypeInfo(classOf[WeightedAvgAccum], Types.LONG, Types.INT)
  }

  override def getResultType: TypeInformation[JLong] = Types.LONG
}

// Register the function
val tEnv: StreamTableEnvironment = ???
tEnv.registerFunction("wAvg", new WeightedAvg())

// Use the function
tEnv.sqlQuery("SELECT user, wAvg(points, level) AS avgPoints FROM userScores GROUP BY user")

Python code:

'''
Java code:

/**
 * Accumulator for WeightedAvg.
 */
public static class WeightedAvgAccum {
  public long sum = 0;
  public int count = 0;
}

// The Java class must have a public no-argument constructor and must be loadable
// by the current Java classloader.

/**
 * Weighted Average user-defined aggregate function.
 */
public static class WeightedAvg extends AggregateFunction<Long, WeightedAvgAccum> {

  @Override
  public WeightedAvgAccum createAccumulator() {
    return new WeightedAvgAccum();
  }

  @Override
  public Long getValue(WeightedAvgAccum acc) {
    if (acc.count == 0) {
      return null;
    } else {
      return acc.sum / acc.count;
    }
  }

  public void accumulate(WeightedAvgAccum acc, long iValue, int iWeight) {
    acc.sum += iValue * iWeight;
    acc.count += iWeight;
  }

  public void retract(WeightedAvgAccum acc, long iValue, int iWeight) {
    acc.sum -= iValue * iWeight;
    acc.count -= iWeight;
  }

  public void merge(WeightedAvgAccum acc, Iterable<WeightedAvgAccum> it) {
    Iterator<WeightedAvgAccum> iter = it.iterator();
    while (iter.hasNext()) {
      WeightedAvgAccum a = iter.next();
      acc.count += a.count;
      acc.sum += a.sum;
    }
  }

  public void resetAccumulator(WeightedAvgAccum acc) {
    acc.count = 0;
    acc.sum = 0L;
  }
}
'''

# Register the function
t_env = ...  # type: StreamTableEnvironment
t_env.register_java_function("wAvg", "my.java.function.WeightedAvg")

# Use the function
t_env.sql_query("SELECT user, wAvg(points, level) AS avgPoints FROM userScores GROUP BY user")

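15. Table Aggregate Functions

A user-defined table aggregate function (UDTAGG) aggregates a table (one or more rows, each with one or more columns) into another table; the result may contain multiple rows and multiple columns.

The figure above shows an example of a table aggregate function. Suppose you have a table of beverages with three columns, id, name, and price, and five rows. Suppose you need to find the two most expensive beverages, as a top2() table aggregate function would. You have to go through all five rows, and the result is a table with two rows.

A user-defined table aggregate function is implemented by extending the TableAggregateFunction class. A TableAggregateFunction works as follows. First, it needs an accumulator, which holds the intermediate result of the aggregation. An empty accumulator is created by calling the createAccumulator() method of the TableAggregateFunction. Next, the accumulate() method is called for every input row to update the accumulator. Once all rows have been processed, the emitValue() method is called to compute and return the final result.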
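
The three steps above (create an empty accumulator, accumulate every row, emit the result) can be walked through in plain Java. This is a sketch only: the class Top2LifecycleSketch and its top2 driver are hypothetical, a List stands in for Flink's Collector, and the five prices are invented because the beverage figure is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the table aggregate lifecycle; no Flink classes are involved. */
class Top2LifecycleSketch {

    /** Intermediate state: the two largest values seen so far. */
    static class Top2Accum {
        int first = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;
    }

    static Top2Accum createAccumulator() {
        return new Top2Accum();
    }

    static void accumulate(Top2Accum acc, int v) {
        if (v > acc.first) {
            acc.second = acc.first;
            acc.first = v;
        } else if (v > acc.second) {
            acc.second = v;
        }
    }

    /** Emits {value, rank} rows; the list stands in for Flink's Collector. */
    static void emitValue(Top2Accum acc, List<int[]> out) {
        if (acc.first != Integer.MIN_VALUE) {
            out.add(new int[] {acc.first, 1});
        }
        if (acc.second != Integer.MIN_VALUE) {
            out.add(new int[] {acc.second, 2});
        }
    }

    /** Hypothetical driver: many input rows in, up to two result rows out. */
    static List<int[]> top2(int[] prices) {
        Top2Accum acc = createAccumulator();
        for (int price : prices) {
            accumulate(acc, price);
        }
        List<int[]> out = new ArrayList<>();
        emitValue(acc, out);
        return out;
    }

    public static void main(String[] args) {
        int[] prices = {6, 3, 5, 9, 7}; // invented prices for five beverages
        for (int[] row : top2(prices)) {
            System.out.println(row[0] + " rank " + row[1]);
        }
        // prints "9 rank 1" then "7 rank 2"
    }
}
```

Unlike an aggregate function, which returns one value from getValue(), the emit step here may produce several rows, which is why it writes to a collector instead of returning a result.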

The following TableAggregateFunction methods are mandatory:

createAccumulator()

accumulate()

Flink's type extraction may infer the wrong type for complex types, for example types that are neither basic types nor plain POJOs. Therefore, like ScalarFunction and TableFunction, TableAggregateFunction provides the methods TableAggregateFunction#getResultType() and TableAggregateFunction#getAccumulatorType() to specify the result type and the accumulator type. Both methods return TypeInformation.

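Besides the methods above, a few more can be implemented optionally. Some of them make queries more efficient, while others are mandatory in certain scenarios. For example, merge() must be implemented when the function is used in a session window, because the accumulators of two session windows are merged when the windows themselves are merged.

The following TableAggregateFunction methods are mandatory in certain scenarios:

retract() must be implemented for aggregations on bounded OVER windows.

merge() must be implemented for many batch aggregations and for streaming session and sliding window aggregations.

resetAccumulator() must be implemented for many batch aggregations.

emitValue() must be implemented for batch aggregations and window aggregations.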
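
To make the merge() requirement concrete, here is a plain-Java sketch of two accumulators being combined, as would happen when two session windows are merged. The class Top2MergeSketch and the helper of(...) are hypothetical and no Flink classes are used. The target accumulator is updated in place and never replaced or cleared, because it may already hold earlier results.

```java
import java.util.Arrays;

/** Plain-Java sketch of merging accumulators; no Flink classes are involved. */
class Top2MergeSketch {

    static class Top2Accum {
        int first = Integer.MIN_VALUE;
        int second = Integer.MIN_VALUE;
    }

    static void accumulate(Top2Accum acc, int v) {
        if (v > acc.first) {
            acc.second = acc.first;
            acc.first = v;
        } else if (v > acc.second) {
            acc.second = v;
        }
    }

    /** Folds the other accumulators into acc; acc keeps whatever it already held. */
    static void merge(Top2Accum acc, Iterable<Top2Accum> others) {
        for (Top2Accum other : others) {
            accumulate(acc, other.first);
            accumulate(acc, other.second);
        }
    }

    /** Hypothetical helper: builds an accumulator from a few values. */
    static Top2Accum of(int... values) {
        Top2Accum acc = new Top2Accum();
        for (int v : values) {
            accumulate(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        Top2Accum left = of(4, 8);   // accumulator of the first window
        Top2Accum right = of(9, 1);  // accumulator of the second window
        merge(left, Arrays.asList(right));
        System.out.println(left.first + "," + left.second); // prints 9,8
    }
}
```

Feeding the other accumulator's fields through accumulate() reuses the ordering logic; the Integer.MIN_VALUE placeholders of an empty accumulator never displace a real value.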

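The following TableAggregateFunction method can improve the efficiency of streaming jobs:

emitUpdateWithRetract() emits the values that have been updated, in retract mode.

The emitValue() method emits the full result held by the accumulator. Taking TopN as an example, emitValue() emits all top n values every time it is called, which can cause performance problems in streaming jobs. To improve performance, users can implement emitUpdateWithRetract(). This method outputs results incrementally in retract mode: when a value is updated, the old record is retracted before the new one is sent. If emitUpdateWithRetract() is defined, it is used in preference to emitValue(), because its incremental output is assumed to be more efficient.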
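
The difference between the two emit styles can be seen by counting the records each one hands to the collector. The sketch below is plain Java with a List of strings standing in for the collector ("+" marks a collected record, "-" a retracted one); the class IncrementalEmitSketch and the run driver are assumptions, and the sketch does not model what the Flink runtime itself sends downstream.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java comparison of full emission and incremental emission for Top2. */
class IncrementalEmitSketch {

    static final int EMPTY = Integer.MIN_VALUE;

    static class Top2Accum {
        int first = EMPTY;
        int second = EMPTY;
        int oldFirst = EMPTY;   // last value emitted for rank 1
        int oldSecond = EMPTY;  // last value emitted for rank 2
    }

    static void accumulate(Top2Accum acc, int v) {
        if (v > acc.first) {
            acc.second = acc.first;
            acc.first = v;
        } else if (v > acc.second) {
            acc.second = v;
        }
    }

    /** Emits the complete current result on every call. */
    static void emitValue(Top2Accum acc, List<String> out) {
        if (acc.first != EMPTY) {
            out.add("+(" + acc.first + ",1)");
        }
        if (acc.second != EMPTY) {
            out.add("+(" + acc.second + ",2)");
        }
    }

    /** Emits only changes: retract the old record, then collect the new one. */
    static void emitUpdateWithRetract(Top2Accum acc, List<String> out) {
        if (acc.first != acc.oldFirst) {
            if (acc.oldFirst != EMPTY) {
                out.add("-(" + acc.oldFirst + ",1)");
            }
            out.add("+(" + acc.first + ",1)");
            acc.oldFirst = acc.first;
        }
        if (acc.second != acc.oldSecond) {
            if (acc.oldSecond != EMPTY) {
                out.add("-(" + acc.oldSecond + ",2)");
            }
            out.add("+(" + acc.second + ",2)");
            acc.oldSecond = acc.second;
        }
    }

    /** Hypothetical driver: emits after every input row, as a streaming job would. */
    static List<String> run(int[] values, boolean incremental) {
        Top2Accum acc = new Top2Accum();
        List<String> out = new ArrayList<>();
        for (int v : values) {
            accumulate(acc, v);
            if (incremental) {
                emitUpdateWithRetract(acc, out);
            } else {
                emitValue(acc, out);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {9, 7, 3, 1};
        System.out.println(run(values, false).size()); // prints 7
        System.out.println(run(values, true).size());  // prints 2
    }
}
```

With this input the last two rows do not change the top two values, so the incremental variant emits nothing for them, while the full variant repeats both records each time.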

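All methods of a TableAggregateFunction must be public and non-static, and must be named exactly as listed above. The methods createAccumulator, getResultType, and getAccumulatorType are defined in the abstract parent class of TableAggregateFunction, while the others are convention methods that are looked up by name. To implement a table aggregate function, you must extend org.apache.flink.table.functions.TableAggregateFunction and implement one or more accumulate methods. The accumulate method can be overloaded and can also take variable-length arguments.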
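
The idea of a convention method, one that is found by its name rather than declared in a base class, can be illustrated with standard Java reflection. This sketch is not how Flink resolves these methods internally; the classes ConventionMethodSketch and MyTableAgg and the helper findConventionMethods are hypothetical. It only demonstrates the stated rules: public, non-static, exact name, with overloads and varargs allowed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Illustrates by-name lookup of public, non-static methods via reflection. */
class ConventionMethodSketch {

    /** Hypothetical function class with several methods named accumulate. */
    public static class MyTableAgg {
        public void accumulate(StringBuilder acc, Integer v) {
            acc.append(v);
        }

        // Overload with variable-length arguments.
        public void accumulate(StringBuilder acc, String... vs) {
            for (String v : vs) {
                acc.append(v);
            }
        }

        // Violates the convention: static.
        public static void accumulate(StringBuilder acc) {
        }

        // Violates the convention: not public.
        void accumulate(StringBuilder acc, Long v) {
        }
    }

    /** Returns the public, non-static methods of clazz with the given name. */
    static List<Method> findConventionMethods(Class<?> clazz, String name) {
        List<Method> found = new ArrayList<>();
        for (Method m : clazz.getMethods()) { // getMethods() only returns public methods
            if (m.getName().equals(name) && !Modifier.isStatic(m.getModifiers())) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Method> methods = findConventionMethods(MyTableAgg.class, "accumulate");
        System.out.println(methods.size()); // prints 2
    }
}
```

Only the two public instance overloads qualify; the static and the package-private variants are skipped, mirroring the requirement that convention methods be public and non-static.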

The detailed documentation of all TableAggregateFunction methods is given below:

Java code:

/**
 * Base class for user-defined aggregates and table aggregates.
 *
 * @param <T>   the type of the aggregation result.
 * @param <ACC> the type of the aggregation accumulator. The accumulator is used to keep the
 *              aggregated values which are needed to compute an aggregation result.
 */
public abstract class UserDefinedAggregateFunction<T, ACC> extends UserDefinedFunction {

  /**
   * Creates and init the Accumulator for this (table)aggregate function.
   *
   * @return the accumulator with the initial value
   */
  public abstract ACC createAccumulator(); // MANDATORY

  /**
   * Returns the TypeInformation of the (table)aggregate function's result.
   *
   * @return The TypeInformation of the (table)aggregate function's result or null if the result
   *         type should be automatically inferred.
   */
  public TypeInformation<T> getResultType() { return null; } // PRE-DEFINED

  /**
   * Returns the TypeInformation of the (table)aggregate function's accumulator.
   *
   * @return The TypeInformation of the (table)aggregate function's accumulator or null if the
   *         accumulator type should be automatically inferred.
   */
  public TypeInformation<ACC> getAccumulatorType() { return null; } // PRE-DEFINED
}

/**
 * Base class for table aggregation functions.
 *
 * @param <T>   the type of the aggregation result
 * @param <ACC> the type of the aggregation accumulator. The accumulator is used to keep the
 *              aggregated values which are needed to compute a table aggregation result.
 *              TableAggregateFunction represents its state using accumulator, thereby the state of
 *              the TableAggregateFunction must be put into the accumulator.
 */
public abstract class TableAggregateFunction<T, ACC> extends UserDefinedAggregateFunction<T, ACC> {

  // The signatures below that contain [user defined inputs] are convention methods,
  // written as pseudo-signatures rather than compilable declarations.

  /**
   * Processes the input values and update the provided accumulator instance. The method
   * accumulate can be overloaded with different custom types and arguments. A TableAggregateFunction
   * requires at least one accumulate() method.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  public void accumulate(ACC accumulator, [user defined inputs]); // MANDATORY

  /**
   * Retracts the input values from the accumulator instance. The current design assumes the
   * inputs are the values that have been previously accumulated. The method retract can be
   * overloaded with different custom types and arguments. This function must be implemented for
   * datastream bounded over aggregate.
   *
   * @param accumulator           the accumulator which contains the current aggregated results
   * @param [user defined inputs] the input value (usually obtained from a new arrived data).
   */
  public void retract(ACC accumulator, [user defined inputs]); // OPTIONAL

  /**
   * Merges a group of accumulator instances into one accumulator instance. This function must be
   * implemented for datastream session window grouping aggregate and bounded grouping aggregate.
   *
   * @param accumulator the accumulator which will keep the merged aggregate results. It should
   *                    be noted that the accumulator may contain the previous aggregated
   *                    results. Therefore user should not replace or clean this instance in the
   *                    custom merge method.
   * @param its         an {@link java.lang.Iterable} pointed to a group of accumulators that will be
   *                    merged.
   */
  public void merge(ACC accumulator, java.lang.Iterable<ACC> its); // OPTIONAL

  /**
   * Called every time when an aggregation result should be materialized. The returned value
   * could be either an early and incomplete result (periodically emitted as data arrive) or
   * the final result of the aggregation.
   *
   * @param accumulator the accumulator which contains the current
   *                    aggregated results
   * @param out         the collector used to output data
   */
  public void emitValue(ACC accumulator, Collector<T> out); // OPTIONAL

  /**
   * Called every time when an aggregation result should be materialized. The returned value
   * could be either an early and incomplete result (periodically emitted as data arrive) or
   * the final result of the aggregation.
   *
   * Different from emitValue, emitUpdateWithRetract is used to emit values that have been updated.
   * This method outputs data incrementally in retract mode, i.e., once there is an update, we
   * have to retract old records before sending new updated ones. The emitUpdateWithRetract
   * method will be used in preference to the emitValue method if both methods are defined in the
   * table aggregate function, because the method is treated to be more efficient than emitValue
   * as it can output values incrementally.
   *
   * @param accumulator the accumulator which contains the current
   *                    aggregated results
   * @param out         the retractable collector used to output data. Use collect method
   *                    to output(add) records and use retract method to retract(delete)
   *                    records.
   */
  public void emitUpdateWithRetract(ACC accumulator, RetractableCollector<T> out); // OPTIONAL

  /**
   * Collects a record and forwards it. The collector can output retract messages with the retract
   * method. Note: only use it in {@code emitUpdateWithRetract}.
   */
  public interface RetractableCollector<T> extends Collector<T> {

    /**
     * Retract a record.
     *
     * @param record The record to retract.
     */
    void retract(T record);
  }
}

Scala code:

/**
  * Base class for user-defined aggregates and table aggregates.
  *
  * @tparam T   the type of the aggregation result.
  * @tparam ACC the type of the aggregation accumulator. The accumulator is used to keep the
  *             aggregated values which are needed to compute an aggregation result.
  */
abstract class UserDefinedAggregateFunction[T, ACC] extends UserDefinedFunction {

  /**
    * Creates and init the Accumulator for this (table)aggregate function.
    *
    * @return the accumulator with the initial value
    */
  def createAccumulator(): ACC // MANDATORY

  /**
    * Returns the TypeInformation of the (table)aggregate function's result.
    *
    * @return The TypeInformation of the (table)aggregate function's result or null if the result
    *         type should be automatically inferred.
    */
  def getResultType: TypeInformation[T] = null // PRE-DEFINED

  /**
    * Returns the TypeInformation of the (table)aggregate function's accumulator.
    *
    * @return The TypeInformation of the (table)aggregate function's accumulator or null if the
    *         accumulator type should be automatically inferred.
    */
  def getAccumulatorType: TypeInformation[ACC] = null // PRE-DEFINED
}

/**
  * Base class for table aggregation functions.
  *
  * @tparam T   the type of the aggregation result
  * @tparam ACC the type of the aggregation accumulator. The accumulator is used to keep the
  *             aggregated values which are needed to compute an aggregation result.
  *             TableAggregateFunction represents its state using accumulator, thereby the state of
  *             the TableAggregateFunction must be put into the accumulator.
  */
abstract class TableAggregateFunction[T, ACC] extends UserDefinedAggregateFunction[T, ACC] {

  /**
    * Processes the input values and update the provided accumulator instance. The method
    * accumulate can be overloaded with different custom types and arguments. A TableAggregateFunction
    * requires at least one accumulate() method.
    *
    * @param accumulator           the accumulator which contains the current aggregated results
    * @param [user defined inputs] the input value (usually obtained from a new arrived data).
    */
  def accumulate(accumulator: ACC, [user defined inputs]): Unit // MANDATORY

  /**
    * Retracts the input values from the accumulator instance. The current design assumes the
    * inputs are the values that have been previously accumulated. The method retract can be
    * overloaded with different custom types and arguments. This function must be implemented for
    * datastream bounded over aggregate.
    *
    * @param accumulator           the accumulator which contains the current aggregated results
    * @param [user defined inputs] the input value (usually obtained from a new arrived data).
    */
  def retract(accumulator: ACC, [user defined inputs]): Unit // OPTIONAL

  /**
    * Merges a group of accumulator instances into one accumulator instance. This function must be
    * implemented for datastream session window grouping aggregate and bounded grouping aggregate.
    *
    * @param accumulator the accumulator which will keep the merged aggregate results. It should
    *                    be noted that the accumulator may contain the previous aggregated
    *                    results. Therefore user should not replace or clean this instance in the
    *                    custom merge method.
    * @param its         an [[java.lang.Iterable]] pointed to a group of accumulators that will be
    *                    merged.
    */
  def merge(accumulator: ACC, its: java.lang.Iterable[ACC]): Unit // OPTIONAL

  /**
    * Called every time when an aggregation result should be materialized. The returned value
    * could be either an early and incomplete result (periodically emitted as data arrive) or
    * the final result of the aggregation.
    *
    * @param accumulator the accumulator which contains the current
    *                    aggregated results
    * @param out         the collector used to output data
    */
  def emitValue(accumulator: ACC, out: Collector[T]): Unit // OPTIONAL

  /**
    * Called every time when an aggregation result should be materialized. The returned value
    * could be either an early and incomplete result (periodically emitted as data arrive) or
    * the final result of the aggregation.
    *
    * Different from emitValue, emitUpdateWithRetract is used to emit values that have been updated.
    * This method outputs data incrementally in retract mode, i.e., once there is an update, we
    * have to retract old records before sending new updated ones. The emitUpdateWithRetract
    * method will be used in preference to the emitValue method if both methods are defined in the
    * table aggregate function, because the method is treated to be more efficient than emitValue
    * as it can output values incrementally.
    *
    * @param accumulator the accumulator which contains the current
    *                    aggregated results
    * @param out         the retractable collector used to output data. Use collect method
    *                    to output(add) records and use retract method to retract(delete)
    *                    records.
    */
  def emitUpdateWithRetract(accumulator: ACC, out: RetractableCollector[T]): Unit // OPTIONAL

  /**
    * Collects a record and forwards it. The collector can output retract messages with the retract
    * method. Note: only use it in `emitUpdateWithRetract`.
    */
  trait RetractableCollector[T] extends Collector[T] {

    /**
      * Retract a record.
      *
      * @param record The record to retract.
      */
    def retract(record: T): Unit
  }
}

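The following example shows how to:

define a TableAggregateFunction that computes the two largest values of a given column,

register the function in the TableEnvironment,

use the function in a Table API query (TableAggregateFunction is currently supported only in the Table API).

To compute the two largest values, the accumulator has to keep the two largest values seen so far. In our example we define the class Top2Accum as the accumulator. Flink's checkpointing mechanism saves the accumulator automatically and restores it on failure, which guarantees exactly-once semantics.

The accumulate() method of our Top2 table aggregate function takes two arguments. The first is the Top2Accum accumulator; the other is the user-defined input, the value v. Although merge() is not mandatory for most aggregation types, the example implements it as well. Note that the Scala example also uses Java basic types and defines getResultType() and getAccumulatorType(), because Flink's type extraction does not work well with Scala types.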

Java code:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.functions.TableAggregateFunction;
import org.apache.flink.util.Collector;

/**
 * Accumulator for Top2.
 */
public static class Top2Accum {
  public Integer first;
  public Integer second;
}

/**
 * The top2 user-defined table aggregate function.
 */
public static class Top2 extends TableAggregateFunction<Tuple2<Integer, Integer>, Top2Accum> {

  @Override
  public Top2Accum createAccumulator() {
    Top2Accum acc = new Top2Accum();
    acc.first = Integer.MIN_VALUE;
    acc.second = Integer.MIN_VALUE;
    return acc;
  }

  public void accumulate(Top2Accum acc, Integer v) {
    if (v > acc.first) {
      acc.second = acc.first;
      acc.first = v;
    } else if (v > acc.second) {
      acc.second = v;
    }
  }

  public void merge(Top2Accum acc, java.lang.Iterable<Top2Accum> iterable) {
    for (Top2Accum otherAcc : iterable) {
      accumulate(acc, otherAcc.first);
      accumulate(acc, otherAcc.second);
    }
  }

  public void emitValue(Top2Accum acc, Collector<Tuple2<Integer, Integer>> out) {
    // emit the value and rank
    if (acc.first != Integer.MIN_VALUE) {
      out.collect(Tuple2.of(acc.first, 1));
    }
    if (acc.second != Integer.MIN_VALUE) {
      out.collect(Tuple2.of(acc.second, 2));
    }
  }
}

// Register the function
StreamTableEnvironment tEnv = ...
tEnv.registerFunction("top2", new Top2());

// Initialize the table
Table tab = ...;

// Use the function
tab.groupBy("key")
    .flatAggregate("top2(a) as (v, rank)")
    .select("key, v, rank");

Scala code:

import java.lang.{Integer => JInteger, Iterable => JIterable}
import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2}
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.TableAggregateFunction
import org.apache.flink.util.Collector

/**
 * Accumulator for top2.
 */
class Top2Accum {
  var first: JInteger = _
  var second: JInteger = _
}

/**
 * The top2 user-defined table aggregate function.
 */
class Top2 extends TableAggregateFunction[JTuple2[JInteger, JInteger], Top2Accum] {

  override def createAccumulator(): Top2Accum = {
    val acc = new Top2Accum
    acc.first = Int.MinValue
    acc.second = Int.MinValue
    acc
  }

  def accumulate(acc: Top2Accum, v: Int): Unit = {
    if (v > acc.first) {
      acc.second = acc.first
      acc.first = v
    } else if (v > acc.second) {
      acc.second = v
    }
  }

  def merge(acc: Top2Accum, its: JIterable[Top2Accum]): Unit = {
    val iter = its.iterator()
    while (iter.hasNext) {
      val top2 = iter.next()
      accumulate(acc, top2.first)
      accumulate(acc, top2.second)
    }
  }

  def emitValue(acc: Top2Accum, out: Collector[JTuple2[JInteger, JInteger]]): Unit = {
    // emit the value and rank
    if (acc.first != Int.MinValue) {
      out.collect(JTuple2.of(acc.first, 1))
    }
    if (acc.second != Int.MinValue) {
      out.collect(JTuple2.of(acc.second, 2))
    }
  }
}

// Initialize the table
val tab = ...

// Create the function instance
val top2 = new Top2

// Use the function
tab
  .groupBy('key)
  .flatAggregate(top2('a) as ('v, 'rank))
  .select('key, 'v, 'rank)

The following example shows how to use the emitUpdateWithRetract method to emit only the updated records. To do so, the accumulator keeps both the previous top 2 values and the current top 2 values. Note: if the n of TopN is very large, keeping both the previous and the current result is inefficient. One way to address this is to store the input records directly in the accumulator and to compute the result only when emitUpdateWithRetract is called.

Java code:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.functions.TableAggregateFunction;

/**
 * Accumulator for Top2.
 */
public static class Top2Accum {
  public Integer first;
  public Integer second;
  public Integer oldFirst;
  public Integer oldSecond;
}

/**
 * The top2 user-defined table aggregate function.
 */
public static class Top2 extends TableAggregateFunction<Tuple2<Integer, Integer>, Top2Accum> {

  @Override
  public Top2Accum createAccumulator() {
    Top2Accum acc = new Top2Accum();
    acc.first = Integer.MIN_VALUE;
    acc.second = Integer.MIN_VALUE;
    acc.oldFirst = Integer.MIN_VALUE;
    acc.oldSecond = Integer.MIN_VALUE;
    return acc;
  }

  public void accumulate(Top2Accum acc, Integer v) {
    if (v > acc.first) {
      acc.second = acc.first;
      acc.first = v;
    } else if (v > acc.second) {
      acc.second = v;
    }
  }

  public void emitUpdateWithRetract(Top2Accum acc, RetractableCollector<Tuple2<Integer, Integer>> out) {
    if (!acc.first.equals(acc.oldFirst)) {
      // if there is an update, retract old value then emit new value.
      if (acc.oldFirst != Integer.MIN_VALUE) {
        out.retract(Tuple2.of(acc.oldFirst, 1));
      }
      out.collect(Tuple2.of(acc.first, 1));
      acc.oldFirst = acc.first;
    }

    if (!acc.second.equals(acc.oldSecond)) {
      // if there is an update, retract old value then emit new value.
      if (acc.oldSecond != Integer.MIN_VALUE) {
        out.retract(Tuple2.of(acc.oldSecond, 2));
      }
      out.collect(Tuple2.of(acc.second, 2));
      acc.oldSecond = acc.second;
    }
  }
}

// Register the function
StreamTableEnvironment tEnv = ...
tEnv.registerFunction("top2", new Top2());

// Initialize the table
Table tab = ...;

// Use the function
tab.groupBy("key")
    .flatAggregate("top2(a) as (v, rank)")
    .select("key, v, rank");

Scala code:

import java.lang.{Integer => JInteger}
import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2}
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.TableAggregateFunction
import org.apache.flink.table.functions.TableAggregateFunction.RetractableCollector

/**
 * Accumulator for top2.
 */
class Top2Accum {
  var first: JInteger = _
  var second: JInteger = _
  var oldFirst: JInteger = _
  var oldSecond: JInteger = _
}

/**
 * The top2 user-defined table aggregate function.
 */
class Top2 extends TableAggregateFunction[JTuple2[JInteger, JInteger], Top2Accum] {

  override def createAccumulator(): Top2Accum = {
    val acc = new Top2Accum
    acc.first = Int.MinValue
    acc.second = Int.MinValue
    acc.oldFirst = Int.MinValue
    acc.oldSecond = Int.MinValue
    acc
  }

  def accumulate(acc: Top2Accum, v: Int): Unit = {
    if (v > acc.first) {
      acc.second = acc.first
      acc.first = v
    } else if (v > acc.second) {
      acc.second = v
    }
  }

  def emitUpdateWithRetract(
      acc: Top2Accum,
      out: RetractableCollector[JTuple2[JInteger, JInteger]]): Unit = {
    if (acc.first != acc.oldFirst) {
      // if there is an update, retract old value then emit new value.
      if (acc.oldFirst != Int.MinValue) {
        out.retract(JTuple2.of(acc.oldFirst, 1))
      }
      out.collect(JTuple2.of(acc.first, 1))
      acc.oldFirst = acc.first
    }
    if (acc.second != acc.oldSecond) {
      // if there is an update, retract old value then emit new value.
      if (acc.oldSecond != Int.MinValue) {
        out.retract(JTuple2.of(acc.oldSecond, 2))
      }
      out.collect(JTuple2.of(acc.second, 2))
      acc.oldSecond = acc.second
    }
  }
}

// Initialize the table
val tab = ...

// Create the function instance
val top2 = new Top2

// Use the function
tab
  .groupBy('key)
  .flatAggregate(top2('a) as ('v, 'rank))
  .select('key, 'v, 'rank)

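Method	Description
getMetricGroup()	The metric group of the subtask that executes this function.
getCachedFile(name)	The local temporary copy of a file from the distributed cache.
getJobParameter(name, defaultValue)	The global job parameter value associated with the given key.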

博弈Java讲义 - Java线程同步 (1) boyitech java 多线程同步锁
在并发编程中经常会碰到多个执行线程共享资源的问题。例如多个线程同时读写文件，共用数据库连接，全局的计数器等。如果不处理好多线程之间的同步问题很容易引起状态不一致或者其他的错误。同步不仅可以阻止一个线程看到对象处于不一致的状态，它还可以保证进入同步方法或者块的每个线程，都看到由同一锁保护的之前所有的修改结果。处理同步的关键就是要正确的识别临界条件（cri
java-给定字符串，删除开始和结尾处的空格，并将中间的多个连续的空格合并成一个。 bylijinnan java
public class DeleteExtraSpace { /** * 题目：给定字符串，删除开始和结尾处的空格，并将中间的多个连续的空格合并成一个。 * 方法1.用已有的String类的trim和replaceAll方法 * 方法2.全部用正则表达式，这个我不熟 * 方法3.“重新发明轮子”，从头遍历一次 */ public static v
An error has occurred.See the log file错误解决！ Kai_Ge MyEclipse
今天早上打开MyEclipse时，自动关闭！弹出An error has occurred.See the log file错误提示！很郁闷昨天启动和关闭还好着！！！打开几次依然报此错误，确定不是眼花了！打开日志文件！找到当日错误文件内容： --------------------------------------------------------------------------
[矿业与工业]修建一个空间矿床开采站要多少钱? comsci
地球上的钛金属矿藏已经接近枯竭........... 我们在冥王星的一颗卫星上面发现一些具有开采价值的矿床..... 那么,现在要编制一个预算,提交给财政部门..
解析Google Map Routes dai_lm google api
为了获得从A点到B点的路劲，经常会使用Google提供的API，例如 [url] http://maps.googleapis.com/maps/api/directions/json?origin=40.7144,-74.0060&destination=47.6063,-122.3204&sensor=false [/url] 从返回的结果上，大致可以了解应该怎么走，但
SQL还有多少“理所应当”？ datamachine sql
转贴存档，原帖地址：http://blog.chinaunix.net/uid-29242841-id-3968998.html、http://blog.chinaunix.net/uid-29242841-id-3971046.html！ ------------------------------------华丽的分割线--------------------------------
Yii使用Ajax验证时，如何设置某些字段不需要验证 dcj3sjt126com Ajax yii
经常像你注册页面,你可能非常希望只需要Ajax去验证用户名和Email,而不需要使用Ajax再去验证密码,默认如果你使用Yii 内置的ajax验证Form,例如: $form=$this->beginWidget('CActiveForm', array( 'id'=>'usuario-form',&
使用git同步网站代码 dcj3sjt126com crontab git
转自:http://ued.ctrip.com/blog/?p=3646?tn=gongxinjun.com 管理一网站，最开始使用的虚拟空间，采用提供商支持的ftp上传网站文件，后换用vps，vps可以自己搭建ftp的，但是懒得搞，直接使用scp传输文件到服务器，现在需要更新文件到服务器，使用scp真的很烦。发现本人就职的公司，采用的git+rsync的方式来管理、同步代码，遂
sql基本操作蕃薯耀 sql sql基本操作 sql常用操作
sql基本操作 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 蕃薯耀 2015年6月1日 17:30:33 星期一 &
Spring4+Hibernate4+Atomikos3.3多数据源事务管理 hanqunfeng Hibernate4
Spring3+后不再对JTOM提供支持，所以可以改用Atomikos管理多数据源事务。Spring2.5+Hibernate3+JTOM参考：http://hanqunfeng.iteye.com/blog/1554251Atomikos官网网站：http://www.atomikos.com/ 一.pom.xml <dependency> <
jquery中两个值得注意的方法one()和trigger()方法 jackyrong trigger
在jquery中，有两个值得注意但容易忽视的方法，分别是one()方法和trigger()方法,这是从国内作者<<jquery权威指南》一书中看到不错的介绍 1） one方法 one方法的功能是让所选定的元素绑定一个仅触发一次的处理函数，格式为 one(type,${data},fn) &nb
拿工资不仅仅是让你写代码的 lampcy 工作面试咨询
这是我对团队每个新进员工说的第一件事情。这句话的意思是，我并不关心你是如何快速完成任务的，哪怕代码很差，只要它像救生艇通气门一样管用就行。这句话也是我最喜欢的座右铭之一。这个说法其实很合理：我们的工作是思考客户提出的问题，然后制定解决方案。思考第一，代码第二，公司请我们的最终目的不是写代码，而是想出解决方案。话粗理不粗。付你薪水不是让你来思考的，也不是让你来写代码的，你的目的是交付产品
架构师之对象操作----------对象的效率复制和判断是否全为空 nannan408 架构师
1.前言。如题。 2.代码。 (1)对象的复制，比spring的beanCopier在大并发下效率要高，利用net.sf.cglib.beans.BeanCopier Src src=new Src(); BeanCopier beanCopier = BeanCopier.create(Src.class, Des.class, false);
ajax 被缓存的解决方案 Rainbow702 JavaScript jquery Ajax cache 缓存
使用jquery的ajax来发送请求进行局部刷新画面，各位可能都做过。今天碰到一个奇怪的现象，就是，同一个ajax请求，在chrome中，不论发送多少次，都可以发送至服务器端，而不会被缓存。但是，换成在IE下的时候，发现，同一个ajax请求，会发生被缓存的情况，只有第一次才会被发送至服务器端，之后的不会再被发送。郁闷。解决方法如下： ① 直接使用 JQuery提供的 “cache”参数，
修改date.toLocaleString()的警告 tntxia String
我们在写程序的时候，经常要查看时间，所以我们经常会用到date.toLocaleString()，但是date.toLocaleString()是一个过时的API，代替的方法如下： package com.tntxia.htmlmaker.util; import java.text.SimpleDateFormat; import java.util.
项目完成后的小总结 xiaomiya js 总结项目
项目完成了，突然想做个总结但是有点无从下手了。做之前对于客户端给的接口很模式。然而定义好了格式要求就如此的愉快了。先说说项目主要实现的功能吧 1，按键精灵 2，获取行情数据 3，各种input输入条件判断 4，发送数据（有json格式和string格式） 5，获取预警条件列表和预警结果列表， 6，排序， 7，预警结果分页获取 8，导出文件（excel，text等） 9，修
