Features of the Table API & SQL:
Features of the Table API itself:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);
DataSet<WC> input = env.fromElements(
    new WC("hello", 1),
    new WC("word", 2),
    new WC("mvp", 1)
);
// convert the DataSet to a Table
Table table = tEnv.fromDataSet(input);
// sum the frequency per word and keep only the words whose total is 2
Table filtered = table
    .groupBy("word")
    .select("word, frequency.sum as frequency")
    .filter("frequency = 2");
// convert the Table back to a DataSet of WC
DataSet<WC> result = tEnv.toDataSet(filtered, WC.class);
result.print();
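The snippet above uses a WC type that the notes never define, so the following is a minimal sketch of what it might look like, together with a plain-Java version of the same group/sum/filter logic. The field names `word` and `frequency` are inferred from the query strings; the class `WordCountSketch` and its `query` method are hypothetical names used only for illustration, and the sketch does not use Flink at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCountSketch {
    // Plain-Java equivalent of
    // groupBy("word").select("word, frequency.sum as frequency").filter("frequency = 2").
    static List<WC> query(List<WC> input) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (WC wc : input) {
            sums.merge(wc.word, wc.frequency, Long::sum); // SUM(frequency) per word
        }
        List<WC> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            if (e.getValue() == 2L) {                     // filter("frequency = 2")
                result.add(new WC(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<WC> input = Arrays.asList(
            new WC("hello", 1), new WC("word", 2), new WC("mvp", 1));
        query(input).forEach(System.out::println);
    }
}

// Assumed shape of the WC POJO: public fields plus a public no-argument constructor.
// In a Flink job the class itself must also be public; it is package-private here
// only so that the sketch fits in one file.
class WC {
    public String word;
    public long frequency;

    public WC() {}

    public WC(String word, long frequency) {
        this.word = word;
        this.frequency = frequency;
    }

    @Override
    public String toString() {
        return "WC " + word + " " + frequency;
    }
}
```

With the three input elements from the notes, only the element for "word" has a summed frequency of 2, so it is the only row that passes the filter.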
1.Table descriptor
tEnv.connect().withFormat().withSchema().registerTableSource("");
2.Table Source
TableSource csvSource = new CsvTableSource("/path/to/file", ...);
tableEnv.registerTableSource("CsvTable", csvSource);
3.DataStream or DataSet
DataStream<Tuple2<Long, String>> stream = ...
tableEnv.registerDataStream("myTable", stream);
1.Table descriptor
tEnv.connect().withFormat().withSchema().registerTableSink("");
2.TableSink
TableSink csvSink = new CsvTableSink("/path/to/file", ...);
// define the field names and types
String[] fieldNames = {"a", "b", "c"};
TypeInformation[] fieldTypes = {Types.INT, Types.STRING, Types.LONG};
// register the TableSink as table "CsvSinkTable"
tableEnv.registerTableSink("CsvSinkTable", fieldNames, fieldTypes, csvSink);
Write the result into the registered sink table by calling resultTable.insertInto("targettable");
Table API query example:
Table revenue = orders
    .filter("cCountry === 'FRANCE'")
    .groupBy("cID, cName")
    .select("cID, cName, revenue.sum AS revSum");
SQL query example:
Table revenue = tableEnv.sqlQuery(
    "SELECT cID, cName, SUM(revenue) AS revSum " +
    "FROM Orders " +
    "WHERE cCountry = 'FRANCE' " +
    "GROUP BY cID, cName"
);
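Converting a Table to a DataStream or a DataSet:
// convert the Table into a DataStream of Row
DataStream<Row> dsRow = tableEnv.toAppendStream(table, Row.class);
// convert the Table into a DataStream of Tuple2<String, Integer> via a TypeInformation
TupleTypeInfo<Tuple2<String, Integer>> tupleType = new TupleTypeInfo<>(Types.STRING(), Types.INT());
DataStream<Tuple2<String, Integer>> dsTuple = tableEnv.toAppendStream(table, tupleType);
// convert the Table into a retract stream; the Boolean flag is true for an add message and false for a retract message
DataStream<Tuple2<Boolean, Row>> retractStream = tableEnv.toRetractStream(table, Row.class);
// DataSet (batch) equivalents, using a BatchTableEnvironment
DataSet<Row> dsRow = tableEnv.toDataSet(table, Row.class);
TupleTypeInfo<Tuple2<String, Integer>> tupleType = new TupleTypeInfo<>(Types.STRING(), Types.INT());
DataSet<Tuple2<String, Integer>> dsTuple = tableEnv.toDataSet(table, tupleType);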
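To make the retract-stream conversion above more concrete, here is a plain-Java sketch of how a running grouped count could be encoded as add and retract messages. It does not use Flink: the `Change` class is a hypothetical stand-in for Tuple2<Boolean, Row>, and `countChanges` and `materialize` are illustrative names, not part of any Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RetractStreamSketch {
    // Stand-in for Tuple2<Boolean, Row>: add == true inserts a row,
    // add == false retracts a row that was emitted earlier.
    static final class Change {
        final boolean add;
        final String word;
        final long count;

        Change(boolean add, String word, long count) {
            this.add = add;
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + add + "," + word + "," + count + ")";
        }
    }

    // Change log of a running "SELECT word, COUNT(*) ... GROUP BY word".
    static List<Change> countChanges(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        List<Change> out = new ArrayList<>();
        for (String w : words) {
            Long old = counts.get(w);
            if (old != null) {
                out.add(new Change(false, w, old)); // retract the outdated result
            }
            long updated = (old == null) ? 1L : old + 1L;
            counts.put(w, updated);
            out.add(new Change(true, w, updated));  // add the new result
        }
        return out;
    }

    // Replays the change log to rebuild the current result table.
    static Map<String, Long> materialize(List<Change> changes) {
        Map<String, Long> table = new HashMap<>();
        for (Change c : changes) {
            if (c.add) {
                table.put(c.word, c.count);
            } else {
                table.remove(c.word, c.count);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> log = countChanges(Arrays.asList("hello", "flink", "hello"));
        log.forEach(System.out::println);
        System.out.println(materialize(log));
    }
}
```

For the input hello, flink, hello the log contains four messages: the second "hello" first retracts (hello, 1) and then adds (hello, 2). This is why an updating query such as a grouped aggregation cannot be emitted through toAppendStream, which only supports inserts.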
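Table methods are chained in the same way as DataStream methods, and each call returns another Table.
groupBy produces a GroupedTable, and calling select on it returns a Table again. Each operator behaves like its standard SQL counterpart.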
UDF (scalar function): input to output is 1:1
// extend ScalarFunction and implement eval; queries call the function by its registered name
public class HashCode extends ScalarFunction {
    private int factor = 12;

    public HashCode(int factor) {
        this.factor = factor;
    }

    public int eval(String s) {
        return s.hashCode() * factor;
    }
}
BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
// register the function
tableEnv.registerFunction("hashCode", new HashCode(10));
// usage in the Java Table API
myTable.select("string, string.hashCode(), hashCode(string)");
// usage in SQL
tableEnv.sqlQuery("SELECT string, HASHCODE(string) FROM MyTable");
UDTF (table function): input to output is 1:n
// extend TableFunction and implement eval
public class Split extends TableFunction<Tuple2<String, Integer>> {
    private String separator = " ";

    public Split(String separator) {
        this.separator = separator;
    }

    public void eval(String str) {
        for (String s : str.split(separator)) {
            // use collect(...) to emit a row
            collect(new Tuple2<String, Integer>(s, s.length()));
        }
    }
}
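UDAF (aggregate function): input to output is n:1. Extend AggregateFunction and implement its methods (createAccumulator, accumulate, and getValue are required).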
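The notes give no code for the UDAF case, so the following is a minimal plain-Java sketch of the accumulator lifecycle, using a weighted average as the example. It deliberately does not depend on Flink: in a real job the class would extend AggregateFunction<Long, WeightedAvgAccum> and Flink would call these methods itself. The names `WeightedAvgSketch` and `WeightedAvgAccum` are assumptions chosen for illustration.

```java
class WeightedAvgSketch {
    // Mutable intermediate state of the aggregation (the accumulator).
    static final class WeightedAvgAccum {
        long sum = 0;
        int count = 0;
    }

    // Called once per group to create an empty accumulator.
    WeightedAvgAccum createAccumulator() {
        return new WeightedAvgAccum();
    }

    // Called once per input row to fold the row into the accumulator.
    void accumulate(WeightedAvgAccum acc, long value, int weight) {
        acc.sum += value * weight;
        acc.count += weight;
    }

    // Called to produce the single output value for the group (n:1).
    Long getValue(WeightedAvgAccum acc) {
        return acc.count == 0 ? null : acc.sum / acc.count;
    }

    public static void main(String[] args) {
        WeightedAvgSketch fn = new WeightedAvgSketch();
        WeightedAvgAccum acc = fn.createAccumulator();
        fn.accumulate(acc, 4L, 1);   // sum = 4,  count = 1
        fn.accumulate(acc, 10L, 2);  // sum = 24, count = 3
        System.out.println(fn.getValue(acc)); // 24 / 3 = 8
    }
}
```

Keeping the state in a separate accumulator object, rather than in fields of the function, lets the runtime hold one accumulator per group while sharing a single function instance.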