https://www.jianshu.com/p/a3f43f861a42?utm_source=oschina-app
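This article is more than a Chinese translation of the official documentation: for each method it also describes the corresponding Transformation and its effect on Tasks at runtime.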
Map
DataStream<Integer> dataStream = //...
dataStream.map(new MapFunction<Integer, Integer>() {
    @Override
    public Integer map(Integer value) throws Exception {
        return 2 * value;
    }
});
Transformation: creates a OneInputTransformation that contains a StreamMap operator.
[Figure: StreamMapTransformation]
Runtime:
[Figure: StreamMapTask]
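To make the relationship between the user function and the operator that wraps it more concrete, here is a minimal plain-Java sketch. It does not use Flink classes: MapOperatorSketch, processElement and doubleAll are hypothetical names chosen for illustration, and the sketch only assumes that the operator applies the user function to each incoming element and forwards the result downstream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for an operator that wraps a user map function.
// This is not Flink's StreamMap class; all names here are assumptions.
class MapOperatorSketch<IN, OUT> {
    private final Function<IN, OUT> userFunction; // plays the role of the MapFunction
    private final Consumer<OUT> output;           // plays the role of the downstream output

    MapOperatorSketch(Function<IN, OUT> userFunction, Consumer<OUT> output) {
        this.userFunction = userFunction;
        this.output = output;
    }

    // Called once per incoming element: apply the user function, forward the result.
    void processElement(IN element) {
        output.accept(userFunction.apply(element));
    }

    // Example helper: push a list of integers through a "2 * value" map.
    static List<Integer> doubleAll(List<Integer> inputs) {
        List<Integer> collected = new ArrayList<>();
        MapOperatorSketch<Integer, Integer> operator =
                new MapOperatorSketch<Integer, Integer>(v -> 2 * v, collected::add);
        for (Integer v : inputs) {
            operator.processElement(v);
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(doubleAll(Arrays.asList(1, 2, 3))); // prints [2, 4, 6]
    }
}
```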
FlatMap
dataStream.flatMap(new FlatMapFunction<String, String>() {
    @Override
    public void flatMap(String value, Collector<String> out)
            throws Exception {
        for (String word : value.split(" ")) {
            out.collect(word);
        }
    }
});
Transformation: creates a OneInputTransformation that contains a StreamFlatMap operator.
[Figure: StreamFlatMapTransformation]
Runtime:
[Figure: StreamFlatMapTask]
Filter
dataStream.filter(new FilterFunction<Integer>() {
    @Override
    public boolean filter(Integer value) throws Exception {
        return value != 0;
    }
});
Transformation: creates a OneInputTransformation that contains a StreamFilter operator.
[Figure: StreamFilterTransformation]
Runtime:
[Figure: StreamFilterTask]
KeyBy
dataStream.keyBy("someKey") // Key by field "someKey"
dataStream.keyBy(0) // Key by the first element of a Tuple
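Transformation: keyBy produces a PartitionTransformation and uses the KeySelector to create a KeyGroupStreamPartitioner, whose purpose is to partition the output data. The KeySelector is also stored as a property of the KeyedStream and is injected into the next Transformation when that Transformation is created.
[Figure: KeyByTransformation]
Runtime: when the StreamGraph is generated, the Partitioner from the PartitionTransformation is injected into the StreamEdge, and the KeySelector is injected into the next StreamNode as it is created, where it is used to extract each element's key. The Partitioner is then injected into the StreamRecordWriter, which uses it to route each output element of the upstream Task to a particular ResultSubpartition. The KeySelector is likewise injected into the operator of the downstream Task.
[Figure: KeyBy Runtime]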
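The routing step can be illustrated with a small plain-Java sketch of key-group partitioning. It is a simplification under stated assumptions: KeyGroupSketch and its method names are hypothetical, and the key group is derived from hashCode() alone, whereas Flink, as far as I know, additionally applies a murmur hash before taking the modulus. The mapping from key group to downstream channel follows the keyGroup * parallelism / maxParallelism scheme.

```java
// Plain-Java sketch of key-group based routing. Not Flink code: the class and
// method names are hypothetical, and the hash step is simplified (assumption).
class KeyGroupSketch {
    // Map a key to a key group in the range [0, maxParallelism).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Map a key group to the index of the downstream parallel subtask (channel).
    static int channelFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // What a key-based partitioner does per record: key -> key group -> channel.
    static int selectChannel(Object key, int maxParallelism, int parallelism) {
        return channelFor(keyGroupFor(key, maxParallelism), maxParallelism, parallelism);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so key 100 falls in key group 100.
        System.out.println(selectChannel(100, 128, 4)); // prints 3
        System.out.println(selectChannel(64, 128, 4));  // prints 2
    }
}
```

Because the channel depends only on the key, all elements with the same key reach the same downstream subtask, which is what lets the downstream operator keep per-key state.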
WindowAll
dataStream.windowAll(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 seconds of data
Union
dataStream.union(otherStream1, otherStream2, ...);
Transformation: the Transformation of every stream involved is collected and placed into the inputs of a UnionTransformation.
[Figure: UnionTransformation]
Join
dataStream.join(otherStream)
    .where(<key selector>).equalTo(<key selector>)
    .window(TumblingEventTimeWindows.of(Time.seconds(3)))
    .apply (new JoinFunction () {...});
Window CoGroup
dataStream.coGroup(otherStream)
.where(0).equalTo(1)
.window(TumblingEventTimeWindows.of(Time.seconds(3)))
.apply (new CoGroupFunction () {...});
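Transformation: creates a TaggedUnion type and a unionKeySelector, which hold the element types of the two streams and the KeySelectors of the two streams respectively. Each stream is mapped to a stream of TaggedUnion elements (see Map), the two resulting streams are unioned (see Union), and the unioned stream is keyed with the unionKeySelector to produce a KeyedStream (see KeyBy). Finally, the KeyedStream's window method is called with the WindowAssigner to produce a WindowedStream, and the CoGroupFunction is applied to it (see the Apply method of WindowedStream). Overall, Flink performs many internal conversions for this method and ends up with two StreamMapTransformations, one PartitionTransformation, and one OneInputTransformation that contains a WindowOperator.
[Figure: CoGroupTransformation]
Runtime: see the runtime behavior of each Transformation involved.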
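The tag, union, key, then regroup sequence can be sketched in plain Java over in-memory lists. This is an illustration only: windowing is left out entirely, and TaggedUnionSketch, coGroup and demo are hypothetical names rather than Flink's internal classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of a tagged-union co-group. Names are assumptions; no windows.
class TaggedUnionSketch {
    // Holds a value from exactly one of the two inputs, plus a tag saying which.
    static final class TaggedUnion<T1, T2> {
        final boolean isOne;
        final T1 one;
        final T2 two;

        TaggedUnion(boolean isOne, T1 one, T2 two) {
            this.isOne = isOne;
            this.one = one;
            this.two = two;
        }
    }

    // Returns, per key, a two-element list: [elements from left, elements from right].
    static <T1, T2, K> Map<K, List<List<Object>>> coGroup(
            List<T1> left, Function<T1, K> leftKey,
            List<T2> right, Function<T2, K> rightKey) {
        // Steps 1 and 2: "map" both inputs to the common type and "union" them.
        List<TaggedUnion<T1, T2>> unioned = new ArrayList<>();
        for (T1 v : left) {
            unioned.add(new TaggedUnion<T1, T2>(true, v, null));
        }
        for (T2 v : right) {
            unioned.add(new TaggedUnion<T1, T2>(false, null, v));
        }
        // Step 3: a single key selector that dispatches on the tag.
        Function<TaggedUnion<T1, T2>, K> unionKeySelector =
                u -> u.isOne ? leftKey.apply(u.one) : rightKey.apply(u.two);
        // Step 4: group by key and split each group back into its two sides.
        Map<K, List<List<Object>>> groups = new LinkedHashMap<>();
        for (TaggedUnion<T1, T2> u : unioned) {
            List<List<Object>> sides = groups.computeIfAbsent(unionKeySelector.apply(u), k -> {
                List<List<Object>> s = new ArrayList<>();
                s.add(new ArrayList<>());
                s.add(new ArrayList<>());
                return s;
            });
            if (u.isOne) {
                sides.get(0).add(u.one);
            } else {
                sides.get(1).add(u.two);
            }
        }
        return groups;
    }

    // Example: strings keyed by length, co-grouped with integers keyed by value.
    static Map<Integer, List<List<Object>>> demo() {
        Function<String, Integer> byLength = String::length;
        Function<Integer, Integer> byValue = v -> v;
        return coGroup(Arrays.asList("ab", "cde", "fg"), byLength,
                Arrays.asList(2, 3, 3), byValue);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {2=[[ab, fg], [2]], 3=[[cde], [3, 3]]}
    }
}
```

The two lists per key correspond to the two Iterables that a co-group function receives for that key.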
Connect
DataStream<Integer> someStream = //...
DataStream<String> otherStream = //...
ConnectedStreams<Integer, String> connectedStreams = someStream.connect(otherStream);
Split
SplitStream<Integer> split = someDataStream.split(new OutputSelector<Integer>() {
    @Override
    public Iterable<String> select(Integer value) {
        List<String> output = new ArrayList<String>();
        if (value % 2 == 0) {
            output.add("even");
        }
        else {
            output.add("odd");
        }
        return output;
    }
});
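Transformation: this step creates a SplitTransformation that contains the OutputSelector.
[Figure: SplitTransformation]
Runtime: when the StreamGraph is generated, the parent Transformation is located and the OutputSelector is injected into the parent StreamNode. When the JobGraph is generated, it is then injected into the corresponding JobNode. At runtime it is wrapped in the OperatorChain's OutputCollector, which is injected into the operator.
[Figure: SplitRuntime]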
Iterate
IterativeStream<Long> iteration = initialStream.iterate();
DataStream<Long> iterationBody = iteration.map (/*do something*/);
// Elements greater than 0 are fed back into the iteration.
DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value > 0;
    }
});
iteration.closeWith(feedback);
// The remaining elements leave the iteration as output.
DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>(){
    @Override
    public boolean filter(Long value) throws Exception {
        return value <= 0;
    }
});
ExtractTimestamps
stream.assignTimestamps (new TimeStampExtractor() {...});
Transformation: assignTimestamps injects the TimeStampExtractor into a newly created ExtractTimestampsOperator, and then uses that operator to create a OneInputTransformation.
[Figure: ExtractTimestampsTransformation]
Runtime:
[Figure: ExtractTimestampsTask]
Project
DataStream<Tuple3<Integer, Double, String>> in = // [...]
DataStream<Tuple2<String, Integer>> out = in.project(2,0);
Transformation: creates a OneInputTransformation that contains a StreamProjection operator.
[Figure: StreamProjectionTransformation]
Runtime:
[Figure: StreamProjectionTask]
Custom partitioning
dataStream.partitionCustom(partitioner, "someKey");
dataStream.partitionCustom(partitioner, 0);
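Transformation: partitionCustom is similar to keyBy, except that the partitioner is user-defined and the result is not a KeyedStream. The KeySelector and the user-implemented Partitioner are first used to create a CustomPartitionerWrapper (a StreamPartitioner), which is then injected into a PartitionTransformation.
[Figure: CustomPartitioningTransformation]
Runtime: the Partitioner is injected into the StreamRecordWriter, which uses it to route each output element of the upstream Task to a particular ResultSubpartition.
[Figure: CustomPartitioningTask]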
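The wrapper idea can be sketched in plain Java as follows. This is an illustration, not Flink's CustomPartitionerWrapper: KeyPartitioner, PartitionerWrapper and demoChannel are hypothetical names, and the sketch only assumes that the wrapper first extracts the key and then lets the user-supplied partitioner pick the channel.

```java
import java.util.function.Function;

// Plain-Java sketch of wrapping a user partitioner together with a key selector.
// All names are assumptions made for illustration.
class CustomPartitionerSketch {
    // Shape of a user-supplied partitioner: (key, number of partitions) -> partition index.
    interface KeyPartitioner<K> {
        int partition(K key, int numPartitions);
    }

    // Hypothetical wrapper: extracts the key, then delegates the channel choice.
    static final class PartitionerWrapper<T, K> {
        private final KeyPartitioner<K> partitioner;
        private final Function<T, K> keySelector;

        PartitionerWrapper(KeyPartitioner<K> partitioner, Function<T, K> keySelector) {
            this.partitioner = partitioner;
            this.keySelector = keySelector;
        }

        int selectChannel(T record, int numberOfChannels) {
            K key = keySelector.apply(record);
            return partitioner.partition(key, numberOfChannels);
        }
    }

    // Example: key a string by its length and partition by "key modulo channel count".
    static int demoChannel(String record, int numberOfChannels) {
        PartitionerWrapper<String, Integer> wrapper = new PartitionerWrapper<String, Integer>(
                (key, n) -> key % n, String::length);
        return wrapper.selectChannel(record, numberOfChannels);
    }

    public static void main(String[] args) {
        System.out.println(demoChannel("hello", 4)); // prints 1 (5 % 4)
        System.out.println(demoChannel("hi", 4));    // prints 2 (2 % 4)
    }
}
```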
Random partitioning
dataStream.shuffle();
Rebalancing (Round-robin partitioning)
dataStream.rebalance();
Rescaling
dataStream.rescale();
Broadcasting
dataStream.broadcast();
Reduce
keyedStream.reduce(new ReduceFunction<Integer>() {
    @Override
    public Integer reduce(Integer value1, Integer value2)
            throws Exception {
        return value1 + value2;
    }
});
Transformation: creates a OneInputTransformation that contains a StreamGroupedReduce operator.
[Figure: KeyedReduceTransformation]
Runtime:
[Figure: KeyedReduceTask]
Fold
DataStream<String> result =
    keyedStream.fold("start", new FoldFunction<Integer, String>() {
        @Override
        public String fold(String current, Integer value) {
            return current + "-" + value;
        }
    });
Aggregations
keyedStream.sum(0);
keyedStream.sum("key");
keyedStream.min(0);
keyedStream.min("key");
keyedStream.max(0);
keyedStream.max("key");
keyedStream.minBy(0);
keyedStream.minBy("key");
keyedStream.maxBy(0);
keyedStream.maxBy("key");
Window
dataStream.window(TumblingEventTimeWindows.of(Time.seconds(5))); // Last 5 seconds of data
Interval Join
// this will join the two streams so that
// key1 == key2 && leftTs - 2 < rightTs < leftTs + 2
keyedStream.intervalJoin(otherKeyedStream)
.between(Time.milliseconds(-2), Time.milliseconds(2)) // lower and upper bound
.upperBoundExclusive(true) // optional
.lowerBoundExclusive(true) // optional
.process(new IntervalJoinFunction() {...});
Apply
windowedStream.apply (new WindowFunction<Tuple2<String, Integer>, Integer, Tuple, Window>() {
    public void apply (Tuple tuple,
            Window window,
            Iterable<Tuple2<String, Integer>> values,
            Collector<Integer> out) throws Exception {
        // Sum the second field of every element in the window.
        int sum = 0;
        for (Tuple2<String, Integer> t : values) {
            sum += t.f1;
        }
        out.collect(sum);
    }
});
Transformation:
[Figure: WindowApplyTransformation]
Runtime:
[Figure: WindowApply Task]
Reduce
windowedStream.reduce (new ReduceFunction<Tuple2<String, Integer>>() {
    public Tuple2<String, Integer> reduce(Tuple2<String, Integer> value1, Tuple2<String, Integer> value2) throws Exception {
        return new Tuple2<String, Integer>(value1.f0, value1.f1 + value2.f1);
    }
});
Aggregations
windowedStream.sum(0);
windowedStream.sum("key");
windowedStream.min(0);
windowedStream.min("key");
windowedStream.max(0);
windowedStream.max("key");
windowedStream.minBy(0);
windowedStream.minBy("key");
windowedStream.maxBy(0);
windowedStream.maxBy("key");
Apply
// applying an AllWindowFunction on non-keyed window stream
allWindowedStream.apply (new AllWindowFunction<Tuple2<String, Integer>, Integer, Window>() {
    public void apply (Window window,
            Iterable<Tuple2<String, Integer>> values,
            Collector<Integer> out) throws Exception {
        // Sum the second field of every element in the window.
        int sum = 0;
        for (Tuple2<String, Integer> t : values) {
            sum += t.f1;
        }
        out.collect(sum);
    }
});
CoMap, CoFlatMap
connectedStreams.map(new CoMapFunction<Integer, String, Boolean>() {
    @Override
    public Boolean map1(Integer value) {
        return true;
    }
    @Override
    public Boolean map2(String value) {
        return false;
    }
});
connectedStreams.flatMap(new CoFlatMapFunction<Integer, String, String>() {
    @Override
    public void flatMap1(Integer value, Collector<String> out) {
        out.collect(value.toString());
    }
    @Override
    public void flatMap2(String value, Collector<String> out) {
        for (String word : value.split(" ")) {
            out.collect(word);
        }
    }
});
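Transformation: ConnectedStreams itself does not produce a Transformation; it only stores the two input DataStreams. The parent Transformations are obtained from these input DataStreams, and a CoStreamMap or CoStreamFlatMap operator is created. The KeySelector, if any, is injected from the parent Transformation (when that parent is a PartitionTransformation).
[Figure: Co(Flat)MapTransformation]
Runtime: the Task is responsible for deciding whether to call processElement1 or processElement2.
[Figure: CoStream(Flat)MapTask]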
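The dispatch between the two inputs can be sketched in plain Java. This is only an illustration under stated assumptions: CoMapSketch, CoMap and CoMapOperator are hypothetical stand-ins, and the "task" is reduced to a loop over elements tagged with the index of the input they arrived on.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of a two-input operator and the task-side dispatch.
// Names are assumptions; this is not Flink's CoStreamMap or its task class.
class CoMapSketch {
    interface CoMap<IN1, IN2, OUT> {
        OUT map1(IN1 value);
        OUT map2(IN2 value);
    }

    static final class CoMapOperator<IN1, IN2, OUT> {
        private final CoMap<IN1, IN2, OUT> userFunction;
        final List<OUT> output = new ArrayList<>();

        CoMapOperator(CoMap<IN1, IN2, OUT> userFunction) {
            this.userFunction = userFunction;
        }

        void processElement1(IN1 value) {
            output.add(userFunction.map1(value));
        }

        void processElement2(IN2 value) {
            output.add(userFunction.map2(value));
        }
    }

    static List<Boolean> demo() {
        CoMapOperator<Integer, String, Boolean> operator =
                new CoMapOperator<Integer, String, Boolean>(new CoMap<Integer, String, Boolean>() {
                    @Override
                    public Boolean map1(Integer value) {
                        return true;
                    }

                    @Override
                    public Boolean map2(String value) {
                        return false;
                    }
                });
        // Each arrival is {input index, value}; the "task" picks the method by input index.
        Object[][] arrivals = { {1, 42}, {2, "a b"}, {1, 7} };
        for (Object[] arrival : arrivals) {
            if ((Integer) arrival[0] == 1) {
                operator.processElement1((Integer) arrival[1]);
            } else {
                operator.processElement2((String) arrival[1]);
            }
        }
        return operator.output;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [true, false, true]
    }
}
```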
Select
SplitStream<Integer> split;
DataStream<Integer> even = split.select("even");
DataStream<Integer> odd = split.select("odd");
DataStream<Integer> all = split.select("even","odd");
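Transformation: creates a SelectTransformation that contains the OutputSelector.
[Figure: SelectTransformation]
Runtime: when the StreamGraph is generated, the OutputNames are injected into the newly created StreamEdge and then into the corresponding JobEdge. They are finally used to build the outputMap in the OutputCollector; when a record is emitted, it is sent to the downstream Task(s) that match its selectedName.
[Figure: Select Runtime]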
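The name-based routing can be sketched in plain Java. The sketch assumes an outputMap from selected name to the downstream outputs registered under that name, and that a record is delivered at most once to each matching downstream; SplitSelectSketch, route and the stream labels are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Plain-Java sketch of split/select routing by output name. Names are assumptions.
class SplitSelectSketch {
    // outputMap: selected name -> labels of the downstream outputs registered under it.
    static Map<String, List<Integer>> route(
            List<Integer> values,
            Function<Integer, List<String>> outputSelector,
            Map<String, List<String>> outputMap) {
        Map<String, List<Integer>> received = new LinkedHashMap<>();
        for (Integer value : values) {
            // Collect every downstream registered under any of the selected names,
            // using a set so that each downstream gets the record only once.
            Set<String> targets = new LinkedHashSet<>();
            for (String name : outputSelector.apply(value)) {
                targets.addAll(outputMap.getOrDefault(name, Collections.<String>emptyList()));
            }
            for (String target : targets) {
                received.computeIfAbsent(target, k -> new ArrayList<>()).add(value);
            }
        }
        return received;
    }

    static Map<String, List<Integer>> demo() {
        Map<String, List<String>> outputMap = new LinkedHashMap<>();
        outputMap.put("even", Arrays.asList("evenStream", "allStream"));
        outputMap.put("odd", Arrays.asList("oddStream", "allStream"));
        return route(Arrays.asList(1, 2, 3, 4),
                v -> Collections.singletonList(v % 2 == 0 ? "even" : "odd"),
                outputMap);
    }

    public static void main(String[] args) {
        // prints {oddStream=[1, 3], allStream=[1, 2, 3, 4], evenStream=[2, 4]}
        System.out.println(demo());
    }
}
```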
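Author: 铛铛铛clark
Link: https://www.jianshu.com/p/a3f43f861a42
Source: Jianshu
Copyright on Jianshu belongs to the author. For reproduction in any form, contact the author for authorization and cite the source.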