Flink Physical Partitioning

Rebalancing (Round-robin partitioning), the default strategy

Round-robin: each record is sent to the next downstream task in turn, so the load is spread evenly across downstream subtasks.

import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .rebalance                               // round-robin across downstream subtasks
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")

fsEnv.execute("FlinkWordCounts")
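Flink's actual channel selector is internal; as a rough sketch of what per-record round-robin selection does (the class and method names below are hypothetical, not Flink API):

```scala
// Hypothetical sketch of round-robin channel selection, not Flink's internal
// class: each call returns the next downstream channel index, wrapping around.
class RoundRobinSelector(numChannels: Int) {
  private var next = -1
  def selectChannel(): Int = {
    next = (next + 1) % numChannels
    next
  }
}

val selector = new RoundRobinSelector(3)
val sequence = (1 to 6).map(_ => selector.selectChannel())
// sequence cycles through the 3 channels: 0, 1, 2, 0, 1, 2
```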
Random partitioning

Sends each record to a randomly chosen downstream task.

import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .shuffle                                 // random downstream subtask per record
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .printToErr("test")

fsEnv.execute("FlinkWordCounts")
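The per-record choice shuffle makes can be sketched as follows (a hypothetical helper, not Flink internals): each record independently picks a uniformly random downstream channel.

```scala
import scala.util.Random

// Hypothetical sketch: shuffle picks a uniformly random channel per record.
def selectRandomChannel(rng: Random, numChannels: Int): Int =
  rng.nextInt(numChannels)

val rng = new Random()
val picks = (1 to 100).map(_ => selectRandomChannel(rng, 4))
// every pick falls in [0, 4); over many records the load evens out
```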
Rescaling

Each upstream partition sends its records round-robin to a fixed subset of downstream sub-partitions; the upstream and downstream parallelisms should be integer multiples of each other.

import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .setParallelism(4)
  .rescale                                 // 4 upstream subtasks feed 2 downstream subtasks
  .map(word => (word, 1))
  .setParallelism(2)
  .print("test")
  .setParallelism(2)
fsEnv.execute("FlinkWordCounts")
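The 4-to-2 wiring above can be pictured with a small grouping rule (an assumption for illustration, not Flink source code): with rescale, upstream subtasks are split into groups and each group feeds one downstream subtask.

```scala
// Illustrative grouping rule (assumption): when upstream parallelism is an
// integer multiple of downstream parallelism, upstream subtask i feeds
// downstream subtask i / (upstream / downstream).
def rescaleTarget(upstreamIndex: Int, upstream: Int, downstream: Int): Int = {
  require(upstream % downstream == 0, "parallelisms must be integer multiples")
  upstreamIndex / (upstream / downstream)
}

val wiring = (0 until 4).map(i => i -> rescaleTarget(i, 4, 2))
// upstream subtasks 0,1 -> downstream 0; upstream subtasks 2,3 -> downstream 1
```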
Broadcasting

Broadcasts every upstream record to all downstream partitions.

import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .broadcast                               // replicate each record to every downstream subtask
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .print("test")
fsEnv.execute("FlinkWordCounts")
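Unlike the strategies above, broadcast does not choose one channel per record; conceptually every record is replicated to all downstream channels (a hypothetical sketch, not Flink internals):

```scala
// Hypothetical sketch: broadcast replicates each record to every channel.
def broadcastTargets(numChannels: Int): Seq[Int] = 0 until numChannels

val targets = broadcastTargets(3)
// a record is sent to channels 0, 1 and 2
```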
Custom partitioning

Custom partitioning: route records with a user-defined Partitioner plus a key selector.

import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.streaming.api.scala._

val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.socketTextStream("HadoopNode00", 9999)
  .flatMap(line => line.split("\\s+"))
  .map(word => (word, 1))
  .partitionCustom(new Partitioner[String] {
    override def partition(key: String, numPartitions: Int): Int = {
      // key.hashCode & Integer.MAX_VALUE guarantees a non-negative value
      (key.hashCode & Integer.MAX_VALUE) % numPartitions
    }
  }, t => t._1)
  .print("test")
fsEnv.execute("Custom Partitions")
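The partition function above can be checked standalone: `String.hashCode` can be negative, and masking with `Integer.MAX_VALUE` clears the sign bit, so the result always lands in `[0, numPartitions)`.

```scala
// Same partition logic as the Partitioner above, extracted for a quick check.
def partition(key: String, numPartitions: Int): Int =
  (key.hashCode & Integer.MAX_VALUE) % numPartitions

// "polygenelubricants".hashCode is Integer.MIN_VALUE, the classic negative
// case; the mask turns it into 0, still a valid partition index.
val p = partition("polygenelubricants", 4)
```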
