Flink Part 1: A First Look at Flink, Compared with Spark!

A first look at Flink: a Flink stream is unbounded data. Let's use an example to compare the differences between Flink and Spark.

Flink is Event-based, and each Event is independent; operations and operators work on individual Events.

Spark is RDD-based; operations and operators are implemented over collections. This is the most fundamental difference between Spark and Flink.

1: Spark WordCount Example

import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkWordCount Demo, compare to Flink !!!").setMaster("local")
    val sc = new SparkContext(conf)
    // Build an RDD from two fixed lines of text
    val text = sc.makeRDD(
      List(
        "import org apache flink streaming api scala"
        , "import org apache flink streaming api windowing time Time"
      ))
    println("default NumPartitions : " + text.getNumPartitions)
    // Split each line into words and aggregate the counts over the whole collection at once
    val wordCounts = text.flatMap(line => line.split(" ")).map(word => (word.toLowerCase, 1)).reduceByKey((a, b) => a + b)
    println("wordCounts NumPartitions : " + wordCounts.getNumPartitions)
    wordCounts.foreach(println(_))
  }
}

Output below. From the output, we can see that Spark computes the entire input in one go.

(scala,1)
(import,2)
(flink,2)
(apache,2)
(org,2)
(windowing,1)
(streaming,2)
(time,2)
(api,2)

2: Flink WordCount

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object FlinkNoTime extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  val text = env.fromElements(
    "import org apache flink streaming api scala"
    , "import org apache flink streaming api window ing time Time"
  )

  val counts = text
    .flatMap {
      _.toLowerCase.split("\\W+") filter {
        _.nonEmpty
      }
    }
    .map((_, 1))
    .keyBy(0) // group by the tuple field "0" and sum up tuple field "1"
    //.timeWindow(Time.seconds(5)) // adding this line produces no output, because no time characteristic is specified
    .sum(1)

  counts.print()
  env.execute("Window Stream WordCount")
}

Output below. We can see that each word is accumulated from 1 up to 2, which shows that Flink treats every word as an independent Event; Events enter the Flink system one by one, so the word count builds up gradually.

So is there a way to make Flink's output look like Spark's? Note: we are comparing Flink streaming with Spark here; Flink batch is not considered.

Of course there is: add a window operation to Flink so that all the Events are gathered into one window, and then apply the aggregation.

6> (windowing,1)
3> (org,1)
3> (org,2)
2> (streaming,1)
4> (import,1)
2> (streaming,2)
7> (apache,1)
7> (flink,1)
1> (api,1)
1> (api,2)
1> (scala,1)
5> (time,1)
5> (time,2)
4> (import,2)
7> (apache,2)
7> (flink,2)
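As an aside, since Flink batch is mentioned above but left out of the comparison: below is a minimal sketch of what the same WordCount might look like on Flink's DataSet (batch) API, for reference only. The object name FlinkBatchWordCount is made up for illustration.

import org.apache.flink.api.scala._

object FlinkBatchWordCount extends App {
  // Batch ExecutionEnvironment, not the streaming one used elsewhere in this post
  val env = ExecutionEnvironment.getExecutionEnvironment

  val text = env.fromElements(
    "import org apache flink streaming api scala"
    , "import org apache flink streaming api windowing time Time"
  )

  val counts = text
    .flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
    .map((_, 1))
    .groupBy(0) // group on the word field of the tuple
    .sum(1)     // sum the count field

  // print() on a DataSet triggers execution, so no env.execute() is needed here
  counts.print()
}

Like Spark, the DataSet API works on the whole bounded collection, so its output would also show each word only once with its final count.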

 

3: Flink WordCount Like Spark Batch Computation

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object FlinkLikeSpark extends App{
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)
    // With ProcessingTime this bounded job finishes before the 5s window can fire, so there would be no output; IngestionTime is used here for now
//    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)

    val text = env.fromElements(
      "import org apache flink streaming api scala"
      , "import org apache flink streaming api windowing time Time"
    )

    val counts = text
      .flatMap {_.toLowerCase.split("\\W+") filter {_.nonEmpty}}
      .map((_, 1))
      .keyBy(0)
      .timeWindow(Time.seconds(5)) // add a window operation and the aggregation happens per window
      .sum(1)
      .print()

    env.execute("Window Stream WordCount")
}

Output below. Notice that it now looks like Spark's batch output, which confirms the guess.

7> (apache,2)
7> (flink,2)
6> (windowing,1)
4> (import,2)
3> (org,2)
5> (time,2)
1> (api,2)
1> (scala,1)
2> (streaming,2)
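The TimeCharacteristic comment above also hints at the third option, event time. As a hedged sketch (the object name FlinkEventTimeWordCount and the constant timestamp are illustrative assumptions), assigning an explicit timestamp to every element puts all of them into the same 5-second event-time window, which fires once the bounded input ends:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object FlinkEventTimeWordCount extends App {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

  val text = env.fromElements(
    "import org apache flink streaming api scala"
    , "import org apache flink streaming api windowing time Time"
  )

  val counts = text
    // give every element the same hand-made timestamp, so all words land in one window
    .assignAscendingTimestamps(_ => 0L)
    .flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
    .map((_, 1))
    .keyBy(0)
    .timeWindow(Time.seconds(5))
    .sum(1)

  counts.print()
  env.execute("EventTime Stream WordCount")
}

When the source is exhausted, Flink emits a final watermark that closes the pending event-time window, so the output again looks like the batch-style result above.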

 
