Flink in Practice: Using the Shell Terminal (Local Mode)

Overview

This article describes how to use Flink's Scala shell. The shell is a quick way to get started with Flink and to debug and test simple Flink jobs, much like Spark's shell.

This article covers local mode only.

Starting the Flink Scala shell

Flink ships with an integrated interactive Scala Shell. It can be used in both local mode and cluster mode.

To use the shell with an integrated (embedded) Flink cluster, simply run:

bin/start-scala-shell.sh local

Note: this command brings up its own embedded Flink execution environment, so there is no need to start a Flink cluster beforehand.

The Scala shell's built-in environments

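The shell supports both batch and streaming. On startup it pre-binds two separate execution environments: use the variable "benv" to access the batch environment and "senv" to access the streaming environment.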

Using the batch environment

Running WordCount in the Scala shell

Start the Scala shell and enter the following commands:

scala> val text = benv.fromElements(
     |   "To be, or not to be,--that is the question:--",
     |   "Whether 'tis nobler in the mind to suffer",
     |   "The slings and arrows of outrageous fortune",
     |   "Or to take arms against a sea of troubles,")
text: org.apache.flink.api.scala.DataSet[String] = org.apache.flink.api.scala.DataSet@479f738a

scala> val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts: org.apache.flink.api.scala.AggregateDataSet[(String, Int)] = org.apache.flink.api.scala.AggregateDataSet@44f4c619

scala> counts.print()
(a,1)
(against,1)
(and,1)
(arms,1)
(arrows,1)
(be,2)
(fortune,1)
(in,1)
(is,1)
(mind,1)
(nobler,1)
(not,1)
(of,2)
(or,2)
(outrageous,1)
(question,1)
(sea,1)
(slings,1)
(suffer,1)
(take,1)
(that,1)
(the,3)
(tis,1)
(to,4)
(troubles,1)
(whether,1)

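The print() call automatically submits the job to the JobManager for execution and displays the result in the terminal.
You can also write the result to a file instead. In that case, however, you must call execute to run your program: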

scala> benv.execute("MyProgram")

Note: the output is printed to the terminal only in local mode; in cluster mode it is not printed to the terminal.
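To make explicit what the batch pipeline flatMap, map, groupBy(0), sum(1) computes, here is a minimal plain-Scala sketch that reproduces the same WordCount logic with ordinary Scala collections instead of Flink's DataSet API, so it runs without a Flink installation. The names BatchWordCountSketch and wordCount are hypothetical, chosen for illustration only. It uses the same tokenizer (split("\\W+")) and should yield the same 26 (word, count) pairs as the shell session. The result is sorted explicitly because the alphabetical order in the shell output should not be relied on in general.

```scala
// Plain-Scala sketch (no Flink dependency) of what the batch pipeline
// text.flatMap(...).map(...).groupBy(0).sum(1) computes.
object BatchWordCountSketch {
  val lines: Seq[String] = Seq(
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,")

  def wordCount(input: Seq[String]): Seq[(String, Int)] =
    input
      .flatMap(_.toLowerCase.split("\\W+"))                      // same tokenizer as the shell example
      .map(word => (word, 1))                                    // emit a (word, 1) pair per token
      .groupBy(_._1)                                             // group by field 0 (the word)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum field 1 within each group
      .toSeq
      .sortBy(_._1)                                              // sort only for a stable, readable order

  def main(args: Array[String]): Unit =
    wordCount(lines).foreach(println)
}
```

The sketch only illustrates the result. Scala's groupBy works in memory on a single machine, whereas Flink's groupBy(0) partitions the records by key across parallel tasks before sum(1) aggregates each group.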

Using the streaming environment

Computing WordCount with the DataStream API in the Scala shell

scala> val textStreaming = senv.fromElements(
     |   "To be, or not to be,--that is the question:--",
     |   "Whether 'tis nobler in the mind to suffer",
     |   "The slings and arrows of outrageous fortune",
     |   "Or to take arms against a sea of troubles,")
textStreaming: org.apache.flink.streaming.api.scala.DataStream[String] = org.apache.flink.streaming.api.scala.DataStream@22717282

scala> val countsStreaming = textStreaming.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.keyBy(0).sum(1)
countsStreaming: org.apache.flink.streaming.api.scala.DataStream[(String, Int)] = org.apache.flink.streaming.api.scala.DataStream@4daa4a5a

scala> countsStreaming.print()
res7: org.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)] = org.apache.flink.streaming.api.datastream.DataStreamSink@7d957c96

scala> senv.execute("Streaming Wordcount")
(to,1)
(be,1)
(or,1)
(not,1)
(to,2)
(be,2)
(that,1)
(is,1)
(the,1)
(question,1)
(whether,1)
(tis,1)
(nobler,1)
(in,1)
(the,2)
(mind,1)
(to,3)
(suffer,1)
(the,3)
(slings,1)
(and,1)
(arrows,1)
(of,1)
(outrageous,1)
(fortune,1)
(or,2)
(to,4)
(take,1)
(arms,1)
(against,1)
(a,1)
(sea,1)
(of,2)
(troubles,1)
res8: org.apache.flink.api.common.JobExecutionResult = org.apache.flink.api.common.JobExecutionResult@7c950d4f

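Note: in the streaming environment, print() does not trigger execution by itself. The job only runs when senv.execute(...) is called, as in the session above.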
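The streaming output differs from the batch output because keyBy(0).sum(1) is a rolling aggregation: for every incoming word it updates that word's count and emits the new (word, count) pair, which is why (to,1), (to,2), (to,3) and (to,4) all appear. The plain-Scala sketch below mimics this behavior without a Flink dependency; StreamingWordCountSketch and runningCounts are hypothetical names. It builds a lazy Iterator so that, like the DataStream program, nothing is computed while the pipeline is being defined, and consuming the iterator plays the role of senv.execute(...). It reproduces the single-task order shown above. With a parallelism greater than 1, Flink typically prefixes each printed record with the subtask index (for example 2>), and records with different keys may interleave in a different order.

```scala
import scala.collection.mutable

// Plain-Scala sketch (no Flink dependency) of keyBy(0).sum(1) on a stream:
// a rolling sum per key that emits an updated (word, count) for every word.
object StreamingWordCountSketch {
  val lines: Seq[String] = Seq(
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,")

  // Returns a lazy Iterator: defining it processes no element; the per-word
  // counting below only happens while the iterator is being consumed.
  def runningCounts(input: Seq[String]): Iterator[(String, Int)] = {
    val state = mutable.Map.empty[String, Int] // per-key state: current count per word
    input.iterator
      .flatMap(_.toLowerCase.split("\\W+"))   // same tokenizer as the shell example
      .map { word =>
        val count = state.getOrElse(word, 0) + 1
        state.update(word, count)             // update the running count for this key
        (word, count)                         // emit the updated aggregate downstream
      }
  }

  def main(args: Array[String]): Unit = {
    val pipeline = runningCounts(lines) // pipeline defined, nothing printed yet
    pipeline.foreach(println)           // consuming it plays the role of senv.execute(...)
  }
}
```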

Learning how to use start-scala-shell.sh

bin/start-scala-shell.sh --help

The help output shows that the script also supports the remote and YARN cluster modes, and that it can add external dependency JARs to the shell.

