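This article shows how to use Flink's Scala shell, an interactive terminal similar to the Spark shell. It is a quick way to get started with Flink and to debug and test simple Flink jobs.
This article covers local mode only.
Flink ships with an integrated interactive Scala shell that can be used in both local mode and cluster mode.
To use the shell with an integrated Flink cluster, run: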
bin/start-scala-shell.sh local
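Note: this command starts an embedded Flink execution environment, so you do not need to start a Flink cluster first.
The shell supports both Batch and Streaming. Two execution environments are pre-bound at startup: use the variable "benv" for the Batch environment and "senv" for the Streaming environment.
After starting the Scala shell, enter the following commands: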
scala> val text = benv.fromElements(
| "To be, or not to be,--that is the question:--",
| "Whether 'tis nobler in the mind to suffer",
| "The slings and arrows of outrageous fortune",
| "Or to take arms against a sea of troubles,")
text: org.apache.flink.api.scala.DataSet[String] = org.apache.flink.api.scala.DataSet@479f738a
scala> val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts: org.apache.flink.api.scala.AggregateDataSet[(String, Int)] = org.apache.flink.api.scala.AggregateDataSet@44f4c619
scala> counts.print()
(a,1)
(against,1)
(and,1)
(arms,1)
(arrows,1)
(be,2)
(fortune,1)
(in,1)
(is,1)
(mind,1)
(nobler,1)
(not,1)
(of,2)
(or,2)
(outrageous,1)
(question,1)
(sea,1)
(slings,1)
(suffer,1)
(take,1)
(that,1)
(the,3)
(tis,1)
(to,4)
(troubles,1)
(whether,1)
The print() command automatically submits the job to the JobManager for execution and displays the result in the terminal.
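To make it clearer what the flatMap / map / groupBy(0) / sum(1) chain computes, here is a minimal sketch of the same word count written with plain Scala collections, with no Flink involved. The object name LocalWordCount and the helper wordCount are made up for this illustration; it only mirrors the logic of the batch job and says nothing about how Flink executes it.

```scala
object LocalWordCount {
  // The same four lines that the shell session passes to benv.fromElements.
  val lines: Seq[String] = Seq(
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,")

  // Plain-collections counterpart of flatMap + map + groupBy(0) + sum(1).
  def wordCount(input: Seq[String]): Map[String, Int] =
    input
      .flatMap(_.toLowerCase.split("\\W+"))   // lowercase words of every line
      .map(word => (word, 1))                 // pair each word with a count of 1
      .groupBy(_._1)                          // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

  def main(args: Array[String]): Unit =
    // Sort by word so the output is easy to compare with the shell session.
    wordCount(lines).toSeq.sortBy(_._1).foreach(println)
}
```

For the four input lines this should yield the same 26 pairs as the shell session, for example (to,4) and (the,3).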
You can also write the result to a file. In that case, however, you need to call execute to run your program:
scala> benv.execute("MyProgram")
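As a rough sketch of the file-writing case, the snippet below is meant to be pasted into a running Flink Scala shell in local mode, where benv and the Flink Scala API implicits are already in scope. It is not a standalone program. The output path /tmp/flink-wordcount-out is a made-up example, and the overwrite mode is an assumption chosen so that the snippet can be re-run.

```scala
// Paste into the Flink Scala shell; benv is pre-bound there.
import org.apache.flink.core.fs.FileSystem.WriteMode

val text = benv.fromElements(
  "To be, or not to be,--that is the question:--",
  "Whether 'tis nobler in the mind to suffer")

// Kept on one line so the REPL does not evaluate a half-finished expression.
val counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)

// Hypothetical output path; OVERWRITE replaces the result of an earlier run.
counts.writeAsText("/tmp/flink-wordcount-out", WriteMode.OVERWRITE)

// The file sink only declares the output; execute() actually runs the job.
benv.execute("MyProgram")
```

Depending on the parallelism of the sink, the path may end up as a single file or as a directory of part files.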
Note: output is printed to the terminal only in local mode; in cluster mode nothing is printed to the terminal.
The same word count can be run in the Streaming environment through senv:
scala> val textStreaming = senv.fromElements(
| "To be, or not to be,--that is the question:--",
| "Whether 'tis nobler in the mind to suffer",
| "The slings and arrows of outrageous fortune",
| "Or to take arms against a sea of troubles,")
textStreaming: org.apache.flink.streaming.api.scala.DataStream[String] = org.apache.flink.streaming.api.scala.DataStream@22717282
scala> val countsStreaming = textStreaming.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.keyBy(0).sum(1)
countsStreaming: org.apache.flink.streaming.api.scala.DataStream[(String, Int)] = org.apache.flink.streaming.api.scala.DataStream@4daa4a5a
scala> countsStreaming.print()
res7: org.apache.flink.streaming.api.datastream.DataStreamSink[(String, Int)] = org.apache.flink.streaming.api.datastream.DataStreamSink@7d957c96
scala> senv.execute("Streaming Wordcount")
(to,1)
(be,1)
(or,1)
(not,1)
(to,2)
(be,2)
(that,1)
(is,1)
(the,1)
(question,1)
(whether,1)
(tis,1)
(nobler,1)
(in,1)
(the,2)
(mind,1)
(to,3)
(suffer,1)
(the,3)
(slings,1)
(and,1)
(arrows,1)
(of,1)
(outrageous,1)
(fortune,1)
(or,2)
(to,4)
(take,1)
(arms,1)
(against,1)
(a,1)
(sea,1)
(of,2)
(troubles,1)
res8: org.apache.flink.api.common.JobExecutionResult = org.apache.flink.api.common.JobExecutionResult@7c950d4f
Note: in the Streaming environment, print() does not trigger execution by itself; the job runs only once senv.execute is called.
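The streaming output differs from the batch output because keyBy(0).sum(1) emits an updated count for every incoming element instead of one final total per word. The sketch below imitates that behaviour with plain Scala collections, assuming the words are processed one at a time in input order. The names RunningWordCount and runningCounts are invented for this illustration, and a real Flink job with parallelism greater than 1 may interleave the output differently.

```scala
object RunningWordCount {
  // The same four lines that the shell session passes to senv.fromElements.
  val lines: Seq[String] = Seq(
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,")

  // For every word, emit (word, number of times it has been seen so far).
  def runningCounts(input: Seq[String]): Seq[(String, Int)] = {
    val words = input.flatMap(_.toLowerCase.split("\\W+"))
    val start = (Map.empty[String, Int], Vector.empty[(String, Int)])
    val (_, emitted) = words.foldLeft(start) { case ((counts, out), word) =>
      val n = counts.getOrElse(word, 0) + 1        // updated count for this key
      (counts.updated(word, n), out :+ (word, n))  // keep state, emit the update
    }
    emitted
  }

  def main(args: Array[String]): Unit =
    runningCounts(lines).foreach(println)
}
```

The first emitted pairs are (to,1), (be,1), (or,1), (not,1), (to,2), (be,2), which matches the beginning of the streaming output in the session.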
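To list the available options, run:
bin/start-scala-shell.sh --help
The help output shows that the command also supports the remote and yarn cluster modes, and that it can add external dependency jars to the shell.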