Beam wordCount

  1. Clone Beam's GitHub repository:
    https://github.com/apache/beam.git

  2. Create a Beam project from the Maven examples archetype:

mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.12.0 \
      -DgroupId=org.example \
      -DartifactId=word-count-beam \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false

Look at the generated WordCount code:

$ cd word-count-beam/

$ ls
pom.xml src

$ ls src/main/java/org/apache/beam/examples/
DebuggingWordCount.java WindowedWordCount.java  common
MinimalWordCount.java   WordCount.java
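WordCount.java reads text lines, splits them into words, counts each distinct word (via Count.perElement()), and formats each result as a "word: count" line. The plain-Java sketch below illustrates that per-element logic without any Beam dependency. It is not the Beam API. The class name WordCountSketch is hypothetical, and the letter-only tokenizer regex [^\p{L}]+ is assumed to match ExampleUtils.TOKENIZER_PATTERN in the examples' common package.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical plain-Java sketch (no Beam) of what WordCount computes:
 * split lines into words, drop empty tokens, count each distinct word,
 * and format the result as "word: count".
 */
class WordCountSketch {

    // Assumption: the same letter-only tokenizer as the Beam examples;
    // any run of non-letter characters separates two words.
    static final String TOKENIZER_PATTERN = "[^\\p{L}]+";

    /** Roughly ExtractWordsFn + Count.perElement(): word -> number of occurrences. */
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(TOKENIZER_PATTERN)))
                // split() yields a leading "" when a line starts with a non-letter
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the words sorted, which makes the output deterministic
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    /** Roughly FormatAsTextFn: one "word: count" line per distinct word. */
    static List<String> format(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("<artifactId>word-count-beam</artifactId>", "beam beam");
        format(countWords(lines)).forEach(System.out::println);
    }
}
```

Running main prints artifactId: 2, beam: 3, count: 1 and word: 1, one per line. In Beam, the same steps run as distributed transforms, and the runner decides how to parallelize them.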

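  3. Run WordCount
Several runners (execution engines) are available. The DirectRunner runs the pipeline locally and is convenient for debugging; the main focus here is on the popular big-data processing engines Flink and Spark.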
DirectRunner:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
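With --output=counts, TextIO writes sharded output files. Assuming its default shard name template, these are named like counts-00000-of-00003, and the number of shards depends on the runner. The JDK-only sketch below gathers all shards into one sorted list. ReadCountShards and readShards are hypothetical names, and the glob assumes the default shard naming.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: collect the lines of every output shard for a given prefix. */
class ReadCountShards {

    static List<String> readShards(Path dir, String prefix) throws IOException {
        List<Path> shards = new ArrayList<>();
        // Glob for files such as counts-00000-of-00003 ('?' matches exactly one character).
        // Assumes the prefix itself contains no glob metacharacters.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "-?????-of-?????")) {
            for (Path p : stream) {
                shards.add(p);
            }
        }
        Collections.sort(shards); // read shards in index order
        List<String> lines = new ArrayList<>();
        for (Path shard : shards) {
            lines.addAll(Files.readAllLines(shard)); // UTF-8
        }
        Collections.sort(lines); // lines within a shard come in no particular order
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        readShards(dir, "counts").forEach(System.out::println);
    }
}
```

Run it from the word-count-beam directory after the job finishes. `cat counts*` shows the same lines, unsorted.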

FlinkRunner:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" -Pflink-runner

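FlinkRunner on a Flink cluster (--flinkMaster):
Set --flinkMaster to the address of the cluster's JobManager (host:port). --filesToStage ships the bundled jar that mvn package builds under target/ to the cluster.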

mvn package exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=FlinkRunner --flinkMaster= --filesToStage=target/word-count-beam-bundled-0.1.jar \
                  --inputFile=/path/to/quickstart/pom.xml --output=/tmp/counts" -Pflink-runner

SparkRunner:

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--runner=SparkRunner --inputFile=pom.xml --output=counts" -Pspark-runner
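In each local command above, the --runner value inside exec.args and the -P Maven profile go together. The profile adds that runner's dependency to the build, and omitting --runner falls back to the DirectRunner. The hypothetical helper below (RunnerCommand is not part of Beam or the archetype) assembles the local invocations to make that pairing explicit. It does not cover the Flink-cluster variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the local WordCount mvn invocations shown above. */
class RunnerCommand {

    enum Runner {
        DIRECT(null, "direct-runner"),        // DirectRunner is the default, so no --runner flag
        FLINK("FlinkRunner", "flink-runner"),
        SPARK("SparkRunner", "spark-runner");

        final String runnerFlag; // value for --runner, or null to use the default
        final String profile;    // Maven profile that brings in the runner's dependency

        Runner(String runnerFlag, String profile) {
            this.runnerFlag = runnerFlag;
            this.profile = profile;
        }
    }

    /** One argv element per list entry (suitable for ProcessBuilder, no shell quoting needed). */
    static List<String> command(Runner runner, String inputFile, String output) {
        StringBuilder execArgs = new StringBuilder();
        if (runner.runnerFlag != null) {
            execArgs.append("--runner=").append(runner.runnerFlag).append(' ');
        }
        execArgs.append("--inputFile=").append(inputFile).append(" --output=").append(output);
        return new ArrayList<>(Arrays.asList(
                "mvn", "compile", "exec:java",
                "-Dexec.mainClass=org.apache.beam.examples.WordCount",
                "-Dexec.args=" + execArgs,
                "-P" + runner.profile));
    }

    public static void main(String[] args) {
        // Printing joins the arguments with spaces; a shell would need quotes around -Dexec.args.
        for (Runner r : Runner.values()) {
            System.out.println(String.join(" ", command(r, "pom.xml", "counts")));
        }
    }
}
```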

Reference:
http://shiyanjun.cn/archives/1567.html
