Flink Series --- Flink Batch WordCount (Java and Scala versions)


1. Configure a Flink HA cluster on Linux

See the earlier post on configuring a Flink HA cluster.

2. Configure the Flink environment in IDEA

You need the Scala plugin for IDEA; a quick web search will turn up installation instructions.


3. The code

pom.xml



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.kevin</groupId>
    <artifactId>DTFlink</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <flink.version>1.7.2</flink.version>
        <java.version>1.8</java.version>
        <scala.binary.version>2.11</scala.binary.version>
        <!-- a property with value 1.16.0 appeared here in the original; its name was lost during extraction -->
        <maven.compiler.source>${java.version}</maven.compiler.source>
        <maven.compiler.target>${java.version}</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.11</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.25</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.25</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- compile Java sources -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- compile Scala sources -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.1.6</version>
                <configuration>
                    <scalaCompatVersion>2.11</scalaCompatVersion>
                    <scalaVersion>2.11.12</scalaVersion>
                    <encoding>UTF-8</encoding>
                </configuration>
                <executions>
                    <execution>
                        <id>compile-scala</id>
                        <phase>compile</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                    </execution>
                    <execution>
                        <id>test-compile-scala</id>
                        <phase>test-compile</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <!-- package a jar with all dependencies -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <!-- the job's entry-point class -->
                            <mainClass>com.kevin.batch.BatchWordCountScala</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
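With the assembly plugin bound to the package phase as above, the job can be built and submitted roughly as follows. This is a sketch: the jar name follows Maven's default `artifactId-version-descriptor` naming, and the `flink run` step assumes a cluster is already running.

```shell
# Build the fat jar (triggers the scala and assembly plugins configured above)
mvn clean package

# Maven's default naming produces target/DTFlink-1.0-SNAPSHOT-jar-with-dependencies.jar.
# The manifest mainClass is com.kevin.batch.BatchWordCountScala, so no -c flag is needed:
flink run target/DTFlink-1.0-SNAPSHOT-jar-with-dependencies.jar
```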

log4j.properties

log4j.rootLogger=info,stdout

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] [%c] [%p] - %m%n

Scala code

package com.kevin.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._

/**
 * @author caonanqing
 * @version 1.0
 * @description   Batch word count
 * @createDate 2020/2/27
 */
object BatchWordCountScala {

  def main(args: Array[String]): Unit = {

    // 1. Get the execution environment
    val env = ExecutionEnvironment.getExecutionEnvironment
    // 2. Create the data set
    val text = env.fromElements("java java scala","scala java python")
    // 3. flatMap lowercases each line, splits on spaces, and filters out empty strings;
    // map emits (word, 1) pairs, groupBy groups identical keys, sum adds up the counts
    val counts = text.flatMap{ _.toLowerCase.split(" ") filter { _.nonEmpty }}
      .map{ (_,1)}
      .groupBy(0)
      .sum(1)

    // 4. Print the result
    counts.print()

  }

}


Java code

package com.kevin.batch;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/**
 * @author caonanqing
 * @version 1.0
 * @description Batch word count
 * @createDate 2020/2/27
 */
public class BatchWordCount {

    public static void main(String[] args) throws Exception {

        // 1. Get the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // 2. Create the data set
        DataSet<String> text = env.fromElements("java java scala", "scala java python");
        // 3. flatMap lowercases each line, splits on spaces, and emits (word, 1) pairs;
        // groupBy groups identical keys, sum adds up the counts
        DataSet<Tuple2<String, Integer>> counts = text.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String s, Collector<Tuple2<String, Integer>> out) throws Exception {
                String[] value = s.toLowerCase().split(" ");
                for (String word : value) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        })
        .groupBy(0)
        .sum(1);

        // 4. Print the result
        counts.print();

    }
}
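Running the Flink pipeline above requires the Flink runtime on the classpath. As a quick sanity check of what it should produce, the same word-count semantics can be sketched with plain Java streams; the class name `WordCountSketch` is an illustration, not part of the original project, and `Collectors.counting()` plays the role of the `groupBy(0).sum(1)` step.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountSketch {

    // Mirrors the Flink job: lowercase, split on spaces, drop empties, group and count
    public static Map<String, Long> count(String... lines) {
        return Stream.of(lines)
                .flatMap(line -> Arrays.stream(line.toLowerCase().split(" ")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        // Same input as the Flink job: expect java=3, scala=2, python=1
        System.out.println(count("java java scala", "scala java python"));
    }
}
```

Unlike the Flink version, this runs in a single JVM with no parallelism, but the per-word totals must match what `counts.print()` reports.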


 
