Writing a WordCount Program in Spark

Note: this example runs in YARN mode, so you need to start the HDFS and YARN clusters first.
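With a typical Hadoop installation, that usually means running sbin/start-dfs.sh and sbin/start-yarn.sh from the Hadoop home directory; the exact script locations depend on your setup.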

1) Create a Maven project named WordCount and import the dependency
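The spark-core_2.11 artifact below is built for Scala 2.11, so the project's Scala version should be 2.11.x to match.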


    
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
</dependencies>


<build>
    <finalName>WordCount</finalName>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
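The scala-maven-plugin execution binds the compile and testCompile goals into the build, so a plain mvn package will compile the Scala sources before packaging.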

2) Write the code

args(0): the input file path to read from

args(1): the output path to write to

package com.test

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  def main(args: Array[String]): Unit = {

    // 1. Create a SparkConf and set the application name
    val conf = new SparkConf().setAppName("WC")

    // 2. Create the SparkContext, the entry point for submitting a Spark application
    val sc = new SparkContext(conf)

    // 3. Build the RDD: split each line into words, count each word,
    //    sort by count in descending order, and save the result
    sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _, 1)
      .sortBy(_._2, ascending = false)
      .saveAsTextFile(args(1))

    // 4. Shut down the connection
    sc.stop()
  }
}
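If you want to verify the logic before touching the cluster, the same pipeline can run in Spark local mode. Below is a minimal sketch; the object name WordCountLocal and the input path are hypothetical, chosen for illustration only:

package com.test

import org.apache.spark.{SparkConf, SparkContext}

// Local-mode variant for single-machine testing; no HDFS or YARN required
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // local[*] runs Spark with as many worker threads as the machine has cores
    val conf = new SparkConf().setAppName("WC-local").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Same pipeline as above, but printed to the console instead of saved
    sc.textFile("data/word.txt") // hypothetical local input file
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)
      .collect()
      .foreach(println)

    sc.stop()
  }
}

Each output record is a (word, count) pair, e.g. (spark,3), sorted by count in descending order.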

3) Add the packaging plugin (inside the <plugins> section of the build above)



<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>3.0.0</version>
    <configuration>
        <archive>
            <manifest>
                <mainClass>com.test.WordCount</mainClass>
            </manifest>
        </archive>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

4) Package and test on the cluster
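Running mvn package should now produce WordCount.jar along with the assembly jar, which Maven names WordCount-jar-with-dependencies.jar by default; the command below submits the plain jar, which is enough on YARN since the cluster supplies the Spark runtime.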

Run the following command from the Spark installation directory:

bin/spark-submit \
--class com.test.WordCount \
--master yarn \
WordCount.jar \
/word.txt \
/out
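Here /word.txt is the input file in HDFS and /out is the output directory; the output directory must not exist beforehand, because saveAsTextFile fails if it does. To inspect the result, something like hadoop fs -cat /out/part* should work, assuming the Hadoop client is on your PATH.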

 
