First Spark Program: WordCount with Maven

1. Create a new Maven project

2. Fill in the GroupId and ArtifactId, then click Next

3. Enable Auto-Import

4. Edit pom.xml



<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.symsimmy</groupId>
    <artifactId>sparklearning</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>2.2.0</spark.version>
        <scala.version>2.11</scala.version>
        <hadoop.version>2.9.0</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>

        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
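
Note that maven-compiler-plugin only compiles Java sources; inside IDEA, the Scala plugin handles Scala compilation. If you also want a plain mvn package to compile the Scala code outside the IDE, you would typically add the scala-maven-plugin to the plugins section as well. A sketch, with the version number being an assumption:

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.2.2</version>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>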

5. Add the Scala SDK

I verified this myself: under spark-2.2.0, scala-2.12.4 fails. This is a pitfall. Spark 2.2.0 is only published for Scala 2.11 (hence the _${scala.version} suffix on every Spark artifact in the pom), so I recommend 2.11.8, the version that appears in the official 2.2.0 documentation; I tested it and it works.
Go to Project Structure -> Libraries, click the + button, and choose Scala SDK.


6. Configure the environment

  • Under src/main, add a scala directory and mark it as a Source Folder (this matches the sourceDirectory declared in the pom)
  • Under src/test, add scala and resources directories and mark them as a Source Folder and a Test Resource Folder respectively (the resulting layout is sketched below)
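
With the sourceDirectory and testSourceDirectory settings from the pom, the project layout should end up looking roughly like this:

sparklearning/
├── pom.xml
└── src/
    ├── main/
    │   └── scala/
    └── test/
        ├── scala/
        └── resources/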

7. Run the Spark example SparkPi

Under src/main/scala, right-click and choose New > Scala Class; name it SparkPi and set the kind to Object.

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println

import scala.math.random

import org.apache.spark.sql.SparkSession

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("Spark Pi")
      .getOrCreate() 
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
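    // Monte Carlo estimate of pi: sample random points in the square [-1, 1] x [-1, 1];
    // the fraction that falls inside the unit circle approximates pi / 4.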
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / (n - 1))
    spark.stop()
  }
}
// scalastyle:on println

Running it throws an error. How do we fix it?

17/12/06 10:10:14 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at SparkPi$.main(SparkPi.scala:31)
    at SparkPi.main(SparkPi.scala)
17/12/06 10:10:14 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at SparkPi$.main(SparkPi.scala:31)
    at SparkPi.main(SparkPi.scala)

Process finished with exit code 1

First of all, it must be said that running it directly is bound to fail. Spark has several deploy modes, and to run locally the master must be set to local.
In Edit Configurations, add a VM option: -Dspark.master=local


  • Click OK, and the program now runs correctly
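
Alternatively, the master can be set in code rather than via a VM option. A minimal sketch for local testing only (a hard-coded master overrides whatever spark-submit is told on a real cluster, so remove it before deploying):

val spark = SparkSession
  .builder()
  .appName("Spark Pi")
  .master("local[*]") // run locally with one worker thread per CPU core
  .getOrCreate()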


8. Set up the environment and build the jar

  • Set up Artifacts and choose From modules with dependencies...

  • Choose SparkPi as the Main Class

  • The final dialog looks like this; you can customize where the generated jar goes, then click OK

  • Click Build -> Build Artifacts

  • The Build Artifacts menu appears in the middle of the screen; click Build

  • Once the build succeeds, the jar is generated under the out directory

Run the generated jar with spark-submit (spark-shell is only for interactive sessions; jars are submitted with spark-submit).
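
For example, assuming the artifact ended up at out/artifacts/sparklearning_jar/sparklearning.jar (the exact path depends on the Artifact settings above):

spark-submit --class SparkPi --master local out/artifacts/sparklearning_jar/sparklearning.jar

With the toolchain proven on SparkPi, the WordCount promised in the title is only a small variation. A minimal sketch, assuming a text file input.txt in the working directory (any other path can be passed as the first program argument):

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Word Count")
      .getOrCreate()
    val input = if (args.length > 0) args(0) else "input.txt" // assumed input path
    val counts = spark.sparkContext
      .textFile(input)
      .flatMap(_.split("\\s+")) // split each line into words
      .filter(_.nonEmpty)       // drop empty tokens
      .map(word => (word, 1))   // pair each word with an initial count of 1
      .reduceByKey(_ + _)       // sum the counts per word
    counts.collect().foreach { case (word, n) => println(s"$word: $n") }
    spark.stop()
  }
}

It runs, debugs, and packages exactly like SparkPi, including the -Dspark.master=local VM option for local runs.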

References

  • Developing a Spark Maven project with IntelliJ IDEA in Scala (基于IntelliJ IDEA开发Spark的Maven项目——Scala语言)
