Setting Up a Spark + Scala + Maven Environment in IDEA

1. Windows Development Environment Setup

Download and install IDEA; free installation guides are readily available online.

2. Creating and Configuring a Maven Project in IDEA

1) Configure Maven

2) Create a new Project

3) Select a Maven archetype

4) Enter the project name

5) Set the Maven installation path

6) Generate the Maven project

7) Select the Scala version

8) Create the java and scala source directories

9) Edit the pom.xml file
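Step 9 adds the Scala and Spark dependencies to pom.xml. A minimal sketch of the relevant section (the version numbers here are assumptions; pick the ones matching your Spark installation, and keep the Scala binary version in the artifact suffix consistent with `scala.version`):

```xml
<properties>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
    <!-- Scala standard library -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <!-- Spark core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Spark SQL, needed for SparkSession and Dataset -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```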

3. Developing a Spark Application and Testing It Locally

1) Write a WordCount program in IDEA

package com.spark.test

import org.apache.spark.sql.SparkSession

/**
  * Created by z on 2018/4/18.
  */
object test {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]") // run locally inside IDEA; drop this when submitting to a cluster
      .appName("HdfsTest")
      .getOrCreate()

    // Use the input path from the command line if given, otherwise the local test file
    val filePath = if (args.nonEmpty) args(0) else "E://stu.txt"

    import spark.implicits._
    // Split each line on spaces, then count the occurrences of each word
    spark.read.textFile(filePath)
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .groupBy("_1")
      .count()
      .show()

    spark.stop()
  }
}


Result:


18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
18/04/18 23:49:55 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
18/04/18 23:49:55 INFO Executor: Finished task 50.0 in stage 9.0 (TID 200). 2540 bytes result sent to driver
18/04/18 23:49:55 INFO TaskSetManager: Finished task 50.0 in stage 9.0 (TID 200) in 4 ms on localhost (executor driver) (75/75)
18/04/18 23:49:55 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool 
18/04/18 23:49:55 INFO DAGScheduler: ResultStage 9 (show at test.scala:22) finished in 0.363 s
18/04/18 23:49:55 INFO DAGScheduler: Job 4 finished: show at test.scala:22, took 0.372613 s
18/04/18 23:49:55 INFO CodeGenerator: Code generated in 5.24118 ms
+--------+-----+
|      _1|count|
+--------+-----+
|     dfd|    2|
|      ha|    3|
|      hh|    3|
|dsfsdfsd|    1|
|  sdfdsf|    1|
+--------+-----+


18/04/18 23:49:55 INFO SparkContext: Invoking stop() from shutdown hook
18/04/18 23:49:55 INFO SparkUI: Stopped Spark web UI at http://192.168.143.1:4040
18/04/18 23:49:55 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/18 23:49:55 INFO MemoryStore: MemoryStore cleared
18/04/18 23:49:55 INFO BlockManager: BlockManager stopped
18/04/18 23:49:55 INFO BlockManagerMaster: BlockManagerMaster stopped
18/04/18 23:49:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/04/18 23:49:55 INFO SparkContext: Successfully stopped SparkContext
18/04/18 23:49:55 INFO ShutdownHookManager: Shutdown hook called
18/04/18 23:49:55 INFO ShutdownHookManager: Deleting directory C:\Users\wangz\AppData\Local\Temp\spark-bce58627-23a7-4a65-8552-ceeb8112fba2


Process finished with exit code 0
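The word-count logic of the Dataset pipeline above can be sanity-checked without Spark at all; a minimal sketch using plain Scala collections (the object name and sample lines are illustrative only, mirroring the flatMap / group / count steps):

```scala
object WordCountSketch {
  // Same shape as the Spark pipeline: split lines into words, group, count
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit =
    countWords(Seq("ha hh ha", "hh dfd ha hh dfd"))
      .toSeq.sortBy(-_._2)
      .foreach { case (word, n) => println(s"$word\t$n") }
}
```

This is useful for verifying the splitting and grouping behavior on a small sample before paying the cost of a Spark run.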


4. Packaging the Spark Application

1) Package the project as a jar, following the packaging approach covered earlier

2) Submit the job with spark-submit

bin/spark-submit --master local[2] --class com.spark.test.test /opt/jars/sparkStu.jar hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt

 
