Packaging WordCount with sbt

1. Install sbt on a client machine that already has Spark installed (see the previous post).

2. On a slave node, create a WordCount directory under /home/hadoop1/xuguokun/.
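For orientation, after steps 3 and 4 below the project should look roughly like this; sbt also compiles .scala files placed directly in the project base directory, so no src/main/scala tree is needed for this small example:

/home/hadoop1/xuguokun/WordCount/
├── build.sbt
└── WordCount.scala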

3. In the WordCount directory, create a build.sbt file with the following content:

name := "o2o-spark"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.scalanlp" % "chalk" % "1.3.0",
  // All Spark modules are kept at 1.3.1 so the whole classpath matches the
  // spark-core version; mixing Spark versions causes conflicts at run time.
  "org.apache.spark" %% "spark-core" % "1.3.1",
  "org.apache.spark" %% "spark-mllib" % "1.3.1",
  "org.apache.spark" %% "spark-streaming" % "1.3.1",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.3.1",
  "org.apache.hadoop" % "hadoop-client" % "2.2.0",
  "org.apache.hadoop" % "hadoop-common" % "2.2.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0",
  "com.github.scopt" %% "scopt" % "3.3.0",
  "org.apache.spark" %% "spark-sql" % "1.3.1",
  "org.apache.spark" %% "spark-hive" % "1.3.1",
  "org.apache.hbase" % "hbase" % "0.94.18"
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
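A note on the %% operator used above: sbt appends the project's Scala binary version to the artifact name, so with scalaVersion := "2.10.4" the following two lines resolve to the same artifact (this is also why the jar is built under target/scala-2.10/):

"org.apache.spark" %% "spark-core" % "1.3.1"      // sbt expands this to spark-core_2.10
"org.apache.spark" % "spark-core_2.10" % "1.3.1"  // same artifact, spelled out by hand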

4. In the /home/hadoop1/xuguokun/WordCount directory (sources must live inside the sbt project to be compiled), use vi to create the WordCount.scala source file with the following contents:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {

  def main(args: Array[String]): Unit = {

    // Point the application at the standalone master. Note that a master
    // hard-coded on SparkConf overrides any --master flag given to spark-submit.
    val conf = new SparkConf().setAppName("WordCount").setMaster("spark://10.10.16.251:7077")

    val sc = new SparkContext(conf)

    // Read from HDFS (see step 6); a bare path resolves against the cluster's
    // default filesystem. A local path such as /home/hadoop1/xuguokun/words.txt
    // would have to exist on every worker node.
    val line = sc.textFile("/data/words.txt")

    // Split each line on commas, map each word to (word, 1), sum the counts
    // per word, and print the results on the driver.
    line.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)

    sc.stop()
  }

}
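collect() pulls every (word, count) pair back to the driver, which is fine for a demo but does not scale. As a sketch, the same pipeline could instead write its results to HDFS with saveAsTextFile (the output path /data/wordcount-output is hypothetical and must not already exist):

line.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _)
  .saveAsTextFile("/data/wordcount-output")  // hypothetical output directory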

5. In the /home/hadoop1/xuguokun/WordCount directory, run the following command:

../sbt/sbt package

When sbt finishes with a [success] line, the packaging succeeded.
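The jar lands under the Scala-version-specific target directory; you can confirm it with:

ls target/scala-2.10/

o2o-spark_2.10-0.1.jar should appear in the listing (the name comes from the name and version settings in build.sbt).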

6. Upload words.txt to HDFS:

hadoop fs -put /home/hadoop1/xuguokun/words.txt /data

Check that words.txt has arrived in the target HDFS directory:

hadoop fs -ls /data
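To inspect the uploaded file's contents rather than just its presence:

hadoop fs -cat /data/words.txt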

7. Submit the jar that sbt produced to the Spark cluster:

./spark-submit --class WordCount --master spark://10.10.16.251:7077 --executor-memory 2g --num-executors 3 /home/hadoop1/xuguokun/WordCount/target/scala-2.10/o2o-spark_2.10-0.1.jar

Here --class names the main class inside the jar, --master points at the standalone master, and --executor-memory sizes each executor. Note that in Spark 1.x --num-executors only takes effect on YARN; a standalone cluster caps resources with --total-executor-cores instead.
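For a quick sanity check without the cluster, the same jar can also be submitted with a local master. This only works if the hard-coded setMaster call is first removed from WordCount.scala, because a master set on SparkConf in code wins over the command-line flag:

./spark-submit --class WordCount --master "local[2]" /home/hadoop1/xuguokun/WordCount/target/scala-2.10/o2o-spark_2.10-0.1.jar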

8. The result of the run: the job prints one (word, count) pair per line to the driver console.
Note: words.txt is a plain text file of comma-separated words (the code splits each line on ",").
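As an illustration only, a hypothetical words.txt in this comma-separated format might look like:

hello,world,hello,spark
spark,hadoop,hello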
9. WordCount debugged successfully.
