1. Install sbt on a client machine that already has Spark installed (see the previous post).
2. On one of the slave nodes, create a WordCount directory under /home/hadoop1/xuguokun/.
3. Inside the WordCount directory, create a build.sbt file with the following contents:
name := "o2o-spark" version := "0.1" scalaVersion := "2.10.4" libraryDependencies ++= Seq( "org.scalanlp" % "chalk" % "1.3.0", "org.apache.spark" %% "spark-core" % "1.3.1", "org.apache.spark" %% "spark-mllib" % "1.3.1", "org.apache.spark" % "spark-streaming_2.10" % "1.3.1", "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0", "org.apache.hadoop" % "hadoop-client" % "2.2.0", "org.apache.hadoop" % "hadoop-common" % "2.2.0", "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0", "com.github.scopt" %% "scopt" % "3.3.0", "org.apache.spark" %% "spark-sql" % "1.5.1", "org.apache.spark" %% "spark-hive" % "1.5.1", "org.apache.hbase" % "hbase" % "0.94.18" ) resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
4. In the /home/hadoop1/xuguokun/WordCount directory (the sbt project root, so that sbt package can pick the source up), edit a WordCount.scala file with vi; its contents are as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("spark://10.10.16.251:7077")
    val sc = new SparkContext(conf)
    // read the file uploaded to HDFS in step 6; a bare local path would have to
    // exist on every worker node, so the HDFS path is used instead
    val line = sc.textFile("/data/words.txt")
    line.flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _).collect().foreach(println)
    sc.stop()
  }
}
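Before packaging for the cluster, the same logic can be sanity-checked in local mode. A minimal sketch, assuming a local test copy of the input at the hypothetical path /tmp/words.txt:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocalTest {
  def main(args: Array[String]): Unit = {
    // local[2]: run in-process with two worker threads, no cluster required
    val conf = new SparkConf().setAppName("WordCountLocalTest").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.textFile("/tmp/words.txt")    // hypothetical local test file; adjust as needed
      .flatMap(_.split(","))         // same comma-separated format as words.txt
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    sc.stop()
  }
}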
5. In the /home/hadoop1/xuguokun/WordCount directory, execute the following command:
../sbt/sbt package
When the build finishes with sbt's [success] line, the jar has been packaged successfully.
6. Upload words.txt to HDFS:
hadoop fs -put /home/hadoop1/xuguokun/words.txt /data
Check whether words.txt has arrived in the target HDFS directory:
hadoop fs -ls /data
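The same check can also be done programmatically. A small sketch using the Hadoop FileSystem API (it assumes the cluster's Hadoop configuration files are on the classpath, so the default filesystem resolves to HDFS):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CheckUpload {
  def main(args: Array[String]): Unit = {
    // picks up fs.defaultFS from the Hadoop configuration on the classpath
    val fs = FileSystem.get(new Configuration())
    val target = new Path("/data/words.txt")
    println(s"$target exists: ${fs.exists(target)}")
  }
}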
7. Submit the jar built by sbt to the Spark cluster:
./spark-submit --class WordCount --master spark://10.10.16.251:7077 --executor-memory 2g --total-executor-cores 3 /home/hadoop1/xuguokun/WordCount/target/scala-2.10/o2o-spark_2.10-0.1.jar
(--total-executor-cores caps the cores used across the standalone cluster; the --num-executors flag only applies on YARN.)
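Note that spark-submit already supplies the master URL, so the setMaster call hardcoded in WordCount.scala is redundant and ties the jar to one specific cluster. A common variant (a sketch, not part of the original walkthrough) leaves the master out of the source entirely and lets --master decide:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // no setMaster here: the master URL comes from spark-submit's --master flag,
    // so the same jar runs unchanged on any cluster or in local mode
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    sc.textFile("/data/words.txt")
      .flatMap(_.split(","))
      .map((_, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)
    sc.stop()
  }
}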
8. The run prints each (word, count) pair to the console.
Note: words.txt is a comma-separated list of words, matching the split(",") in the code.
9. WordCount has been debugged successfully.