Creating a distributed Spark job with sbt

The working directory is word/.


The code lives in word/src/main/scala/WordCount.scala.
1. Running sbt directly creates a target directory under the current directory.
A typical sbt layout is: lib/ (jars needed at compile time), project/, src/main/scala/, and src/test/scala/.
Copy the assembly jar (here spark-assembly-1.6.0-hadoop2.6.0.jar) into the lib/ directory:
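The conventional layout described above can be created up front. A minimal sketch, assuming the word/ working directory from this article and the standard sbt directory convention:

```shell
# Create the standard sbt project layout under the working directory "word"
mkdir -p word/lib word/project word/src/main/scala word/src/test/scala
# lib/ will hold the spark-assembly jar; sources go under src/main/scala
find word -type d | sort
```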


[root@localhost word]# find / -name spark*jar |grep assem
/opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
[root@localhost word]# cp /opt/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar lib/ 
[root@localhost word]# ls lib 
spark-assembly-1.6.0-hadoop2.6.0.jar
Edit WordCount.scala:
vi word/src/main/scala/WordCount.scala




import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

// The object name matches the file name and the --class passed to spark-submit
object WordCount {
  def main(args: Array[String]) {
    // This is an HDFS path; my HDFS has a README.md under the /input directory
    val logFile = "/input/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
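The two filter-and-count steps above can be sanity-checked locally with plain shell. The sample file below is hypothetical; grep -c counts matching lines the same way the two RDD filters do:

```shell
# Hypothetical stand-in for the /input/README.md file on HDFS
printf 'apache spark\nbig data\nhello\nbanana\n' > README.sample
# Count lines containing "a" and lines containing "b", like the two RDD filters
a=$(grep -c 'a' README.sample)
b=$(grep -c 'b' README.sample)
echo "Lines with a: $a, Lines with b: $b"
```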




Edit build.sbt:


[root@localhost word]# cat build.sbt 
name := "Simple Project"
 
version := "1.0"
 
scalaVersion := "2.11.7"
 
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
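In the dependency line above, sbt's %% operator appends the project's Scala binary version to the artifact id, so with scalaVersion 2.11.7 it resolves to spark-core_2.11. A quick illustrative sketch of the naming convention (the echo is only for demonstration):

```shell
# %% appends the Scala binary version (2.11 for scalaVersion 2.11.7)
# to the artifact id, so "spark-core" becomes "spark-core_2.11"
scalaBinary=2.11
echo "org.apache.spark:spark-core_${scalaBinary}:1.6.0"
```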


6. Compile and package into a jar file


[root@localhost word]# sbt package
[info] Set current project to wordCount (in build file:/opt/htt/temp_20140611/java/word/)
[info] Updating {file:/opt/htt/temp_20140611/java/word/}word...
[info] Resolving jline#jline;2.12 ...
[info] Done updating.
[info] Compiling 2 Scala sources to /opt/htt/temp_20140611/java/word/target/scala-2.11/classes...
[warn] Multiple main classes detected. Run 'show discoveredMainClasses' to see the list
[info] Packaging /opt/htt/temp_20140611/java/word/target/scala-2.11/wordcount_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 11 s, completed Jan 5, 2015 8:37:38 AM



Before submitting, format HDFS (first setup only), start the Hadoop and Spark daemons, and upload the input file to HDFS:

hadoop namenode -format          # format the NameNode (only on first setup)
start-dfs.sh                     # start HDFS
$SPARK_HOME/sbin/start-all.sh    # start the Spark master and workers
hadoop fs -mkdir -p /input       # create the input directory on HDFS
hadoop fs -put README.md /input  # upload the input file

Then submit the job:


[root@localhost word]# spark-submit --class WordCount --master spark://master11:7077 /WordCount/target/scala-2.11/simple-project_2.11-1.0.jar

(For example: spark-submit --class org.apache.spark.examples.SparkPi --master spark://master01:7077 ~/Pi/target/scala-2.11/simpleproject_2.11-1.0.jar. Check whether the source file has a package declaration; if it does, pass the fully qualified class name to --class.)


nohup spark-submit --class org.apache.spark.examples.SparkPageRank --master spark://master01:7077 /opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar hdfs://master01:9000/input/facebook.txt


nohup spark-submit  --class org.apache.spark.examples.graphx.LiveJournalPageRank --master spark://master01:7077 /opt/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar hdfs://master01:9000/input/facebook.txt --output=/data --numEPart=192  



You can open the web pages

http://master:8080/ (Spark master UI)

http://master:4040/ (running application UI)

to monitor job progress.







Notes: the input files you use must be placed on HDFS.
The Scala source file name should match the object name inside it. If you modify the code, delete the project and target directories before every recompile.
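The cleanup step above amounts to removing sbt's generated directories before rebuilding. A sketch, assuming the word/ layout used in this article (the simulated stale output is illustrative):

```shell
# Simulate a project with stale build output (directories are illustrative)
mkdir -p word/project word/target/scala-2.11
# Remove sbt-generated directories so the next "sbt package" rebuilds from scratch
rm -rf word/project word/target
ls word
```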
