Installing sbt on Ubuntu and Using It with Spark
Environment:
Ubuntu Server 14.04.4 amd64, Hadoop 2.6.2, Scala 2.11.7, sbt 0.13.11, JDK 1.8
1. Installation Method 1: download the tgz archive
1 Download
root@spark:~# wget https://dl.bintray.com/sbt/native-packages/sbt/0.13.11/sbt-0.13.11.tgz
2 Extract (move the archive to /usr/local first)
root@spark:/usr/local# tar zxvf sbt-0.13.11.tgz
3 Make the sbt launcher executable
root@spark:/usr/local/sbt# chmod u+x bin/sbt
4 Environment variables
vi ~/.bashrc
Add the following lines:
export SBT_HOME=/usr/local/sbt
export PATH=${SBT_HOME}/bin:$PATH
Apply the changes:
source ~/.bashrc
Reference:
http://www.linuxdiyf.com/linux/14871.html
2. Installation Method 2: Debian package
wget https://dl.bintray.com/sbt/debian/sbt-0.13.11.deb
dpkg -i sbt-0.13.11.deb
apt-get update
apt-get -f install
apt-get install sbt
References:
http://stackoverflow.com/questions/13711395/install-sbt-on-ubuntu
http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html
3. Usage 1: a minimal project
For example, take the following code:
object Hi {
  def main(args: Array[String]) = println("Hi!")
}
Then, on the command line:
$ mkdir hello
$ cd hello
$ echo 'object Hi { def main(args: Array[String]) = println("Hi!") }' > hw.scala
$ sbt
...
> run
...
Hi!
4. Usage 2: Spark
Reference book: Big Data Analytics with Spark, Chapter 5
4.1 Create the project directory, the .scala file, and the .sbt file
mkdir WordCount
// WordCount.scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = args(0)
    val outputPath = args(1)
    // The no-arg constructor picks up the master and app name
    // supplied on the spark-submit command line.
    val sc = new SparkContext()
    val lines = sc.textFile(inputPath)
    val wordCounts = lines.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    wordCounts.saveAsTextFile(outputPath)
  }
}
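The RDD operators used above mirror the Scala collections API, so the pipeline can be sketched on a plain Scala collection without Spark. This is a minimal illustration only: `WordCountSketch` and `countWords` are hypothetical names, and `reduceByKey` is emulated with `groupBy` because plain collections do not have it.

```scala
// Word count on a plain Scala collection, mirroring the RDD pipeline above.
object WordCountSketch {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(line => line.split(" "))  // split each line into words
      .map(word => (word, 1))               // pair each word with a count of 1
      .groupBy(_._1)                        // stand-in for reduceByKey: group by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }  // sum the 1s

  def main(args: Array[String]): Unit =
    println(countWords(Seq("to be or", "not to be")))
}
```

On an RDD, reduceByKey performs the same grouping and summing, but it combines counts within each partition before shuffling data across the cluster.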
// wordcount.sbt
// Reference: http://www.scala-sbt.org/0.13/docs/Hello.html
// Reference: http://www.scala-sbt.org/0.13/docs/Basic-Def.html
// Old-style sbt build definition:
// name := "word-count"
// version := "1.0.0"
// scalaVersion := "2.10.6"
// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
The newer style (here building against Scala 2.11.7) is as follows:
// wordcount.sbt
lazy val root = (project in file(".")).
  settings(
    name := "word-count",
    version := "1.0.0",
    scalaVersion := "2.11.7",
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
  )
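A note on the dependency line: `%%` tells sbt to append the Scala binary version to the artifact name, which is the same convention that names this project's own jar word-count_2.11-1.0.0.jar. A tiny illustration of the naming (the `crossName` helper is hypothetical, not part of sbt):

```scala
// Illustration of the %% cross-version naming convention.
// crossName is a hypothetical helper, not an sbt API.
object CrossName {
  def crossName(artifact: String, scalaBinaryVersion: String): String =
    s"${artifact}_$scalaBinaryVersion"

  def main(args: Array[String]): Unit = {
    // "org.apache.spark" %% "spark-core" resolves the artifact spark-core_2.11
    println(crossName("spark-core", "2.11"))
    // ...and this project's packaged jar starts with word-count_2.11
    println(crossName("word-count", "2.11"))
  }
}
```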
Result:
root@spark:~/WordCount# ls
wordcount.sbt WordCount.scala
4.2 Compile
The first run of sbt downloads many jars, which can take a long time on a slow connection; `sbt -v package` shows the details while compiling.
cd WordCount
sbt package
Result:
Getting Scala 2.10.6 (for sbt)...
downloading https://repo1.maven.org/maven2/org/scala-lang/jline/2.10.6/jline-2.10.6.jar ...
[SUCCESSFUL ] org.scala-lang#jline;2.10.6!jline.jar (3764ms)
downloading https://repo1.maven.org/maven2/org/fusesource/jansi/jansi/1.4/jansi-1.4.jar ...
[SUCCESSFUL ] org.fusesource.jansi#jansi;1.4!jansi.jar (2979ms)
:: retrieving :: org.scala-sbt#boot-scala
confs: [default]
5 artifacts copied, 0 already retrieved (24494kB/6179ms)
[info] Set current project to root (in build file:/root/)
root@spark:~/WordCount# sbt package
[info] Set current project to word-count (in build file:/root/WordCount/)
[info] Updating {file:/root/WordCount/}root...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /root/WordCount/target/scala-2.11/classes...
[info] Packaging /root/WordCount/target/scala-2.11/word-count_2.11-1.0.0.jar ...
[info] Done packaging.
[success] Total time: 38 s, completed Jun 10, 2016 11:33:59 AM
The packaged jar is under target/scala-2.11:
root@spark:~/WordCount/target/scala-2.11# ls
classes word-count_2.11-1.0.0.jar
4.3 Run the jar with spark-submit
Note: start Hadoop 2.6.2 first.
Check the input file (it can be uploaded with hadoop fs -put /home/alex/t1.log /):
root@spark:~/WordCount# hadoop fs -ls /
-rw-r--r-- 1 root supergroup 556 2016-06-06 07:00 /t1.log
root@spark:~/WordCount# hadoop fs -cat /t1.log
The log file's contents:
[BEGIN] 2016/6/4 12:30:12
[2016/6/4 12:30:12] Welcome to Ubuntu 14.04.4 LTS (GNU/Linux 3.19.0-59-generic x86_64)
[2016/6/4 12:30:12]
[2016/6/4 12:30:12] * Documentation: https://help.ubuntu.com/
[2016/6/4 12:30:12] Last login: Sat Jun 4 06:40:54 2016 from 192.168.10.1
[2016/6/4 12:30:16] root@spark:~# ls
[2016/6/4 12:30:16] cleanjob derby.log image.jpg input metastore_db testtable.java
[2016/6/4 12:30:21] root@spark:~# cd /home/alex
[2016/6/4 12:30:22] root@spark:/home/alex# ls
[2016/6/4 12:30:22] pcshare seed.txt xdata xsetups
Submit the job with spark-submit:
root@spark:~/WordCount/target/scala-2.11# /usr/local/spark/spark-1.5.2-bin-hadoop2.6/bin/spark-submit --class "WordCount" --master local[*] word-count_2.11-1.0.0.jar /t1.log /outsbt
In the command above, the jar is read from the local filesystem (~/WordCount/target/scala-2.11/), not from HDFS, and the job runs in single-machine local mode. The input file /t1.log and the output directory /outsbt are both on HDFS.
Inspect /outsbt:
root@spark:~/WordCount/target/scala-2.11# hadoop fs -ls /outsbt
Found 2 items
-rw-r--r-- 1 root supergroup 0 2016-06-10 11:53 /outsbt/_SUCCESS
-rw-r--r-- 1 root supergroup 565 2016-06-10 11:53 /outsbt/part-00000
View the output file:
root@spark:~/WordCount/target/scala-2.11# hadoop fs -cat /outsbt/part-00000
(x86_64),1)
(ls,2)
(4,1)
(12:30:12],4)
(06:40:54,1)
(3.19.0-59-generic,1)
(metastore_db,1)
(cd,1)
(/home/alex,1)
(cleanjob,1)
(root@spark:~#,2)
([2016/6/4,9)
(2016/6/4,1)
(pcshare,1)
(Ubuntu,1)
(xsetups,1)
(12:30:12,1)
(login:,1)
(Welcome,1)
(,11)
(image.jpg,1)
(root@spark:/home/alex#,1)
(to,1)
(*,1)
(Jun,1)
(2016,1)
(Documentation:,1)
(https://help.ubuntu.com/,1)
(12:30:16],2)
(LTS,1)
(xdata,1)
(12:30:22],2)
(derby.log,1)
([BEGIN],1)
(Sat,1)
(seed.txt,1)
(input,1)
(Last,1)
(14.04.4,1)
(from,1)
(12:30:21],1)
((GNU/Linux,1)
(192.168.10.1,1)
(testtable.java,1)
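Each output line above is simply the toString of a (word, count) pair, which is how saveAsTextFile serializes the tuples. That also explains the odd-looking entries: (,11) comes from empty-string tokens that split(" ") produces on consecutive or leading spaces, and in (x86_64),1) the trailing parenthesis is part of the token itself. A quick check:

```scala
// saveAsTextFile writes each pair via Tuple2.toString, i.e. "(word,count)".
object OutputFormat {
  def main(args: Array[String]): Unit = {
    println(("Ubuntu", 1))   // the normal case: (Ubuntu,1)
    println(("x86_64)", 1))  // the token keeps its trailing parenthesis
    println(("", 11))        // empty-string "words" from repeated spaces
  }
}
```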
5. References:
Big Data Analytics with Spark -- Chapter 5
http://www.scala-sbt.org/0.13/docs/Hello.html
http://www.scala-sbt.org/0.13/docs/Basic-Def.html