Spark Standalone Application in Scala

  1. Install sbt
  2. Create the application
  3. Package the Scala program with sbt
  4. Run the program with spark-submit
  5. Install IDEA

Install sbt

Create an installation directory:

mkdir /usr/local/sbt

Copy the downloaded sbt-launch.jar into the directory (~/下载 is the Downloads folder) and make the sbt launcher executable:

cd /usr/local/sbt
cp ~/下载/sbt-launch.jar .
chmod u+x ./sbt
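The chmod step above assumes a launcher script named sbt already exists next to sbt-launch.jar. A typical wrapper looks like the following sketch (the JVM options are common defaults, not values mandated by sbt; adjust them to your machine):

```shell
# Create a launcher script named "sbt" in the current directory,
# which forwards all arguments to sbt-launch.jar via java.
cat > ./sbt << 'EOF'
#!/bin/bash
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
EOF
chmod u+x ./sbt
```

After this, ./sbt can be invoked from the installation directory as shown below.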
Check the sbt version:
hadoop@dhjvirtualmachine:/usr/local/sbt$ ./sbt sbt-version
[info] Set current project to sbt (in build file:/usr/local/sbt/)
[warn] The `-` command is deprecated in favor of `onFailure` and will be removed in 0.14.0
hadoop@dhjvirtualmachine:/usr/local/sbt$ ./sbt sbt-version
[info] Set current project to sbt (in build file:/usr/local/sbt/)
[info] 0.13.11

Create the application

Run the following commands in a terminal to create a folder sparkapp as the application root directory:

cd ~                                   # enter the user home directory
mkdir ./sparkapp                       # create the application root directory
mkdir -p ./sparkapp/src/main/scala     # create the required folder structure

Create the source file:

vim ./sparkapp/src/main/scala/SimpleApp.scala
    /* SimpleApp.scala */
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf
 
    object SimpleApp {
        def main(args: Array[String]) {
            val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
            val conf = new SparkConf().setAppName("Simple Application")
            val sc = new SparkContext(conf)
            val logData = sc.textFile(logFile, 2).cache()
            val numAs = logData.filter(line => line.contains("a")).count()
            val numBs = logData.filter(line => line.contains("b")).count()
            println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
        }
    }
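The job above filters the README for lines containing "a" and "b" and counts each set. The same filter-then-count logic can be sanity-checked with plain shell on a throwaway sample file (a sketch for illustration only, not using Spark):

```shell
# Mimic the two filter/count passes of SimpleApp on a small sample file
printf 'apple\nbanana\ncherry\n' > /tmp/sample.txt
grep -c "a" /tmp/sample.txt   # count of lines containing "a" -> 2
grep -c "b" /tmp/sample.txt   # count of lines containing "b" -> 1
```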

Package the Scala program with sbt

Create a new file simple.sbt under ./sparkapp. Spark 1.6.2 is built against Scala 2.10, so scalaVersion must be a 2.10.x release; the %% operator appends the Scala binary version to the artifact name, so the dependency resolves to spark-core_2.10:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2"

Check the directory structure:
cd ~/sparkapp
find .

./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala

Package the application (on success, the jar is written to ./target/scala-2.10/simple-project_2.10-1.0.jar):
hadoop@dhjvirtualmachine:~/sparkapp$  /usr/local/sbt/sbt package

Run the program with spark-submit

/usr/local/spark/bin/spark-submit --class "SimpleApp" /home/hadoop/sparkapp/target/scala-2.10/simple-project_2.10-1.0.jar 2>&1 | grep "Lines with a:"

Lines with a: 58, Lines with b: 26
17/12/27 23:43:23 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 65 ms on localhost (2/2)
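The 2>&1 | grep pipeline is there because spark-submit interleaves the application's println output with Spark's INFO log lines. The filtering can be illustrated on a simulated log stream (the log content below is hypothetical, matching the shape of the output above):

```shell
# Spark mixes application output with scheduler INFO lines;
# grep keeps only the line our program printed.
printf '%s\n' \
  '17/12/27 23:43:23 INFO scheduler.TaskSetManager: Finished task 1.0' \
  'Lines with a: 58, Lines with b: 26' \
  | grep "Lines with a:"
```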

Packaging with IDEA and Maven to run the Scala program on Spark

[Figure 1: screenshot of packaging and running the program in IDEA]
