Setting up a Spark single-node development and test environment on Windows

Do you really have to package a jar, upload it to a server, and submit it with spark-submit every single time? Can't you just right-click and run directly in the IDE on Windows? Of course you can!

Environment preparation

1. Install IntelliJ IDEA
2. Configure the Java environment variables
Win+R -> cmd

# java -version

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
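
If the java command is not recognized, the variables can also be set from cmd. A minimal sketch; the JDK path below is only an example, point it at your actual install, and the same pattern works for SCALA_HOME in the next step:

# setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_80"
//setx only affects newly opened cmd windows; reopen cmd, then verify:
# echo %JAVA_HOME%

Adding %JAVA_HOME%\bin to Path is easiest through the system environment-variables dialog.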

3. Configure the Scala environment variables

# scala

Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80).
Type in expressions to have them evaluated.
Type :help for more information.

Installing Spark

1. Download a pre-built Spark package (1.6.1 is used here) from the Apache Spark download page

2. Extract the download to drive D; make sure the resulting path contains no spaces

3. Go into the bin directory of the extracted folder
Shift + right-click -> "Open command window here"

# spark-shell
//error
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
...
...
java.lang.RuntimeException: java.lang.NullPointerException

Installing the Windows build of winutils

1. Download the winutils package
hadoop-common-2.2.0-bin

Download the zip package hadoop-common-2.2.0-bin-master.zip and extract it to any directory on drive D

2. Add the HADOOP_HOME environment variable
Create a new system variable HADOOP_HOME whose value is the directory the zip was extracted to, then append %HADOOP_HOME%\bin to the system Path variable
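
The same variable can also be set from cmd; a sketch assuming the zip was extracted to D:\hadoop-common-2.2.0-bin-master (the directory used later in the IDE example):

# setx HADOOP_HOME "D:\hadoop-common-2.2.0-bin-master"
//takes effect in newly opened cmd windows only

Appending %HADOOP_HOME%\bin to Path is still safer through the environment-variables dialog, because setx rewrites the whole variable it is given.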

3. Run spark-shell again

# spark-shell
//error
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------

Granting permissions on \tmp\hive

1. Create the directory D:\tmp\hive (the path /tmp/hive carries no drive letter, so it resolves against the drive spark-shell is launched from; with Spark on D: that means \tmp\hive must sit at the root of drive D)
2. Go to the directory containing winutils.exe and open a command prompt there

# winutils.exe chmod 777 /tmp/hive
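
To verify the change, winutils can list the directory (assuming your winutils build includes the ls command); the listing should now show rwxrwxrwx:

# winutils.exe ls \tmp\hive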

3. Run spark-shell again

# spark-shell
//success
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
...
...
SQL context available as sqlContext.

4. Change the log output level

The spark-shell output is quite noisy; you can control it by lowering the log level:
a. Go to the conf directory of your Spark installation (%SPARK_HOME%\conf)
b. It contains a logging template: copy log4j.properties.template and rename the copy to log4j.properties
c. Change log4j.rootCategory=INFO, console to log4j.rootCategory=WARN, console, as in the snippet below
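
After the edit, the relevant line in log4j.properties reads as follows (the rest of the file can stay as it was in the template); the test program further down also calls sc.setLogLevel("WARN"), which achieves the same thing per application:

# log4j.properties: only WARN and above go to the console
log4j.rootCategory=WARN, console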

Developing in the IDE

Create a Scala project in IntelliJ IDEA.
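
The project needs the Spark libraries on its classpath. A minimal build.sbt sketch, assuming an sbt-based project and the versions seen above (Spark 1.6.1, Scala 2.10.5); with Maven you would declare the equivalent spark-core_2.10, spark-sql_2.10 and spark-mllib_2.10 artifacts instead:

name := "SparkTest"

scalaVersion := "2.10.5"

// Spark modules used by the test program below
// (core for the RDD demo, sql and mllib only because the sample imports them)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.1",
  "org.apache.spark" %% "spark-sql"   % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1"
)

With the dependencies in place, the following test program can be run directly: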

package my.test

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors
object TestSpark{

  def main(args: Array[String]): Unit = {
      // point Spark at the directory that contains bin\winutils.exe
      System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin-master")
      val conf = new SparkConf().setMaster("local").setAppName("SparkTest")
      val sc = new SparkContext(conf)
      // set the console log level for this application
      sc.setLogLevel("WARN")
      // distribute a local collection as an RDD
      val data_test = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
      val disData = sc.parallelize(data_test)
      // sum the elements: 1 + 2 + ... + 10 = 55
      val result = disData.reduce(_ + _)
      println(result)
      sc.stop()
  }
}
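
Note: setMaster("local") runs Spark with a single worker thread; local[N] or local[*] would use more cores. The System.setProperty("hadoop.home.dir", ...) line points Spark at the winutils directory, so the program does not rely on HADOOP_HOME being set when launched from the IDE.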

Right-click the file in the IDE and run it

Output:


18/02/11 13:34:49 INFO SparkUI: Started SparkUI at http://10.31.17.95:4040

18/02/11 13:34:50 INFO BlockManagerMaster: Registered BlockManager
55
