Do you really have to package a jar, upload it to a server, and run spark-submit every single time? Can't you just right-click and run directly from the IDE on Windows? Of course you can!
1. Install IntelliJ IDEA (the Community edition is enough; also install the Scala plugin, since a Scala project will be created later)
2. Configure the Java environment variables
Press Win+R, type cmd, and verify the JDK is available:
# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
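If java -version is not recognized, the JDK environment variables still need to be set. A minimal sketch of the two system variables involved (the JDK path below is only an example; use your actual install directory):
JAVA_HOME = C:\Program Files\Java\jdk1.7.0_80
Path      = ...;%JAVA_HOME%\bin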
3. Configure the Scala environment variables
# scala
Welcome to Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80).
Type in expressions to have them evaluated.
Type :help for more information.
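If the scala command is not found, add the corresponding variables in the same way (again, the install path below is only an example):
SCALA_HOME = D:\scala-2.10.4
Path       = ...;%SCALA_HOME%\bin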
2. Download Spark and extract it to the D drive; make sure the extraction path contains no spaces.
3. Go into the bin directory of the extracted folder, Shift + right-click, and choose "Open command window here".
# spark-shell
// error
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
...
...
java.lang.RuntimeException: java.lang.NullPointerException
1. Download the winutils package
hadoop-common-2.2.0-bin
Download the zip package hadoop-common-2.2.0-bin-master.zip and extract it anywhere on the D drive.
2. Add the HADOOP_HOME environment variable
Create a new system variable HADOOP_HOME whose value is the directory the zip was extracted to, then append %HADOOP_HOME%\bin to the system Path variable.
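For example, with the zip extracted straight to the D drive (the same folder name is used in the code example later), the two settings would look like:
HADOOP_HOME = D:\hadoop-common-2.2.0-bin-master
Path        = ...;%HADOOP_HOME%\bin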
3. Run spark-shell again
# spark-shell
// error
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
1. Create the directory D:\tmp\hive (\tmp\hive must sit at the root of the D drive)
2. Go to the directory containing winutils.exe and open a command prompt
# winutils.exe chmod 777 /tmp/hive
3. Run spark-shell again
# spark-shell
// success
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
...
...
SQL context available as sqlContext.
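As a quick sanity check (just an illustrative command), you can run a tiny job against the pre-created SparkContext sc:
scala> sc.parallelize(1 to 100).count()
res0: Long = 100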
4. Change the log output level
The spark-shell output is very noisy; you can tame it by lowering the log level.
a. Go to %SPARK_HOME%\conf
b. The conf directory contains a logging template; copy log4j.properties.template and rename the copy to log4j.properties
c. Change log4j.rootCategory=INFO, console
to log4j.rootCategory=WARN, console
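After the edit, the top of log4j.properties should read roughly as follows (only the rootCategory line changes; the rest of the template stays as-is):
# Set everything to be logged to the console
log4j.rootCategory=WARN, console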
Create a Scala project in IntelliJ IDEA
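The project needs the Spark libraries on its classpath. A minimal build.sbt sketch, assuming the project uses sbt and the Spark/Scala versions shown above (adjust the versions to your setup; the same artifacts can be added as Maven dependencies instead):
name := "SparkTest"
scalaVersion := "2.10.5"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.1",
  "org.apache.spark" %% "spark-sql"   % "1.6.1",
  "org.apache.spark" %% "spark-mllib" % "1.6.1"
)
With the dependencies in place, the test program itself: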
package my.test

import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object TestSpark {
  def main(args: Array[String]): Unit = {
    // point Spark at the local winutils directory (same path as HADOOP_HOME)
    System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin-master")
    val conf = new SparkConf().setMaster("local").setAppName("SparkTest")
    val sc = new SparkContext(conf)
    // set the console log output level
    sc.setLogLevel("WARN")
    val data_test = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val disData = sc.parallelize(data_test)
    val result = disData.reduce(_ + _)
    println(result)
    sc.stop()
  }
}
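One note on the master setting: "local" runs the job in a single thread. To use every core on the machine you could write the following instead (optional, not needed for this small example):
val conf = new SparkConf().setMaster("local[*]").setAppName("SparkTest")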
Right-click the file in the IDE and choose Run.
Output (the trailing 55 is the result of the reduce):
…
18/02/11 13:34:49 INFO SparkUI: Started SparkUI at http://10.31.17.95:4040
…
18/02/11 13:34:50 INFO BlockManagerMaster: Registered BlockManager
55