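Setting up a local Spark build environment in IDEA and running the job on a cluster
Environment:
Local: Windows 7 64-bit + IDEA 15.0.4 + Scala 2.10.5
Cluster: Ubuntu + Spark 1.5.2
1. Install Scala 2.10.5 and configure its environment variables. JDK 1.7 must be installed as well, also with its environment variables set. Many tutorials cover this, so the details are skipped here.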
2. Install IDEA 15.0.4 locally:
https://www.jetbrains.com/idea/download/#section=windows
3. Install the Scala plugin:
http://plugins.jetbrains.com/plugin/?idea&id=1347
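Searching for scala directly under File -> Settings -> Plugins in IDEA 15.0.4 found nothing, probably because of a network problem. The plugin can be downloaded from the URL above and placed in the plugins directory of the IDEA installation. After restarting IDEA, scala shows up in the plugin list, but it is still not offered under New Project.
So I deleted it, and then added http://www.jetbrains.net/confluence/display/SCA/Scala+Plugin+for+IntelliJ+IDEA under Plugins in Settings.
After that, searching in Install JetBrains plugin made it possible to install scala 2.2.0.
Spark 1.5.2 uses Scala 2.10, and spark-assembly-1.5.2-hadoop2.6.0.jar is also built against Scala 2.10.
So locate the directory where the plugin was just installed, C:\Users\xubo\.IdeaIC15\config\plugins (the default plugin location of my IDEA), rename the scala folder there to scala2, and unpack the scala2.10 download from http://plugins.jetbrains.com/plugin/?idea&id=1347 into that directory.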
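Because the Scala version has to line up with the Spark assembly, it can help to confirm which Scala library a project actually runs with. The following is a minimal sketch, not part of the original setup: the object name ScalaVersionCheck and its helper methods are hypothetical, and the only library call used is scala.util.Properties.versionNumberString from the Scala standard library.

```scala
object ScalaVersionCheck {
  /** Extracts the binary version ("major.minor") from a full version such as "2.10.5". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  /** True when the full version belongs to the required binary version. */
  def matches(full: String, required: String): Boolean =
    binaryVersion(full) == required

  def main(args: Array[String]): Unit = {
    // versionNumberString reports the version of the Scala library on the classpath
    val running = scala.util.Properties.versionNumberString
    println("Scala library version: " + running)
    // "2.10" is the binary version assumed here for the Spark 1.5.2 assembly
    println("Matches Scala 2.10: " + matches(running, "2.10"))
  }
}
```

Running this inside the IDEA project should print the version of the scala-library.jar that the project is configured with, which is the one that needs to agree with the assembly jar.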
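4. Restart IDEA. A new Scala project can now be created, and after importing spark-assembly-1.5.2-hadoop2.6.0.jar into it, Spark programs can be compiled locally:
Example: SparkPi.scala, copied from the Spark source, with setMaster added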
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package scalaTest

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Pi ").setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    println("slices:\n"+slices)
    println("args.length:\n"+args.length)
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println
D:\1win7\java\jdk\bin\java -Didea.launcher.port=7534 "-Didea.launcher.bin.path=D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\bin" -Dfile.encoding=UTF-8 -classpath "D:\1win7\java\jdk\jre\lib\charsets.jar;D:\1win7\java\jdk\jre\lib\deploy.jar;D:\1win7\java\jdk\jre\lib\ext\access-bridge-64.jar;D:\1win7\java\jdk\jre\lib\ext\dnsns.jar;D:\1win7\java\jdk\jre\lib\ext\jaccess.jar;D:\1win7\java\jdk\jre\lib\ext\localedata.jar;D:\1win7\java\jdk\jre\lib\ext\sunec.jar;D:\1win7\java\jdk\jre\lib\ext\sunjce_provider.jar;D:\1win7\java\jdk\jre\lib\ext\sunmscapi.jar;D:\1win7\java\jdk\jre\lib\ext\zipfs.jar;D:\1win7\java\jdk\jre\lib\javaws.jar;D:\1win7\java\jdk\jre\lib\jce.jar;D:\1win7\java\jdk\jre\lib\jfr.jar;D:\1win7\java\jdk\jre\lib\jfxrt.jar;D:\1win7\java\jdk\jre\lib\jsse.jar;D:\1win7\java\jdk\jre\lib\management-agent.jar;D:\1win7\java\jdk\jre\lib\plugin.jar;D:\1win7\java\jdk\jre\lib\resources.jar;D:\1win7\java\jdk\jre\lib\rt.jar;D:\1win7\scala;D:\1win7\scala\lib;D:\all\idea\scala2\out\production\scala2;G:\149\spark-assembly-1.5.2-hadoop2.6.0.jar;D:\1win7\scala\lib\scala-actors-migration.jar;D:\1win7\scala\lib\scala-actors.jar;D:\1win7\scala\lib\scala-library.jar;D:\1win7\scala\lib\scala-reflect.jar;D:\1win7\scala\lib\scala-swing.jar;D:\1win7\idea\IntelliJ IDEA Community Edition 15.0.4\lib\idea_rt.jar" com.intellij.rt.execution.application.AppMain scalaTest.SparkPi
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/03/03 17:19:19 INFO SparkContext: Running Spark version 1.5.2
16/03/03 17:19:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/03 17:19:21 INFO SecurityManager: Changing view acls to: xubo
16/03/03 17:19:21 INFO SecurityManager: Changing modify acls to: xubo
16/03/03 17:19:21 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xubo); users with modify permissions: Set(xubo)
16/03/03 17:19:22 INFO Slf4jLogger: Slf4jLogger started
16/03/03 17:19:22 INFO Remoting: Starting remoting
16/03/03 17:19:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:52826]
16/03/03 17:19:22 INFO Utils: Successfully started service 'sparkDriver' on port 52826.
16/03/03 17:19:22 INFO SparkEnv: Registering MapOutputTracker
16/03/03 17:19:22 INFO SparkEnv: Registering BlockManagerMaster
16/03/03 17:19:22 INFO DiskBlockManager: Created local directory at C:\Users\xubo\AppData\Local\Temp\blockmgr-193ae298-f771-488a-92ee-60c4e94ca9d1
16/03/03 17:19:22 INFO MemoryStore: MemoryStore started with capacity 730.6 MB
16/03/03 17:19:22 INFO HttpFileServer: HTTP File server directory is C:\Users\xubo\AppData\Local\Temp\spark-4b618306-ea29-4c02-a891-754af4d84648\httpd-0a2aa0cd-b7f2-453b-983c-482852013882
16/03/03 17:19:22 INFO HttpServer: Starting HTTP Server
16/03/03 17:19:22 INFO Utils: Successfully started service 'HTTP file server' on port 52827.
16/03/03 17:19:22 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/03 17:19:22 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/03 17:19:22 INFO SparkUI: Started SparkUI at http://202.38.84.241:4040
16/03/03 17:19:23 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/03/03 17:19:23 INFO Executor: Starting executor ID driver on host localhost
16/03/03 17:19:23 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52834.
16/03/03 17:19:23 INFO NettyBlockTransferService: Server created on 52834
16/03/03 17:19:23 INFO BlockManagerMaster: Trying to register BlockManager
16/03/03 17:19:23 INFO BlockManagerMasterEndpoint: Registering block manager localhost:52834 with 730.6 MB RAM, BlockManagerId(driver, localhost, 52834)
16/03/03 17:19:23 INFO BlockManagerMaster: Registered BlockManager
slices:
2
args.length:
0
16/03/03 17:19:24 INFO SparkContext: Starting job: main at NativeMethodAccessorImpl.java:-2
16/03/03 17:19:24 INFO DAGScheduler: Got job 0 (main at NativeMethodAccessorImpl.java:-2) with 2 output partitions
16/03/03 17:19:24 INFO DAGScheduler: Final stage: ResultStage 0(main at NativeMethodAccessorImpl.java:-2)
16/03/03 17:19:24 INFO DAGScheduler: Parents of final stage: List()
16/03/03 17:19:24 INFO DAGScheduler: Missing parents: List()
16/03/03 17:19:24 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2), which has no missing parents
16/03/03 17:19:24 INFO MemoryStore: ensureFreeSpace(1856) called with curMem=0, maxMem=766075207
16/03/03 17:19:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1856.0 B, free 730.6 MB)
16/03/03 17:19:24 INFO MemoryStore: ensureFreeSpace(1198) called with curMem=1856, maxMem=766075207
16/03/03 17:19:24 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1198.0 B, free 730.6 MB)
16/03/03 17:19:24 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:52834 (size: 1198.0 B, free: 730.6 MB)
16/03/03 17:19:24 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
16/03/03 17:19:24 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at main at NativeMethodAccessorImpl.java:-2)
16/03/03 17:19:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/03/03 17:19:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:19:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/03/03 17:19:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1031 bytes result sent to driver
16/03/03 17:19:24 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 2085 bytes)
16/03/03 17:19:24 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/03/03 17:19:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 190 ms on localhost (1/2)
16/03/03 17:19:24 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1031 bytes result sent to driver
16/03/03 17:19:24 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 33 ms on localhost (2/2)
16/03/03 17:19:24 INFO DAGScheduler: ResultStage 0 (main at NativeMethodAccessorImpl.java:-2) finished in 0.230 s
16/03/03 17:19:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/03/03 17:19:24 INFO DAGScheduler: Job 0 finished: main at NativeMethodAccessorImpl.java:-2, took 0.545201 s
Pi is roughly 3.14548
16/03/03 17:19:24 INFO SparkUI: Stopped Spark web UI at http://202.38.84.241:4040
16/03/03 17:19:24 INFO DAGScheduler: Stopping DAGScheduler
16/03/03 17:19:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/03/03 17:19:24 INFO MemoryStore: MemoryStore cleared
16/03/03 17:19:24 INFO BlockManager: BlockManager stopped
16/03/03 17:19:24 INFO BlockManagerMaster: BlockManagerMaster stopped
16/03/03 17:19:24 INFO SparkContext: Successfully stopped SparkContext
16/03/03 17:19:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/03/03 17:19:24 INFO ShutdownHookManager: Shutdown hook called
16/03/03 17:19:24 INFO ShutdownHookManager: Deleting directory C:\Users\xubo\AppData\Local\Temp\spark-4b618306-ea29-4c02-a891-754af4d84648
Process finished with exit code 0
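The estimate that SparkPi computes can also be tried without Spark, which makes the method easier to see: sample random points in the square [-1, 1] x [-1, 1], count how many land inside the unit circle, and multiply that fraction by 4. The sketch below is only an illustration in plain Scala. The object name LocalPi, the explicit Random parameter, and the sample count are assumptions of this sketch, and it draws exactly n points, whereas SparkPi iterates over 1 until n.

```scala
import scala.util.Random

object LocalPi {
  /** Estimates pi from n random points in the square [-1, 1] x [-1, 1]. */
  def estimate(n: Int, rng: Random): Double = {
    // Count the points that fall inside the unit circle
    val inside = (1 to n).count { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      x * x + y * y < 1
    }
    // Circle area / square area = pi / 4, so scale the hit ratio by 4
    4.0 * inside / n
  }

  def main(args: Array[String]): Unit = {
    println("Pi is roughly " + estimate(200000, new Random()))
  }
}
```

In SparkPi the same per-point test runs inside map, and reduce(_ + _) adds up the hits across the slices.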
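5. Package the code into a jar and upload it to the cluster. For details see the book "Spark大数据应用", p. 123.
Roughly: File -> Project Structure -> Artifacts, then choose JAR -> From modules with dependencies...
Select the main class. The Scala and Spark packages can be removed from the artifact, otherwise the jar becomes very large. Finally choose Build -> Build Artifacts in IDEA to generate the jar, copy it to the cluster, and run it there.
Run script: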
#!/usr/bin/env bash
spark-submit --name SparkPi \
  --class scalaTest.SparkPi \
  --master spark://219.219.220.149:7077 \
  --executor-memory 512M \
  --total-executor-cores 22 scala2.jar
Location: /home/hadoop/cloud/testByXubo/spark/backupSuccess/ideaSparkPi/1
Result:
hadoop@Master:~/cloud/testByXubo/spark/backupSuccess/ideaSparkPi/1$ ./submitJob.sh
slices:
2
args.length:
0
Pi is roughly 3.14344
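One point worth checking in this setup: according to the Spark configuration documentation, properties set directly on SparkConf take precedence over flags passed to spark-submit. With setMaster("local") left in SparkPi, the --master option in the script may therefore not take effect, and the job may run in local mode on the submitting node. A possible way around this is to fall back to "local" only when no master was supplied from outside. The sketch below is hypothetical (MasterChoice and resolve are names invented here) and assumes that spark-submit exposes the chosen master to the driver as the spark.master system property.

```scala
object MasterChoice {
  /** Returns the externally supplied master if present and non-blank, else the fallback. */
  def resolve(submitted: Option[String], fallback: String = "local"): String =
    submitted.filter(_.trim.nonEmpty).getOrElse(fallback)

  def main(args: Array[String]): Unit = {
    // Assumption: the master passed via --master shows up as the spark.master system property
    val submitted = sys.props.get("spark.master")
    println("master: " + resolve(submitted))
  }
}
```

In SparkPi this would amount to something like conf.setMaster(MasterChoice.resolve(conf.getOption("spark.master"))). SparkConf also offers setIfMissing("spark.master", "local"), which expresses the same idea in one call. Either way, the same jar could then run from IDEA in local mode and on the cluster through the script.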