Spark Cluster Testing

1. Spark Shell Test

The Spark shell is a tool well suited to quickly prototyping Spark programs and to getting familiar with Scala; you can use it even if you do not know Scala yet. It lets users interact with a Spark cluster and submit queries, which makes it convenient both for debugging and for newcomers to Spark.

 

Test case 1:

[Spark@Master spark]$ MASTER=spark://Master:7077 bin/spark-shell   # connect to the cluster

Spark assembly has been built with Hive, including Datanucleus jars on classpath

14/12/01 11:11:03 INFO spark.SecurityManager: Changing view acls to: Spark,

14/12/01 11:11:03 INFO spark.SecurityManager: Changing modify acls to: Spark,

14/12/01 11:11:03 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )

14/12/01 11:11:03 INFO spark.HttpServer: Starting HTTP Server

14/12/01 11:11:03 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 11:11:03 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:36942

14/12/01 11:11:03 INFO util.Utils: Successfully started service 'HTTP class server' on port 36942.

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0

      /_/



Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)

Type in expressions to have them evaluated.

Type :help for more information.

14/12/01 11:11:10 INFO spark.SecurityManager: Changing view acls to: Spark,

14/12/01 11:11:10 INFO spark.SecurityManager: Changing modify acls to: Spark,

14/12/01 11:11:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )

14/12/01 11:11:11 INFO slf4j.Slf4jLogger: Slf4jLogger started

14/12/01 11:11:11 INFO Remoting: Starting remoting

14/12/01 11:11:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:45322]

14/12/01 11:11:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:45322]

14/12/01 11:11:11 INFO util.Utils: Successfully started service 'sparkDriver' on port 45322.

14/12/01 11:11:11 INFO spark.SparkEnv: Registering MapOutputTracker

14/12/01 11:11:11 INFO spark.SparkEnv: Registering BlockManagerMaster

14/12/01 11:11:12 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201111112-e9cc

14/12/01 11:11:12 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 52705.

14/12/01 11:11:12 INFO network.ConnectionManager: Bound socket to port 52705 with id = ConnectionManagerId(Master,52705)

14/12/01 11:11:12 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB

14/12/01 11:11:12 INFO storage.BlockManagerMaster: Trying to register BlockManager

14/12/01 11:11:12 INFO storage.BlockManagerMasterActor: Registering block manager Master:52705 with 267.3 MB RAM

14/12/01 11:11:12 INFO storage.BlockManagerMaster: Registered BlockManager

14/12/01 11:11:12 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-87ad77b3-40b1-4320-958f-b1d632f2b4f5

14/12/01 11:11:12 INFO spark.HttpServer: Starting HTTP Server

14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 11:11:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51107

14/12/01 11:11:12 INFO util.Utils: Successfully started service 'HTTP file server' on port 51107.

14/12/01 11:11:12 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 11:11:12 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

14/12/01 11:11:12 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

14/12/01 11:11:12 INFO ui.SparkUI: Started SparkUI at http://Master:4040

14/12/01 11:11:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/12/01 11:11:14 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...

14/12/01 11:11:14 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

14/12/01 11:11:14 INFO repl.SparkILoop: Created spark context..

Spark context available as sc.



scala>

14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201111115-0000

14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores

14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM

14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor added: app-20141201111115-0000/1 on worker-20141201031041-Slave2-33833 (Slave2:33833) with 1 cores

14/12/01 11:11:15 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201111115-0000/1 on hostPort Slave2:33833 with 1 cores, 512.0 MB RAM

14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/0 is now RUNNING

14/12/01 11:11:15 INFO client.AppClient$ClientActor: Executor updated: app-20141201111115-0000/1 is now RUNNING

14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:41369/user/Executor#-1591583962] with ID 0

14/12/01 11:11:19 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:57062 with 267.3 MB RAM

14/12/01 11:11:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave2:47569/user/Executor#-1622351454] with ID 1

14/12/01 11:11:20 INFO storage.BlockManagerMasterActor: Registering block manager Slave2:52207 with 267.3 MB RAM





scala> val file = sc.textFile("hdfs://Master:9000/data/test1")

14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975

14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)

14/12/01 11:12:12 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975

14/12/01 11:12:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)

14/12/01 11:12:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:52705 (size: 12.6 KB, free: 267.3 MB)

14/12/01 11:12:12 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0

file: org.apache.spark.rdd.RDD[String] = hdfs://Master:9000/data/test1 MappedRDD[1] at textFile at <console>:12



scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)

14/12/01 11:12:43 INFO mapred.FileInputFormat: Total input paths to process : 1

count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:14



scala> count.collect()

14/12/01 11:12:59 INFO spark.SparkContext: Starting job: collect at <console>:17

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Registering RDD 3 (map at <console>:14)

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:17) with 2 output partitions (allowLocal=false)

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at <console>:17)

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at <console>:14), which has no missing parents

14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(3424) called with curMem=176615, maxMem=280248975

14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)

14/12/01 11:12:59 INFO storage.MemoryStore: ensureFreeSpace(2051) called with curMem=180039, maxMem=280248975

14/12/01 11:12:59 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)

14/12/01 11:12:59 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:52705 (size: 2.0 KB, free: 267.3 MB)

14/12/01 11:12:59 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0

14/12/01 11:12:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at <console>:14)

14/12/01 11:12:59 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks

14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave2, NODE_LOCAL, 1174 bytes)

14/12/01 11:12:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1174 bytes)

14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43475]

14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:57062]

14/12/01 11:13:00 INFO network.ConnectionManager: Accepted connection from [Slave2/192.168.8.31:43976]

14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:57062], 1 messages pending

14/12/01 11:13:00 INFO network.SendingConnection: Initiating connection to [Slave2/192.168.8.31:52207]

14/12/01 11:13:00 INFO network.SendingConnection: Connected to [Slave2/192.168.8.31:52207], 1 messages pending

14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:57062 (size: 2.0 KB, free: 267.3 MB)

14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave2:52207 (size: 2.0 KB, free: 267.3 MB)

14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:57062 (size: 12.6 KB, free: 267.3 MB)

14/12/01 11:13:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave2:52207 (size: 12.6 KB, free: 267.3 MB)

14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 8197 ms on Slave2 (1/2)

14/12/01 11:13:07 INFO scheduler.DAGScheduler: Stage 1 (map at <console>:14) finished in 8.626 s

14/12/01 11:13:07 INFO scheduler.DAGScheduler: looking for newly runnable stages

14/12/01 11:13:07 INFO scheduler.DAGScheduler: running: Set()

14/12/01 11:13:07 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)

14/12/01 11:13:07 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 8585 ms on Slave1 (2/2)

14/12/01 11:13:07 INFO scheduler.DAGScheduler: failed: Set()

14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 

14/12/01 11:13:07 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()

14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14), which is now runnable

14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=182090, maxMem=280248975

14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.1 KB, free 267.1 MB)

14/12/01 11:13:07 INFO storage.MemoryStore: ensureFreeSpace(1327) called with curMem=184202, maxMem=280248975

14/12/01 11:13:07 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1327.0 B, free 267.1 MB)

14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:52705 (size: 1327.0 B, free: 267.3 MB)

14/12/01 11:13:07 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0

14/12/01 11:13:07 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ShuffledRDD[4] at reduceByKey at <console>:14)

14/12/01 11:13:07 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave2, PROCESS_LOCAL, 948 bytes)

14/12/01 11:13:07 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 948 bytes)

14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:57062 (size: 1327.0 B, free: 267.3 MB)

14/12/01 11:13:07 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave2:52207 (size: 1327.0 B, free: 267.3 MB)

14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave1:36991

14/12/01 11:13:08 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 143 bytes

14/12/01 11:13:08 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave2:50333

14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 149 ms on Slave2 (1/2)

14/12/01 11:13:08 INFO scheduler.DAGScheduler: Stage 0 (collect at <console>:17) finished in 0.179 s

14/12/01 11:13:08 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 181 ms on Slave1 (2/2)

14/12/01 11:13:08 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

14/12/01 11:13:08 INFO spark.SparkContext: Job finished: collect at <console>:17, took 8.947687849 s

res0: Array[(String, Int)] = Array((spark,1), (hadoop,2), (hbase,1))



scala> 
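The RDD chain in the session above (flatMap, then map, then reduceByKey) has a direct analogue on plain Scala collections. A minimal sketch, no Spark required; the sample lines are assumed content consistent with the result `Array((spark,1), (hadoop,2), (hbase,1))`:

```scala
object WordCountSketch extends App {
  // Sample lines standing in for hdfs://Master:9000/data/test1 (hypothetical content)
  val lines = Seq("hadoop spark", "hadoop hbase")

  // Same pipeline as the shell session: split, pair each word with 1, sum per key
  val counts = lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .groupBy(_._1)                                  // collections' stand-in for reduceByKey
    .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  println(counts.toSeq.sortBy(_._1).mkString(", "))  // prints (hadoop,2), (hbase,1), (spark,1)
}
```

The difference in the cluster is only distribution: reduceByKey sums partial counts per partition before shuffling, while groupBy here materializes every pair in memory.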

Test case 2:

Run one of Spark's bundled example programs:

[Spark@Master spark]$ bin/run-example org.apache.spark.examples.SparkPi 2 spark://192.168.8.29:7077

Spark assembly has been built with Hive, including Datanucleus jars on classpath

14/12/01 11:01:24 INFO spark.SecurityManager: Changing view acls to: Spark,

14/12/01 11:01:24 INFO spark.SecurityManager: Changing modify acls to: Spark,

14/12/01 11:01:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )

14/12/01 11:01:24 INFO slf4j.Slf4jLogger: Slf4jLogger started

14/12/01 11:01:25 INFO Remoting: Starting remoting

14/12/01 11:01:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:60670]

14/12/01 11:01:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:60670]

14/12/01 11:01:25 INFO util.Utils: Successfully started service 'sparkDriver' on port 60670.

14/12/01 11:01:25 INFO spark.SparkEnv: Registering MapOutputTracker

14/12/01 11:01:25 INFO spark.SparkEnv: Registering BlockManagerMaster

14/12/01 11:01:25 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201110125-9987

14/12/01 11:01:25 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 35768.

14/12/01 11:01:25 INFO network.ConnectionManager: Bound socket to port 35768 with id = ConnectionManagerId(Master,35768)

14/12/01 11:01:25 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB

14/12/01 11:01:25 INFO storage.BlockManagerMaster: Trying to register BlockManager

14/12/01 11:01:25 INFO storage.BlockManagerMasterActor: Registering block manager Master:35768 with 267.3 MB RAM

14/12/01 11:01:25 INFO storage.BlockManagerMaster: Registered BlockManager

14/12/01 11:01:25 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-68503776-9126-4e30-89a3-83a560210e14

14/12/01 11:01:25 INFO spark.HttpServer: Starting HTTP Server

14/12/01 11:01:25 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 11:01:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33890

14/12/01 11:01:25 INFO util.Utils: Successfully started service 'HTTP file server' on port 33890.

14/12/01 11:01:26 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 11:01:26 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

14/12/01 11:01:26 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

14/12/01 11:01:26 INFO ui.SparkUI: Started SparkUI at http://Master:4040

14/12/01 11:01:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/12/01 11:01:27 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/lib/spark-examples-1.1.0-hadoop2.4.0.jar at http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362

14/12/01 11:01:27 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@Master:60670/user/HeartbeatReceiver

14/12/01 11:01:27 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:35

14/12/01 11:01:27 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 2 output partitions (allowLocal=false)

14/12/01 11:01:27 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)

14/12/01 11:01:27 INFO scheduler.DAGScheduler: Parents of final stage: List()

14/12/01 11:01:27 INFO scheduler.DAGScheduler: Missing parents: List()

14/12/01 11:01:27 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents

14/12/01 11:01:28 INFO storage.MemoryStore: ensureFreeSpace(1728) called with curMem=0, maxMem=280248975

14/12/01 11:01:28 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1728.0 B, free 267.3 MB)

14/12/01 11:01:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)

14/12/01 11:01:28 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1223 bytes)

14/12/01 11:01:28 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)

14/12/01 11:01:28 INFO executor.Executor: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar with timestamp 1417402887362

14/12/01 11:01:28 INFO util.Utils: Fetching http://Master:33890/jars/spark-examples-1.1.0-hadoop2.4.0.jar to /tmp/fetchFileTemp7489373377783107634.tmp

14/12/01 11:01:28 INFO executor.Executor: Adding file:/tmp/spark-ad7b4d7f-9793-406b-b3a9-21bd79fddf9f/spark-examples-1.1.0-hadoop2.4.0.jar to class loader

14/12/01 11:01:28 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 701 bytes result sent to driver

14/12/01 11:01:28 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1223 bytes)

14/12/01 11:01:28 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)

14/12/01 11:01:29 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 701 bytes result sent to driver

14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 765 ms on localhost (1/2)

14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 0.936 s

14/12/01 11:01:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 177 ms on localhost (2/2)

14/12/01 11:01:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

14/12/01 11:01:29 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 1.3590325 s

Pi is roughly 3.13872

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}

14/12/01 11:01:29 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}

14/12/01 11:01:29 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040

14/12/01 11:01:29 INFO scheduler.DAGScheduler: Stopping DAGScheduler

14/12/01 11:01:30 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!

14/12/01 11:01:30 INFO network.ConnectionManager: Selector thread was interrupted!

14/12/01 11:01:30 INFO network.ConnectionManager: ConnectionManager stopped

14/12/01 11:01:30 INFO storage.MemoryStore: MemoryStore cleared

14/12/01 11:01:30 INFO storage.BlockManager: BlockManager stopped

14/12/01 11:01:30 INFO storage.BlockManagerMaster: BlockManagerMaster stopped

14/12/01 11:01:30 INFO spark.SparkContext: Successfully stopped SparkContext

14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

14/12/01 11:01:30 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
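SparkPi estimates π by Monte Carlo sampling: it throws random points into the square [-1, 1) × [-1, 1) and multiplies the fraction that lands inside the unit circle by 4. A plain-Scala sketch of the same idea (no Spark; the fixed seed is an assumption added here so the estimate is reproducible):

```scala
object PiSketch extends App {
  val rnd = new scala.util.Random(42)  // fixed seed for a reproducible estimate
  val n   = 100000

  // Count points (x, y) in [-1, 1)^2 that fall inside the unit circle
  val inside = (1 to n).count { _ =>
    val x = rnd.nextDouble() * 2 - 1
    val y = rnd.nextDouble() * 2 - 1
    x * x + y * y < 1
  }

  val pi = 4.0 * inside / n
  println(f"Pi is roughly $pi%.5f")    // same style of output as the example's log
}
```

In the real example the `2` on the command line is the number of slices: the samples are split across that many tasks and the per-task counts are combined with `reduce`.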

 

2. Write a Spark program in IntelliJ IDEA (with the Scala plugin), package it as a .jar, and submit it to the Spark cluster for execution

The code of com.husor.Test.WordCount.scala is as follows:

package com.husor.Test



import org.apache.spark.{SparkContext,SparkConf}

import org.apache.spark.SparkContext._



/**

 * Created by huxiu on 2014/11/27.

 */

object WordCount {

  def main(args: Array[String]) {



    println("Test is starting......")



    if (args.length < 2) {

      System.err.println("Usage: WordCount <HDFS input file> <HDFS output dir>")

      System.exit(1)

    }



    //System.setProperty("hadoop.home.dir", "d:\\winutil\\")



    val conf = new SparkConf().setAppName("WordCount")

                              .setSparkHome(sys.env("SPARK_HOME")) // actual Spark home path, read from the environment



    val spark = new SparkContext(conf)



    //val spark = new SparkContext("local","WordCount")



    val file = spark.textFile(args(0))



    // print the results to the console instead:

    //file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)

    //val wordcounts = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)



    val wordCounts = file.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_)

    wordCounts.saveAsTextFile(args(1))

    spark.stop()



    println("Test succeeded!")



  }

}
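`saveAsTextFile` writes one RDD element per line using the element's `toString`, so each output part file under `hdfs://Master:9000/user/huxiu/SparkWordCount` ends up with lines such as `(spark,1)`. A small sketch of that formatting; the pairs are assumed, matching the result of test case 1:

```scala
object OutputFormatSketch extends App {
  // Counts matching the shell-session result (assumed input data)
  val counts = Seq(("spark", 1), ("hadoop", 2), ("hbase", 1))

  // saveAsTextFile calls toString on each element; Tuple2.toString yields "(k,v)"
  val lines = counts.map(_.toString)

  lines.foreach(println)  // prints (spark,1) then (hadoop,2) then (hbase,1)
}
```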

 

The corresponding launch script runSpark.sh:

#!/bin/bash



set -x



spark-submit \

--class com.husor.Test.WordCount \

--master spark://Master:7077 \

--executor-memory 512m \

--total-executor-cores 1 \

/home/Spark/husor/spark/SparkTest.jar \

hdfs://Master:9000/data/test1 \

hdfs://Master:9000/user/huxiu/SparkWordCount

Grant the script execute permission (chmod +x runSpark.sh) and run it; the output is as follows:

[Spark@Master spark]$ ./runSpark.sh

+ spark-submit --class com.husor.Test.WordCount --master spark://Master:7077 --executor-memory 512m --total-executor-cores 1 /home/Spark/husor/spark/SparkTest.jar hdfs://Master:9000/data/test1 hdfs://Master:9000/user/huxiu/SparkWordCount

Spark assembly has been built with Hive, including Datanucleus jars on classpath

Test is starting......

14/12/01 12:10:50 INFO spark.SecurityManager: Changing view acls to: Spark,

14/12/01 12:10:50 INFO spark.SecurityManager: Changing modify acls to: Spark,

14/12/01 12:10:50 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Spark, ); users with modify permissions: Set(Spark, )

14/12/01 12:10:50 INFO slf4j.Slf4jLogger: Slf4jLogger started

14/12/01 12:10:50 INFO Remoting: Starting remoting

14/12/01 12:10:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@Master:37899]

14/12/01 12:10:51 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@Master:37899]

14/12/01 12:10:51 INFO util.Utils: Successfully started service 'sparkDriver' on port 37899.

14/12/01 12:10:51 INFO spark.SparkEnv: Registering MapOutputTracker

14/12/01 12:10:51 INFO spark.SparkEnv: Registering BlockManagerMaster

14/12/01 12:10:51 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20141201121051-6189

14/12/01 12:10:51 INFO util.Utils: Successfully started service 'Connection manager for block manager' on port 34131.

14/12/01 12:10:51 INFO network.ConnectionManager: Bound socket to port 34131 with id = ConnectionManagerId(Master,34131)

14/12/01 12:10:51 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB

14/12/01 12:10:51 INFO storage.BlockManagerMaster: Trying to register BlockManager

14/12/01 12:10:51 INFO storage.BlockManagerMasterActor: Registering block manager Master:34131 with 267.3 MB RAM

14/12/01 12:10:51 INFO storage.BlockManagerMaster: Registered BlockManager

14/12/01 12:10:51 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-83b486ec-2237-4f71-be00-0418e485151f

14/12/01 12:10:51 INFO spark.HttpServer: Starting HTTP Server

14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 12:10:51 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34902

14/12/01 12:10:51 INFO util.Utils: Successfully started service 'HTTP file server' on port 34902.

14/12/01 12:10:51 INFO server.Server: jetty-8.y.z-SNAPSHOT

14/12/01 12:10:51 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040

14/12/01 12:10:51 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.

14/12/01 12:10:51 INFO ui.SparkUI: Started SparkUI at http://Master:4040

14/12/01 12:10:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

14/12/01 12:10:52 INFO spark.SparkContext: Added JAR file:/home/Spark/husor/spark/SparkTest.jar at http://Master:34902/jars/SparkTest.jar with timestamp 1417407052941

14/12/01 12:10:53 INFO client.AppClient$ClientActor: Connecting to master spark://Master:7077...

14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0

14/12/01 12:10:53 INFO storage.MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975

14/12/01 12:10:53 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)

14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20141201121053-0006

14/12/01 12:10:53 INFO client.AppClient$ClientActor: Executor added: app-20141201121053-0006/0 on worker-20141201031041-Slave1-49261 (Slave1:49261) with 1 cores

14/12/01 12:10:53 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20141201121053-0006/0 on hostPort Slave1:49261 with 1 cores, 512.0 MB RAM

14/12/01 12:10:54 INFO client.AppClient$ClientActor: Executor updated: app-20141201121053-0006/0 is now RUNNING

14/12/01 12:10:54 INFO storage.MemoryStore: ensureFreeSpace(12910) called with curMem=163705, maxMem=280248975

14/12/01 12:10:54 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 12.6 KB, free 267.1 MB)

14/12/01 12:10:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Master:34131 (size: 12.6 KB, free: 267.3 MB)

14/12/01 12:10:54 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0

14/12/01 12:10:54 INFO mapred.FileInputFormat: Total input paths to process : 1

14/12/01 12:10:55 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id

14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap

14/12/01 12:10:55 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition

14/12/01 12:10:55 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id

14/12/01 12:10:55 INFO spark.SparkContext: Starting job: saveAsTextFile at WordCount.scala:35

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:34)

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Got job 0 (saveAsTextFile at WordCount.scala:35) with 2 output partitions (allowLocal=false)

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(saveAsTextFile at WordCount.scala:35)

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[3] at map at WordCount.scala:34), which has no missing parents

14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(3400) called with curMem=176615, maxMem=280248975

14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.3 KB, free 267.1 MB)

14/12/01 12:10:55 INFO storage.MemoryStore: ensureFreeSpace(2055) called with curMem=180015, maxMem=280248975

14/12/01 12:10:55 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 267.1 MB)

14/12/01 12:10:55 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Master:34131 (size: 2.0 KB, free: 267.3 MB)

14/12/01 12:10:55 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0

14/12/01 12:10:55 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[3] at map at WordCount.scala:34)

14/12/01 12:10:55 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks

14/12/01 12:10:57 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@Slave1:38410/user/Executor#898843507] with ID 0

14/12/01 12:10:57 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, Slave1, NODE_LOCAL, 1222 bytes)

14/12/01 12:10:57 INFO storage.BlockManagerMasterActor: Registering block manager Slave1:44906 with 267.3 MB RAM

14/12/01 12:10:58 INFO network.ConnectionManager: Accepted connection from [Slave1/192.168.8.30:43149]

14/12/01 12:10:58 INFO network.SendingConnection: Initiating connection to [Slave1/192.168.8.30:44906]

14/12/01 12:10:58 INFO network.SendingConnection: Connected to [Slave1/192.168.8.30:44906], 1 messages pending

14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on Slave1:44906 (size: 2.0 KB, free: 267.3 MB)

14/12/01 12:10:58 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on Slave1:44906 (size: 12.6 KB, free: 267.3 MB)

14/12/01 12:10:59 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, Slave1, NODE_LOCAL, 1222 bytes)

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 159 ms on Slave1 (1/2)

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 2454 ms on Slave1 (2/2)

14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 1 (map at WordCount.scala:34) finished in 4.444 s

14/12/01 12:11:00 INFO scheduler.DAGScheduler: looking for newly runnable stages

14/12/01 12:11:00 INFO scheduler.DAGScheduler: running: Set()

14/12/01 12:11:00 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)

14/12/01 12:11:00 INFO scheduler.DAGScheduler: failed: Set()

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35), which is now runnable

14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(57552) called with curMem=182070, maxMem=280248975

14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 56.2 KB, free 267.0 MB)

14/12/01 12:11:00 INFO storage.MemoryStore: ensureFreeSpace(19863) called with curMem=239622, maxMem=280248975

14/12/01 12:11:00 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.4 KB, free 267.0 MB)

14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Master:34131 (size: 19.4 KB, free: 267.2 MB)

14/12/01 12:11:00 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[5] at saveAsTextFile at WordCount.scala:35)

14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, Slave1, PROCESS_LOCAL, 996 bytes)

14/12/01 12:11:00 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on Slave1:44906 (size: 19.4 KB, free: 267.2 MB)

14/12/01 12:11:00 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@Slave1:51850

14/12/01 12:11:00 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 133 bytes

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 3, Slave1, PROCESS_LOCAL, 996 bytes)

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 412 ms on Slave1 (1/2)

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stage 0 (saveAsTextFile at WordCount.scala:35) finished in 0.710 s

14/12/01 12:11:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 3) in 308 ms on Slave1 (2/2)

14/12/01 12:11:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 

14/12/01 12:11:00 INFO spark.SparkContext: Job finished: saveAsTextFile at WordCount.scala:35, took 5.556490798 s

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}

14/12/01 12:11:00 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}

14/12/01 12:11:00 INFO ui.SparkUI: Stopped Spark web UI at http://Master:4040

14/12/01 12:11:00 INFO scheduler.DAGScheduler: Stopping DAGScheduler

14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors

14/12/01 12:11:00 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down

14/12/01 12:11:01 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(Slave1,44906)

14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)

14/12/01 12:11:01 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(Slave1,44906)

14/12/01 12:11:02 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!

14/12/01 12:11:02 INFO network.ConnectionManager: Selector thread was interrupted!

14/12/01 12:11:02 INFO network.ConnectionManager: ConnectionManager stopped

14/12/01 12:11:02 INFO storage.MemoryStore: MemoryStore cleared

14/12/01 12:11:02 INFO storage.BlockManager: BlockManager stopped

14/12/01 12:11:02 INFO storage.BlockManagerMaster: BlockManagerMaster stopped

14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

14/12/01 12:11:02 INFO spark.SparkContext: Successfully stopped SparkContext

Test is Succeed!!!

14/12/01 12:11:02 INFO Remoting: Remoting shut down

14/12/01 12:11:02 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00001

14/12/01 12:11:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

(spark,1)

(hadoop,2)

(hbase,1)

[Spark@Master spark]$ hdfs dfs -ls /user/huxiu/SparkWordCount/

14/12/01 12:11:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 3 items

-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/_SUCCESS

-rw-r--r--   2 Spark huxiu          0 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00000

-rw-r--r--   2 Spark huxiu         31 2014-12-01 12:11 /user/huxiu/SparkWordCount/part-00001

[Spark@Master spark]$ hdfs dfs -cat /user/huxiu/SparkWordCount/part-00000

14/12/01 12:11:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
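The post never shows the WordCount.scala source that produced the logs above; the log lines only reveal a map at WordCount.scala:34 and a saveAsTextFile at WordCount.scala:35, so the usual flatMap/map/reduceByKey pipeline is assumed here. The sketch below reproduces that counting logic in plain Scala (no Spark runtime needed), so the aggregation behind the (word, count) output can be checked locally; the object name and input lines are hypothetical:

```scala
// Hypothetical sketch: pure-Scala equivalent of the standard Spark pipeline
//   textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
// The original WordCount.scala is not shown in this post, so this is an
// assumption based on the standard WordCount example, not the author's code.
object WordCountSketch {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))              // split each line into words
      .filter(_.nonEmpty)                    // drop empty tokens
      .groupBy(identity)                     // group identical words
      .map { case (w, ws) => (w, ws.size) }  // count occurrences per word

  def main(args: Array[String]): Unit = {
    // Input chosen to be consistent with the part-00001 contents shown above.
    val lines = Seq("spark hadoop", "hadoop hbase")
    countWords(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In the real job the same aggregation runs distributed: reduceByKey triggers the shuffle visible in the log ("map output locations for shuffle 0"), and saveAsTextFile writes one part-NNNNN file per partition, which is why the output directory above contains part-00000 and part-00001.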

Note:

During the run you may hit the exception "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". Memory was clearly sufficient, yet the job still could not obtain resources. Checking the firewall revealed the cause: the nodes only allowed access on port 80 and blocked everything else.

Solution:

Stop the firewall on every node (service iptables stop), then rerun the runSpark.sh script above on the Spark on YARN cluster.
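A sketch of the fix, assuming RHEL/CentOS 6-style service management as implied by the service iptables command above; the port list is an assumption based on Spark's standalone defaults, not something stated in this post:

```shell
# Run on every node (Master and each Slave).
service iptables stop      # stop the firewall immediately
chkconfig iptables off     # keep it disabled across reboots (optional)

# Safer alternative to disabling the firewall entirely: open only the ports
# Spark actually uses. These defaults are assumptions, not from this post:
#   7077  - standalone master (spark://Master:7077)
#   4040  - driver web UI
# plus the randomly chosen driver/executor ports seen in the logs, which can
# be pinned via spark.driver.port and related settings instead.
```

Note that Spark's driver and executors pick random ephemeral ports by default (45322, 38410, 44906 in the logs above), which is why a firewall that only whitelists fixed ports breaks executor registration and produces the "Initial job has not accepted any resources" message.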
