Automatically submitting a Spark job to a Hadoop YARN cluster with a shell script

spark_submit.sh

#!/bin/sh

# spark_submit.sh
# Automated script for submitting a Spark job to the YARN cluster.

export HADOOP_HOME=/home/elon/hadoop/hadoop-2.7.5

# Spark on YARN locates the cluster through HADOOP_CONF_DIR; here it is
# assumed to be configured elsewhere (e.g. in spark-env.sh). If not, export it:
# export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Submit the WordCount example in yarn-client mode. The last two arguments
# ("yarn" and the input file path) are passed through to the application.
spark-submit --master yarn --deploy-mode client \
    --class org.training.examples.WordCount \
    /home/elon/jars/examples-1.0-SNAPSHOT.jar \
    yarn file:///home/elon/spark-2.2.1/README.md
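
The driver log below references WordCount.scala at line 22 (textFile), line 24 (map and reduceByKey), and line 26 (take). The source of org.training.examples.WordCount is not included in this post; the following is a minimal sketch of what it plausibly looks like, reconstructed from those log references and the two program arguments passed above. The argument handling and the take(20) count are assumptions (20 happens to match the number of (word, count) pairs printed in the output).

package org.training.examples

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical reconstruction: the first program argument is the master URL
// keyword ("yarn" above), the second is the input path.
object WordCount {
  def main(args: Array[String]): Unit = {
    val masterUrl = args(0)
    val inputPath = args(1)
    // Matches the first line of the console output below.
    println(s"masterUrl:$masterUrl, inputPath: $inputPath")

    val conf = new SparkConf().setAppName("WordCount").setMaster(masterUrl)
    val sc = new SparkContext(conf)

    val counts = sc.textFile(inputPath)   // textFile at WordCount.scala:22
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                 // reduceByKey at WordCount.scala:24

    counts.take(20).foreach(println)      // take at WordCount.scala:26
    sc.stop()
  }
}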

Console output

[elon@hadoop1 shell]$ spark_submit.sh    
masterUrl:yarn, inputPath: file:///home/elon/spark-2.2.1/README.md
18/02/11 12:07:17 INFO spark.SparkContext: Running Spark version 2.2.1
18/02/11 12:07:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/11 12:07:19 INFO spark.SparkContext: Submitted application: WordCount
18/02/11 12:07:19 INFO spark.SecurityManager: Changing view acls to: elon
18/02/11 12:07:19 INFO spark.SecurityManager: Changing modify acls to: elon
18/02/11 12:07:19 INFO spark.SecurityManager: Changing view acls groups to: 
18/02/11 12:07:19 INFO spark.SecurityManager: Changing modify acls groups to: 
18/02/11 12:07:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(elon); groups with view permissions: Set(); users  with modify permissions: Set(elon); groups with modify permissions: Set()
18/02/11 12:07:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 45560.
18/02/11 12:07:20 INFO spark.SparkEnv: Registering MapOutputTracker
18/02/11 12:07:20 INFO spark.SparkEnv: Registering BlockManagerMaster
18/02/11 12:07:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/02/11 12:07:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/02/11 12:07:20 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-06b84a42-039f-41c5-a4d1-b70ab6009c8c
18/02/11 12:07:20 INFO memory.MemoryStore: MemoryStore started with capacity 117.0 MB
18/02/11 12:07:21 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/02/11 12:07:21 INFO util.log: Logging initialized @6857ms
18/02/11 12:07:22 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/02/11 12:07:22 INFO server.Server: Started @7250ms
18/02/11 12:07:22 INFO server.AbstractConnector: Started ServerConnector@6f884ddb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:07:22 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@324dcd31{/jobs,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1804f60d{/jobs/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@547e29a4{/jobs/job,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b39fd82{/jobs/job/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@21680803{/stages,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c8b96ec{/stages/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d8f2f3a{/stages/stage,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58a55449{/stages/stage/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6e0ff644{/stages/pool,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a2bb0eb{/stages/pool/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d0566ba{/storage,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7728643a{/storage/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5167268{/storage/rdd,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28c0b664{/storage/rdd/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1af7f54a{/environment,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@436390f4{/environment/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68ed96ca{/executors,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3228d990{/executors/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@50b8ae8d{/executors/threadDump,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51c929ae{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@29d2d081{/static,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@28a2a3e7{/,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@10b3df93{/api,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3c321bdb{/jobs/job/kill,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3abd581e{/stages/stage/kill,null,AVAILABLE,@Spark}
18/02/11 12:07:22 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.111:4040
18/02/11 12:07:22 INFO spark.SparkContext: Added JAR file:/home/elon/jars/examples-1.0-SNAPSHOT.jar at spark://192.168.1.111:45560/jars/examples-1.0-SNAPSHOT.jar with timestamp 1518322042715
18/02/11 12:07:25 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.1.111:8032
18/02/11 12:07:26 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
18/02/11 12:07:26 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
18/02/11 12:07:26 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/02/11 12:07:26 INFO yarn.Client: Setting up container launch context for our AM
18/02/11 12:07:26 INFO yarn.Client: Setting up the launch environment for our AM container
18/02/11 12:07:26 INFO yarn.Client: Preparing resources for our AM container
18/02/11 12:07:30 INFO yarn.Client: Source and destination file systems are the same. Not copying hdfs:/home/elon/spark/spark-libs.jar
18/02/11 12:07:31 INFO yarn.Client: Uploading resource file:/tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a/__spark_conf__7401771411523449275.zip -> hdfs://hadoop1:8020/user/elon/.sparkStaging/application_1518316627470_0003/__spark_conf__.zip
18/02/11 12:07:32 INFO spark.SecurityManager: Changing view acls to: elon
18/02/11 12:07:32 INFO spark.SecurityManager: Changing modify acls to: elon
18/02/11 12:07:32 INFO spark.SecurityManager: Changing view acls groups to: 
18/02/11 12:07:32 INFO spark.SecurityManager: Changing modify acls groups to: 
18/02/11 12:07:32 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(elon); groups with view permissions: Set(); users  with modify permissions: Set(elon); groups with modify permissions: Set()
18/02/11 12:07:32 INFO yarn.Client: Submitting application application_1518316627470_0003 to ResourceManager
18/02/11 12:07:32 INFO impl.YarnClientImpl: Submitted application application_1518316627470_0003
18/02/11 12:07:32 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1518316627470_0003 and attemptId None
18/02/11 12:07:33 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:33 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1518322052230
         final status: UNDEFINED
         tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
         user: elon
18/02/11 12:07:34 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:35 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:36 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:37 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:38 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:39 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:40 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:41 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:42 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:43 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
18/02/11 12:07:43 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> hadoop1, PROXY_URI_BASES -> http://hadoop1:8088/proxy/application_1518316627470_0003), /proxy/application_1518316627470_0003
18/02/11 12:07:43 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/02/11 12:07:43 INFO yarn.Client: Application report for application_1518316627470_0003 (state: ACCEPTED)
18/02/11 12:07:44 INFO yarn.Client: Application report for application_1518316627470_0003 (state: RUNNING)
18/02/11 12:07:44 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.1.113
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1518322052230
         final status: UNDEFINED
         tracking URL: http://hadoop1:8088/proxy/application_1518316627470_0003/
         user: elon
18/02/11 12:07:44 INFO cluster.YarnClientSchedulerBackend: Application application_1518316627470_0003 has started running.
18/02/11 12:07:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38368.
18/02/11 12:07:44 INFO netty.NettyBlockTransferService: Server created on 192.168.1.111:38368
18/02/11 12:07:44 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.111:38368 with 117.0 MB RAM, BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:44 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.111, 38368, None)
18/02/11 12:07:45 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5db3d57c{/metrics/json,null,AVAILABLE,@Spark}
18/02/11 12:07:46 INFO scheduler.EventLoggingListener: Logging events to file:/tmp/spark-events/application_1518316627470_0003
18/02/11 12:07:53 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 290.1 KB, free 116.7 MB)
18/02/11 12:07:54 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.7 KB, free 116.7 MB)
18/02/11 12:07:54 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.111:38368 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:07:54 INFO spark.SparkContext: Created broadcast 0 from textFile at WordCount.scala:22
18/02/11 12:07:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.113:35724) with ID 1
18/02/11 12:07:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager hadoop3:33799 with 117.0 MB RAM, BlockManagerId(1, hadoop3, 33799, None)
18/02/11 12:07:55 INFO mapred.FileInputFormat: Total input paths to process : 1
18/02/11 12:07:58 INFO spark.SparkContext: Starting job: take at WordCount.scala:26
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Registering RDD 3 (map at WordCount.scala:24)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Got job 0 (take at WordCount.scala:26) with 1 output partitions
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (take at WordCount.scala:26)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/02/11 12:08:02 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24), which has no missing parents
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.8 KB, free 116.7 MB)
18/02/11 12:08:03 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 116.6 MB)
18/02/11 12:08:03 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.111:38368 (size: 2.8 KB, free: 116.9 MB)
18/02/11 12:08:03 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:04 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at WordCount.scala:24) (first 15 tasks are for partitions Vector(0, 1))
18/02/11 12:08:04 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/02/11 12:08:05 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, hadoop3, executor 1, partition 0, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:07 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on hadoop3:33799 (size: 2.8 KB, free: 117.0 MB)
18/02/11 12:08:08 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop3:33799 (size: 23.7 KB, free: 116.9 MB)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, hadoop3, executor 1, partition 1, PROCESS_LOCAL, 4856 bytes)
18/02/11 12:08:11 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 7027 ms on hadoop3 (executor 1) (1/2)
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 446 ms on hadoop3 (executor 1) (2/2)
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at WordCount.scala:24) finished in 7.435 s
18/02/11 12:08:12 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/02/11 12:08:12 INFO scheduler.DAGScheduler: running: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: failed: Set()
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24), which has no missing parents
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 116.6 MB)
18/02/11 12:08:12 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2013.0 B, free 116.6 MB)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.111:38368 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (ShuffledRDD[4] at reduceByKey at WordCount.scala:24) (first 15 tasks are for partitions Vector(0))
18/02/11 12:08:12 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, hadoop3, executor 1, partition 0, NODE_LOCAL, 4632 bytes)
18/02/11 12:08:12 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on hadoop3:33799 (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:12 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.1.113:35724
18/02/11 12:08:12 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 150 bytes
18/02/11 12:08:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 425 ms on hadoop3 (executor 1) (1/1)
18/02/11 12:08:12 INFO scheduler.DAGScheduler: ResultStage 1 (take at WordCount.scala:26) finished in 0.427 s
18/02/11 12:08:12 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
18/02/11 12:08:12 INFO scheduler.DAGScheduler: Job 0 finished: take at WordCount.scala:26, took 14.817908 s
(package,1)
(this,1)
(Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1)
(Because,1)
(Python,2)
(page](http://spark.apache.org/documentation.html).,1)
(cluster.,1)
(its,1)
([run,1)
(general,3)
(have,1)
(pre-built,1)
(YARN,,1)
(locally,2)
(changed,1)
(locally.,1)
(sc.parallelize(1,1)
(only,1)
(several,1)
(This,2)
18/02/11 12:08:12 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/02/11 12:08:13 INFO server.AbstractConnector: Stopped Spark@6f884ddb{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/02/11 12:08:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.111:4040
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on hadoop3:33799 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on 192.168.1.111:38368 in memory (size: 2013.0 B, free: 116.9 MB)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/02/11 12:08:13 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
18/02/11 12:08:13 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
18/02/11 12:08:13 INFO cluster.YarnClientSchedulerBackend: Stopped
18/02/11 12:08:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/02/11 12:08:13 INFO memory.MemoryStore: MemoryStore cleared
18/02/11 12:08:13 INFO storage.BlockManager: BlockManager stopped
18/02/11 12:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/02/11 12:08:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/02/11 12:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/02/11 12:08:13 INFO util.ShutdownHookManager: Shutdown hook called
18/02/11 12:08:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3b26c620-946b-4efe-a60b-d101e32ec42a
