The Spark version deployed is 1.3.0. Deployment environment:
Master node: CentOS 7, 2 GB RAM
Worker nodes: Debian, 1 GB RAM (2 nodes)
spark-env.sh is configured as follows:
export SCALA_HOME=/usr/local/scala-2.10.4
export SPARK_MASTER_IP=master
export SPARK_LOCAL_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=512m
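(Note that spark-env.sh is only read when the standalone daemons start, so after editing it the cluster has to be restarted for the settings to take effect, e.g. with the stock sbin scripts:

./sbin/stop-all.sh
./sbin/start-all.sh
)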
After deployment, submitting the program with spark-submit fails with the error below. Adding --executor-memory and --driver-memory at run time to lower the memory produces the same error: ./bin/spark-submit --class SimpleApp --master spark://172.21.7.182:7077 --executor-memory 256m --driver-memory 256m ~/spark_wordcount/target/scala-2.10/simple-project_2.10-1.0.jar
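For context, the job being submitted is essentially the Spark quick-start example (the driver log below references textFile at SimpleApp.scala:10 and count/filter at SimpleApp.scala:11, and the worker log shows the app name Simple Application). A minimal sketch of such a SimpleApp, with a hypothetical input path:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SimpleApp {
  def main(args: Array[String]) {
    // Hypothetical input path; substitute a file reachable from every node (or an HDFS path).
    val logFile = "/home/hadoop/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // 2 partitions, matching the "2 output partitions" reported by the DAGScheduler below
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}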
[hadoop@master spark-1.3.0-bin-hadoop2.4]$ ./bin/spark-submit --class SimpleApp --master spark://172.21.7.182:7077 ~/spark_wordcount/target/scala-2.10/simple-project_2.10-1.0.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/22 14:30:09 INFO SparkContext: Running Spark version 1.3.0
15/04/22 14:30:09 WARN Utils: Your hostname, master resolves to a loopback address: 127.0.0.1; using 172.21.7.182 instead (on interface ens33)
15/04/22 14:30:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/22 14:30:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/22 14:30:11 INFO SecurityManager: Changing view acls to: hadoop
15/04/22 14:30:11 INFO SecurityManager: Changing modify acls to: hadoop
15/04/22 14:30:11 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/04/22 14:30:12 INFO Slf4jLogger: Slf4jLogger started
15/04/22 14:30:12 INFO Remoting: Starting remoting
15/04/22 14:30:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:34207]
15/04/22 14:30:12 INFO Utils: Successfully started service 'sparkDriver' on port 34207.
15/04/22 14:30:12 INFO SparkEnv: Registering MapOutputTracker
15/04/22 14:30:12 INFO SparkEnv: Registering BlockManagerMaster
15/04/22 14:30:12 INFO DiskBlockManager: Created local directory at /tmp/spark-53ce348a-13c5-4150-87c1-9c54c9618d08/blockmgr-c810fb30-e229-4835-a509-36f6af26140d
15/04/22 14:30:12 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/04/22 14:30:12 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a830a518-62b2-44e6-b9a3-8b4742f56c3f/httpd-2281965f-a87d-46a4-99aa-a185623c63c3
15/04/22 14:30:13 INFO HttpServer: Starting HTTP Server
15/04/22 14:30:13 INFO Server: jetty-8.y.z-SNAPSHOT
15/04/22 14:30:13 INFO AbstractConnector: Started [email protected]:52592
15/04/22 14:30:13 INFO Utils: Successfully started service 'HTTP file server' on port 52592.
15/04/22 14:30:13 INFO SparkEnv: Registering OutputCommitCoordinator
15/04/22 14:30:13 INFO Server: jetty-8.y.z-SNAPSHOT
15/04/22 14:30:13 INFO AbstractConnector: Started [email protected]:4040
15/04/22 14:30:13 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/22 14:30:13 INFO SparkUI: Started SparkUI at http://172.21.7.182:4040
15/04/22 14:30:13 INFO SparkContext: Added JAR file:/home/hadoop/spark_wordcount/target/scala-2.10/simple-project_2.10-1.0.jar at http://172.21.7.182:52592/jars/simple-project_2.10-1.0.jar with timestamp 1429684213446
15/04/22 14:30:13 INFO AppClient$ClientActor: Connecting to master akka.tcp://[email protected]:7077/user/Master...
15/04/22 14:30:13 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150422143013-0002
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor added: app-20150422143013-0002/0 on worker-20150422013644-bananapi-46063 (bananapi:46063) with 2 cores
15/04/22 14:30:14 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150422143013-0002/0 on hostPort bananapi:46063 with 2 cores, 512.0 MB RAM
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor added: app-20150422143013-0002/1 on worker-20150422013644-bananapi-59551 (bananapi:59551) with 2 cores
15/04/22 14:30:14 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150422143013-0002/1 on hostPort bananapi:59551 with 2 cores, 512.0 MB RAM
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor updated: app-20150422143013-0002/0 is now RUNNING
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor updated: app-20150422143013-0002/1 is now RUNNING
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor updated: app-20150422143013-0002/1 is now LOADING
15/04/22 14:30:14 INFO AppClient$ClientActor: Executor updated: app-20150422143013-0002/0 is now LOADING
15/04/22 14:30:14 INFO NettyBlockTransferService: Server created on 53124
15/04/22 14:30:14 INFO BlockManagerMaster: Trying to register BlockManager
15/04/22 14:30:14 INFO BlockManagerMasterActor: Registering block manager 172.21.7.182:53124 with 265.4 MB RAM, BlockManagerId(<driver>, 172.21.7.182, 53124)
15/04/22 14:30:14 INFO BlockManagerMaster: Registered BlockManager
15/04/22 14:30:14 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/04/22 14:30:14 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
15/04/22 14:30:14 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
15/04/22 14:30:14 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=278302556
15/04/22 14:30:14 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 265.2 MB)
15/04/22 14:30:14 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.21.7.182:53124 (size: 22.2 KB, free: 265.4 MB)
15/04/22 14:30:14 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/04/22 14:30:14 INFO SparkContext: Created broadcast 0 from textFile at SimpleApp.scala:10
15/04/22 14:30:15 INFO FileInputFormat: Total input paths to process : 1
15/04/22 14:30:15 INFO SparkContext: Starting job: count at SimpleApp.scala:11
15/04/22 14:30:15 INFO DAGScheduler: Got job 0 (count at SimpleApp.scala:11) with 2 output partitions (allowLocal=false)
15/04/22 14:30:15 INFO DAGScheduler: Final stage: Stage 0(count at SimpleApp.scala:11)
15/04/22 14:30:15 INFO DAGScheduler: Parents of final stage: List()
15/04/22 14:30:15 INFO DAGScheduler: Missing parents: List()
15/04/22 14:30:15 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:11), which has no missing parents
15/04/22 14:30:15 INFO MemoryStore: ensureFreeSpace(2848) called with curMem=186397, maxMem=278302556
15/04/22 14:30:15 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.8 KB, free 265.2 MB)
15/04/22 14:30:15 INFO MemoryStore: ensureFreeSpace(2055) called with curMem=189245, maxMem=278302556
15/04/22 14:30:15 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.0 KB, free 265.2 MB)
15/04/22 14:30:15 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.21.7.182:53124 (size: 2.0 KB, free: 265.4 MB)
15/04/22 14:30:15 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/04/22 14:30:15 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
15/04/22 14:30:15 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MapPartitionsRDD[2] at filter at SimpleApp.scala:11)
15/04/22 14:30:15 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/04/22 14:30:30 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/04/22 14:30:32 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@bananapi:38979/user/Executor#266790798] with ID 1
15/04/22 14:30:32 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:32 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@bananapi:43806/user/Executor#-850130035] with ID 0
15/04/22 14:30:33 INFO BlockManagerMasterActor: Registering block manager bananapi:60321 with 267.3 MB RAM, BlockManagerId(1, bananapi, 60321)
15/04/22 14:30:33 INFO BlockManagerMasterActor: Registering block manager bananapi:51018 with 267.3 MB RAM, BlockManagerId(0, bananapi, 51018)
15/04/22 14:30:34 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, bananapi): java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1155)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:166)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)
... 11 more
Caused by: java.lang.IllegalArgumentException
at org.apache.spark.io.SnappyCompressionCodec.<init>(...)
... 20 more
15/04/22 14:30:34 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 1]
15/04/22 14:30:34 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 2, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:34 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 3, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:34 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 2) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 2]
15/04/22 14:30:34 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:35 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 3) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 3]
15/04/22 14:30:35 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:35 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 4]
15/04/22 14:30:35 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:35 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 5]
15/04/22 14:30:35 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, bananapi, PROCESS_LOCAL, 1389 bytes)
15/04/22 14:30:35 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor bananapi: java.io.IOException (java.lang.reflect.InvocationTargetException) [duplicate 6]
15/04/22 14:30:35 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/04/22 14:30:35 INFO TaskSchedulerImpl: Cancelling stage 0
15/04/22 14:30:35 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/04/22 14:30:35 INFO DAGScheduler: Job 0 failed: count at SimpleApp.scala:11, took 20.358100 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, bananapi): java.io.IOException: java.lang.reflect.InvocationTargetException
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1155)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:58)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:166)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1152)
... 11 more
Caused by: java.lang.IllegalArgumentException
at org.apache.spark.io.SnappyCompressionCodec.<init>(...)
... 20 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Checking the worker's logs shows the following error:
15/04/23 07:14:43 INFO Worker: Asked to launch executor app-20150423151444-0005/1 for Simple Application
15/04/23 07:14:46 INFO Utils: Spark assembly has been built with Hive, including Datanucleus jars on classpath
15/04/23 07:14:46 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/jdk1.7.0_60/bin/java" "-cp" ":/usr/local/spark/spark-1.3.0-bin-hadoop2.4/sbin/../conf:/usr/local/spark/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/usr/local/spark/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar" "-XX:MaxPermSize=128m" "-Dspark.driver.port=56739" "-Xms256M" "-Xmx256M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@master:56739/user/CoarseGrainedScheduler" "--executor-id" "1" "--hostname" "worker1" "--cores" "2" "--app-id" "app-20150423151444-0005" "--worker-url" "akka.tcp://sparkWorker@worker1:55641/user/Worker"
15/04/23 07:15:03 INFO Worker: Asked to kill executor app-20150423151444-0005/1
15/04/23 07:15:03 INFO ExecutorRunner: Runner thread for executor app-20150423151444-0005/1 interrupted
15/04/23 07:15:03 INFO ExecutorRunner: Killing process!
15/04/23 07:15:03 ERROR FileAppender: Error writing stream to file /usr/local/spark/spark-1.3.0-bin-hadoop2.4/work/app-20150423151444-0005/1/stderr
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
15/04/23 07:15:04 INFO Worker: Executor app-20150423151444-0005/1 finished with state KILLED exitStatus 143
15/04/23 07:15:04 INFO Worker: Cleaning up local directories for application app-20150423151444-0005
15/04/23 07:15:04 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@worker1:43177] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/04/23 07:15:04 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%40172.21.7.128%3A39296-7#1191656166] was not delivered. [8] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
References:
http://blog.sina.com.cn/s/blog_59c29ded0102v5m3.html
http://blog.csdn.net/oopsoom/article/details/38763985
Based on these, the failure may be caused by insufficient memory.
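Separately, note that the innermost Caused by in the driver output is a java.lang.IllegalArgumentException thrown from the SnappyCompressionCodec constructor; on ARM boards such as the Banana Pi this is commonly the snappy native library failing to load, rather than a memory problem. If that turns out to be the cause here, one possible workaround (a sketch, not verified on this cluster) is to switch Spark's compression codec to a pure-Java one, either per run:

./bin/spark-submit --class SimpleApp --master spark://172.21.7.182:7077 \
  --conf spark.io.compression.codec=lzf \
  ~/spark_wordcount/target/scala-2.10/simple-project_2.10-1.0.jar

or persistently in conf/spark-defaults.conf:

spark.io.compression.codec lzf

LZF is implemented in pure Java, so it sidesteps the native library entirely; lz4 is another codec available in Spark 1.3.0.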