apache-spark org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120

I have configured a Spark cluster in standalone mode; the cluster configuration is automated. I can see that both workers are running, but when I start a spark-shell I run into the following problem.

val lines=sc.parallelize(List(1,2,3,4))

This works fine and the new RDD is created, but when I run the next action

lines.take(2).foreach(println) 

I get this error that I can't solve:

Output:

 16/02/18 10:27:02 INFO DAGScheduler: Got job 0 (take at :24) with 1 output partitions
 16/02/18 10:27:02 INFO DAGScheduler: Final stage: ResultStage 0 (take at :24)
 16/02/18 10:27:02 INFO DAGScheduler: Parents of final stage: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Missing parents: List() 
 16/02/18 10:27:02 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at :21), which has no missing parents
 16/02/18 10:27:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1288.0 B, free 1288.0 B)
 16/02/18 10:27:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 856.0 B, free 2.1 KB)

A minute and a half later:

16/02/18 10:28:43 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(0,java.io.IOException: Failed to create directory /srv/spark/work/app-20160218102438-0000/0)] in 2 attempts org.apache.spark.rpc.RpcTimeoutException:
Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) 
    at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
    at scala.util.Try$.apply(Try.scala:161)
    at scala.util.Failure.recover(Try.scala:185)
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
    at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293) 
    at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.complete(Promise.scala:55) 
    at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235) 
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.processBatch$1(Future.scala:643) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply$mcV$sp(Future.scala:658)
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch$$anonfun$run$1.apply(Future.scala:635) 
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72) 
    at scala.concurrent.Future$InternalCallbackExecutor$Batch.run(Future.scala:634) 
    at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694) 
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:685) 
    at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) 
    at scala.concurrent.Promise$class.tryFailure(Promise.scala:112) 
    at scala.concurrent.impl.Promise$DefaultPromise.tryFailure(Promise.scala:153) 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:241) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) 
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.util.concurrent.TimeoutException: Cannot receive any reply in 120 seconds 
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anon$1.run(NettyRpcEnv.scala:242) ... 7 more

In one worker's log, I can see this error:

Invalid maximum heap size: -Xmxm0M could not create JVM

I can also see some other problems that I think are related to binding a port, or something like that.

  • You have to review your Spark conf. Ensure both workers can connect via ssh without a password, and check that Xmxm0M value. –  rhernando  Feb 19 '16 at 9:15
  • Yes, both can connect without a password; it is a requirement for setting up a cluster in standalone mode. I can see in the UI that the two workers are alive, but when I start a task it fails. I can see in the UI that the app is launched, but the memory per worker is 0. I changed spark.executor.memory=1g but I still see 0 MB per node. I am very confused about this. –  Miren  Feb 19 '16 at 9:45
  • Could you share your spark_env.sh? –  rhernando  Feb 19 '16 at 9:56
  • Apparently I solved the problem! I had a problem in spark_env.sh: one of the variables was deprecated, and another issue was a variable holding the executor memory that showed up with value 0 when I launched the app. I launched spark-shell with the parameter --executor-memory=1g instead, then ran a task, and it executed successfully on one of the workers (see the sketch after these comments). –  Miren  Feb 19 '16 at 10:14
  • Can you tell which one was deprecated? –  Sahil Sareen  May 20 '16 at 12:15
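For reference, here is a minimal sketch of the kind of check described in the comments above. It assumes the shell is launched against a standalone master (the spark://master:7077 URL and the 1g value are placeholders, not values from the question) and runs inside spark-shell, where sc already exists:

    // Launch the shell with an explicit executor memory, e.g. (hypothetical master URL):
    //   spark-shell --master spark://master:7077 --executor-memory 1g

    // Inside the shell, confirm the setting actually reached the driver configuration:
    println(sc.getConf.getOption("spark.executor.memory").getOrElse("not set"))

    // Quick end-to-end check that the executors can run tasks:
    val lines = sc.parallelize(List(1, 2, 3, 4))
    lines.take(2).foreach(println)

If the executor memory prints as "not set" or 0, the value configured in spark_env.sh is not reaching the application, which matches the 0 MB per node seen in the UI.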

2 Answers

The problem can be in the area of port visibility between the Spark cluster and the client instance.

It looks strange from the user's side, but it is a feature of the Spark architecture: each Spark node must be able to reach the client instance on the specific port defined by spark.driver.port in the SparkContext configuration. By default this option is empty, which means the port is chosen randomly. As a result, with the default configuration each Spark node needs to be able to reach any port of the client instance. You can, however, override spark.driver.port.

This can be a problem if your client machine is behind a firewall or inside a Docker container, for example. You need to open this port to the outside.
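As a hedged sketch of what pinning the driver port might look like in a standalone application (the master URL, host name and port 51000 are placeholder assumptions; pick a port your firewall actually allows):

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: master URL, host and port are placeholders, not values from the question.
    val conf = new SparkConf()
      .setAppName("driver-port-example")
      .setMaster("spark://master:7077")
      // Fix the callback port so only this one port must be reachable from the workers.
      .set("spark.driver.port", "51000")
      // Optionally pin the address the driver advertises to the executors.
      .set("spark.driver.host", "client-host")

    val sc = new SparkContext(conf)

In spark-shell, which creates sc for you, the equivalent would be passing --conf spark.driver.port=51000 on the command line. With the port fixed, only that port (plus the block manager port, if you also pin spark.blockManager.port) has to be opened from the workers back to the client.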
