spark-streaming: Could not compute split, block not found

14/10/07 18:10:27 WARN scheduler.TaskSetManager: Lost task 45.0 in stage 12.0 (TID 129, domU-12-31-39-04-60-07.compute-1.internal): java.lang.Exception: Could not compute split, block input-0-1412705397200 not found 
        org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

A spark-streaming program occasionally fails at runtime with the error shown above.

The cause is as follows:

JavaReceiverInputDStream<SparkFlumeEvent> flumeStream = FlumeUtils.createStream(jssc, hostIp, port);

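When the DStream is created with the call above, no storageLevel is passed, so the default applies. In this case that default was MEMORY_ONLY_SER, which means the received data is kept in memory only. (It is worth confirming the default in the FlumeUtils documentation for your Spark version.)

So if the memory configured at launch through driver-memory and executor-memory (mainly executor-memory) is too small, Spark automatically drops the blocks that do not fit. When a later task needs that data, it fails and reports that the block cannot be found. There are two fixes: store the received log data with the MEMORY_AND_DISK_SER level, so that blocks are written to disk when memory runs short, or increase executor-memory.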
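
As a minimal sketch of the first fix, the storage level can be passed explicitly as the fourth argument of FlumeUtils.createStream. The class name, application name, host, port, and batch interval below are placeholder assumptions for illustration, not values from the original program, and the snippet needs the Spark Streaming and spark-streaming-flume libraries on the classpath.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.flume.FlumeUtils;
import org.apache.spark.streaming.flume.SparkFlumeEvent;

public class FlumeStorageLevelExample {
    public static void main(String[] args) throws Exception {
        // Placeholder values (assumptions): replace with your own settings.
        String hostIp = "127.0.0.1";
        int port = 41414;

        SparkConf conf = new SparkConf().setAppName("FlumeStorageLevelExample");
        // 2-second batch interval, chosen arbitrarily for this sketch.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(2000));

        // Pass the storage level explicitly instead of relying on the default,
        // so that received blocks can be written to disk when memory is short.
        JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
                FlumeUtils.createStream(jssc, hostIp, port, StorageLevels.MEMORY_AND_DISK_SER);

        // Print the number of events per batch so the stream has an output operation.
        flumeStream.count().print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

The second fix needs no code change: executor memory is set when the job is submitted, for example through the --executor-memory option of spark-submit.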
