[Spark 52] Integrating Spark Streaming with Flume-NG, Part 1

 

Spark Streaming code:

package spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkFlumeNGWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkFlumeNGWordCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = FlumeUtils.createStream(ssc,"localhost",9999)
    // Print out the count of events received from this server in each batch
    lines.count().map(cnt => "Received " + cnt + " flume events. at " + System.currentTimeMillis() ).print()
    ssc.start()
    ssc.awaitTermination()
  }
}

 

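Note: 9999 is the port on which the service started by Spark Streaming listens, waiting for Flume to send data to it. In this example, therefore, the Flume sink writes its data to port 9999.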

 

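There are two ways to connect Spark Streaming to Flume-NG. In the first, Flume-NG pushes messages to Spark Streaming, as in the example above: Spark Streaming opens port 9999 and listens on it, waiting for Flume-NG to write data to that port, and Flume-NG uses an avro sink whose target is port 9999 on the host where Spark Streaming runs. In other words, the Spark Streaming receiver acts as an Avro endpoint for Flume, and Flume must be configured with an avro sink that sends data to it. The second way is pull-based and is not covered in this part. The Flume configuration for the push approach is given below.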

 

Deploying the code and starting Spark

1. Package the example code into a jar named spark-streaming-flume.jar.

Notes:

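1) The Spark Streaming code uses the FlumeUtils class, which lives in Spark's Flume integration package. That package is not part of the default Spark distribution, so it has to be bundled into the application jar together with its Flume dependencies: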

flume-ng-core-1.5.2.jar
flume-ng-sdk-1.5.2.jar
spark-streaming-flume_2.11-1.2.0.jar

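Bundle the three jars above into spark-streaming-flume.jar. The reason is that in my experiments, when these jars were passed to the spark-submit script via --jars, Spark Streaming reported that it could not find them.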

 

2. Start Spark Streaming

 

./spark-submit --deploy-mode client --name SparkFlumeEventCount --master spark://hadoop.master:7077 --executor-memory 512M --total-executor-cores 2  --class spark.examples.streaming.SparkFlumeNGWordCount spark-streaming-flume.jar

 

3. Once Spark Streaming is running, telnet localhost 9999 confirms that the service on port 9999 was started by Spark Streaming and that Spark Streaming is listening on it.
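The telnet check above can also be done programmatically. The following is a minimal sketch using only the JDK socket API; the object name PortCheck, the method isListening, and the one-second timeout are illustrative assumptions, not part of the original setup.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: reports whether something accepts TCP connections on host:port.
object PortCheck {
  def isListening(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      // connect() throws an IOException if the connection is refused or times out
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // 9999 is the port the Spark Streaming receiver is expected to open in this article
    println("port 9999 listening: " + isListening("localhost", 9999))
  }
}
```

Like telnet, this only shows that some process accepts connections on the port; it does not prove that the process is the Spark Streaming receiver.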

 

 

Configuring and starting Flume-NG

1. Flume-NG configuration

 

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
# Flume listens on port 19999 and waits for an Avro client to connect and send Avro Flume events
a1.sources.r1.port = 19999

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
# Port 9999 is opened by another process (Spark Streaming); Flume writes data to it over a socket
a1.sinks.k1.port = 9999

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

1.1 Meaning of the configuration above: data received by Flume is forwarded to port 9999 through the configured avro sink.

1.2 Flume's source listens on port 19999; that is, when the Flume agent starts, it starts a service listening on port 19999. The data it receives is in Avro format.
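To make the two ports in the configuration easy to verify, here is a small sketch that reads an excerpt of the agent configuration with java.util.Properties and reports where the agent listens and where it forwards. The object name FlumeConfSummary and the embedded excerpt are illustrative assumptions; Flume itself does not need this step.

```scala
import java.io.StringReader
import java.util.Properties

// Hypothetical helper: summarizes the listen and forward endpoints of agent a1.
object FlumeConfSummary {
  // Excerpt of the agent configuration shown above
  val conf: String =
    """a1.sources.r1.type = avro
      |a1.sources.r1.bind = 0.0.0.0
      |a1.sources.r1.port = 19999
      |a1.sinks.k1.type = avro
      |a1.sinks.k1.hostname = localhost
      |a1.sinks.k1.port = 9999
      |""".stripMargin

  // Parse the key = value lines into a Properties object
  def load(text: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(text))
    props
  }

  def main(args: Array[String]): Unit = {
    val p = load(conf)
    val listen = p.getProperty("a1.sources.r1.bind") + ":" + p.getProperty("a1.sources.r1.port")
    val forward = p.getProperty("a1.sinks.k1.hostname") + ":" + p.getProperty("a1.sinks.k1.port")
    println("avro source listens on " + listen)
    println("avro sink forwards to " + forward)
  }
}
```

The forward endpoint must match the host and port passed to FlumeUtils.createStream in the Spark Streaming code (localhost and 9999 in this article).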

2. Starting the Flume agent

Start Flume agent a1 with the following command:

 

./flume-ng agent -c . -f ../conf/spark.conf -n a1

After startup, Flume listens on port 19999.

3. Starting the Flume Avro client

Start the Flume Avro client with the following command:

 

./flume-ng avro-client -c . -H localhost -p 19999

When the avro client is started with the command above, it connects to the service on port 19999. Data can then be typed into the console.

 

Verification

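After the steps above, Flume-NG and Spark Streaming are connected. Type data into the console where the Flume avro client was started, then watch the console of the Flume agent and the console of Spark Streaming to see whether the data shows up.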

 

 

Question 1: I forgot to check what the Avro data format looks like. When data arrives in this Avro format, is the data inside the RDD also in Avro format?
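A partial answer to Question 1, offered as a sketch rather than a definitive statement: as far as I know, FlumeUtils.createStream yields a stream of SparkFlumeEvent objects, each wrapping an Avro-generated event whose body is a java.nio.ByteBuffer, so the RDD elements are event objects and the payload has to be decoded from the body. The helper below uses only the JDK; the object name FlumeBodyDecoder and the UTF-8 assumption about the payload are mine.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helper: turns an event body into a String, assuming UTF-8 text.
object FlumeBodyDecoder {
  def bodyToString(body: ByteBuffer): String = {
    // Work on a duplicate so the caller's buffer position is left untouched
    val copy = body.duplicate()
    val bytes = new Array[Byte](copy.remaining())
    copy.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    val body = ByteBuffer.wrap("hello flume".getBytes(StandardCharsets.UTF_8))
    println(bodyToString(body))
  }
}
```

In the streaming job this would presumably be applied as lines.map(e => FlumeBodyDecoder.bodyToString(e.event.getBody)); that usage is an assumption based on the SparkFlumeEvent API and was not run as part of this article.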

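Question 2 [result after revisiting]: the problem described below turned out not to be a real problem. The application was submitted to a standalone cluster master, and it needs at least two cores: the Flume receiver permanently occupies one core, so at least one more is required to process the received batches. Configuring the virtual machine with more than one core resolves it.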
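The core requirement from the revisit note can be written down as simple arithmetic. This is a sketch under the assumption that each receiver holds exactly one core for the lifetime of the application; the names CoreBudget, coresForProcessing, and canProcess are illustrative.

```scala
// Hypothetical helper: checks whether any cores remain for batch processing.
object CoreBudget {
  // Cores left over once every receiver has taken one
  def coresForProcessing(grantedCores: Int, receivers: Int): Int =
    math.max(grantedCores - receivers, 0)

  // Batches can only be processed if at least one core is left over
  def canProcess(grantedCores: Int, receivers: Int): Boolean =
    coresForProcessing(grantedCores, receivers) > 0

  def main(args: Array[String]): Unit = {
    // One granted core, one Flume receiver: nothing is left to run the batch jobs
    println(canProcess(grantedCores = 1, receivers = 1))
    // Two granted cores, one Flume receiver: one core remains for processing
    println(canProcess(grantedCores = 2, receivers = 1))
  }
}
```

What counts is the number of cores actually granted, not the number requested. The application log further down shows the executor being granted "with 1 cores", which may be why raising total-executor-cores alone did not help.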

 

After data was written through the avro client, the following output appeared only once, right after Spark started:

-------------------------------------------
Time: 1424151070000 ms
-------------------------------------------
Received 0 flume events. at 1424151077873

 

 

After that, the Spark Streaming console showed only the following:

 

15/02/17 00:43:00 INFO scheduler.TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
15/02/17 00:43:10 INFO scheduler.JobScheduler: Added jobs for time 1424151790000 ms
15/02/17 00:43:20 INFO scheduler.JobScheduler: Added jobs for time 1424151800000 ms
15/02/17 00:43:27 INFO storage.BlockManagerInfo: Added input-0-1424151807400 in memory on localhost:39338 (size: 1095.0 B, free: 267.2 MB)
15/02/17 00:43:30 INFO scheduler.JobScheduler: Added jobs for time 1424151810000 ms
15/02/17 00:43:40 INFO scheduler.JobScheduler: Added jobs for time 1424151820000 ms
15/02/17 00:43:50 INFO scheduler.JobScheduler: Added jobs for time 1424151830000 ms
15/02/17 00:44:00 INFO scheduler.JobScheduler: Added jobs for time 1424151840000 ms
15/02/17 00:44:10 INFO scheduler.JobScheduler: Added jobs for time 1424151850000 ms
15/02/17 00:44:20 INFO scheduler.JobScheduler: Added jobs for time 1424151860000 ms
15/02/17 00:44:30 INFO scheduler.JobScheduler: Added jobs for time 1424151870000 ms
15/02/17 00:44:40 INFO scheduler.JobScheduler: Added jobs for time 1424151880000 ms
15/02/17 00:44:50 INFO scheduler.JobScheduler: Added jobs for time 1424151890000 ms
15/02/17 00:45:00 INFO scheduler.JobScheduler: Added jobs for time 1424151900000 ms
15/02/17 00:45:10 INFO scheduler.JobScheduler: Added jobs for time 1424151910000 ms
15/02/17 00:45:20 INFO scheduler.JobScheduler: Added jobs for time 1424151920000 ms
15/02/17 00:45:30 INFO scheduler.JobScheduler: Added jobs for time 1424151930000 ms
15/02/17 00:45:40 INFO scheduler.JobScheduler: Added jobs for time 1424151940000 ms
15/02/17 00:45:50 INFO scheduler.JobScheduler: Added jobs for time 1424151950000 ms
15/02/17 00:46:00 INFO scheduler.JobScheduler: Added jobs for time 1424151960000 ms
15/02/17 00:46:10 INFO scheduler.JobScheduler: Added jobs for time 1424151970000 ms
15/02/17 00:46:20 INFO scheduler.JobScheduler: Added jobs for time 1424151980000 ms
15/02/17 00:46:30 INFO scheduler.JobScheduler: Added jobs for time 1424151990000 ms
15/02/17 00:46:40 INFO scheduler.JobScheduler: Added jobs for time 1424152000000 ms
15/02/17 00:46:50 INFO scheduler.JobScheduler: Added jobs for time 1424152010000 ms
15/02/17 00:47:00 INFO scheduler.JobScheduler: Added jobs for time 1424152020000 ms
15/02/17 00:47:10 INFO scheduler.JobScheduler: Added jobs for time 1424152030000 ms
15/02/17 00:47:20 INFO scheduler.JobScheduler: Added jobs for time 1424152040000 ms
15/02/17 00:47:30 INFO scheduler.JobScheduler: Added jobs for time 1424152050000 ms
15/02/17 00:47:40 INFO scheduler.JobScheduler: Added jobs for time 1424152060000 ms
15/02/17 00:47:50 INFO scheduler.JobScheduler: Added jobs for time 1424152070000 ms

Why is this? Does it mean the received data is not being processed?

While the Spark program was starting, the following output appeared (does it indicate two receivers?):

 

15/02/17 00:55:00 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from akka.tcp://sparkExecutor@localhost:48810
15/02/17 00:55:01 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from akka.tcp://sparkExecutor@localhost:48810

 

Changing total-executor-cores to 4 did not help either.

 

 

For the record, here is the log printed by the Spark application:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
======================================================
15/02/17 01:09:15 INFO spark.SecurityManager: Changing view acls to: hadoop
15/02/17 01:09:15 INFO spark.SecurityManager: Changing modify acls to: hadoop
15/02/17 01:09:15 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/02/17 01:09:16 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/17 01:09:16 INFO Remoting: Starting remoting
15/02/17 01:09:16 INFO util.Utils: Successfully started service 'sparkDriver' on port 40328.
15/02/17 01:09:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:40328]
15/02/17 01:09:16 INFO spark.SparkEnv: Registering MapOutputTracker
15/02/17 01:09:16 INFO spark.SparkEnv: Registering BlockManagerMaster
15/02/17 01:09:16 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150217010916-d6ea
15/02/17 01:09:16 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/02/17 01:09:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/17 01:09:17 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-058c312c-2673-4d5a-9c02-5c8da54e05b2
15/02/17 01:09:17 INFO spark.HttpServer: Starting HTTP Server
15/02/17 01:09:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/17 01:09:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:47160
15/02/17 01:09:18 INFO util.Utils: Successfully started service 'HTTP file server' on port 47160.
15/02/17 01:09:18 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/02/17 01:09:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/02/17 01:09:18 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/02/17 01:09:18 INFO ui.SparkUI: Started SparkUI at http://localhost:4040
15/02/17 01:09:18 INFO spark.SparkContext: Added JAR file:/home/hadoop/software/spark-1.2.0-bin-hadoop2.4/bin/Hello2.jar at http://localhost:47160/jars/Hello2.jar with timestamp 1424153358566
15/02/17 01:09:18 INFO client.AppClient$ClientActor: Connecting to master spark://hadoop.master:7077...
15/02/17 01:09:19 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150217010919-0005
15/02/17 01:09:19 INFO client.AppClient$ClientActor: Executor added: app-20150217010919-0005/0 on worker-20150216230045-localhost-43229 (localhost:43229) with 1 cores
15/02/17 01:09:19 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150217010919-0005/0 on hostPort localhost:43229 with 1 cores, 512.0 MB RAM
15/02/17 01:09:19 INFO client.AppClient$ClientActor: Executor updated: app-20150217010919-0005/0 is now LOADING
15/02/17 01:09:19 INFO client.AppClient$ClientActor: Executor updated: app-20150217010919-0005/0 is now RUNNING
15/02/17 01:09:19 INFO netty.NettyBlockTransferService: Server created on 54183
15/02/17 01:09:19 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/02/17 01:09:19 INFO storage.BlockManagerMasterActor: Registering block manager localhost:54183 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 54183)
15/02/17 01:09:20 INFO storage.BlockManagerMaster: Registered BlockManager
15/02/17 01:09:22 INFO scheduler.EventLoggingListener: Logging events to hdfs://hadoop.master:9000/user/hadoop/sparkevt/app-20150217010919-0005
15/02/17 01:09:22 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/02/17 01:09:23 INFO scheduler.ReceiverTracker: ReceiverTracker started
15/02/17 01:09:23 INFO dstream.ForEachDStream: metadataCleanupDelay = -1
15/02/17 01:09:23 INFO flume.FlumeInputDStream: metadataCleanupDelay = -1
15/02/17 01:09:23 INFO flume.FlumeInputDStream: Slide time = 10000 ms
15/02/17 01:09:23 INFO flume.FlumeInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/02/17 01:09:23 INFO flume.FlumeInputDStream: Checkpoint interval = null
15/02/17 01:09:23 INFO flume.FlumeInputDStream: Remember duration = 10000 ms
15/02/17 01:09:23 INFO flume.FlumeInputDStream: Initialized and validated org.apache.spark.streaming.flume.FlumeInputDStream@73406330
15/02/17 01:09:23 INFO dstream.ForEachDStream: Slide time = 10000 ms
15/02/17 01:09:23 INFO dstream.ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
15/02/17 01:09:23 INFO dstream.ForEachDStream: Checkpoint interval = null
15/02/17 01:09:23 INFO dstream.ForEachDStream: Remember duration = 10000 ms
15/02/17 01:09:23 INFO dstream.ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@8301f6d
15/02/17 01:09:23 INFO util.RecurringTimer: Started timer for JobGenerator at time 1424153370000
15/02/17 01:09:23 INFO scheduler.JobGenerator: Started JobGenerator at 1424153370000 ms
15/02/17 01:09:23 INFO scheduler.JobScheduler: Started JobScheduler
15/02/17 01:09:23 INFO spark.SparkContext: Starting job: start at SparkFlumeNGWordCount.scala:22
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Registering RDD 2 (start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Got job 0 (start at SparkFlumeNGWordCount.scala:22) with 20 output partitions (allowLocal=false)
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Final stage: Stage 1(start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 0)
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Missing parents: List(Stage 0)
15/02/17 01:09:23 INFO scheduler.DAGScheduler: Submitting Stage 0 (MappedRDD[2] at start at SparkFlumeNGWordCount.scala:22), which has no missing parents
15/02/17 01:09:24 INFO storage.MemoryStore: ensureFreeSpace(2720) called with curMem=0, maxMem=280248975
15/02/17 01:09:24 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.7 KB, free 267.3 MB)
15/02/17 01:09:24 INFO storage.MemoryStore: ensureFreeSpace(1943) called with curMem=2720, maxMem=280248975
15/02/17 01:09:24 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1943.0 B, free 267.3 MB)
15/02/17 01:09:24 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54183 (size: 1943.0 B, free: 267.3 MB)
15/02/17 01:09:24 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/17 01:09:24 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/02/17 01:09:24 INFO scheduler.DAGScheduler: Submitting 50 missing tasks from Stage 0 (MappedRDD[2] at start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:24 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 50 tasks
15/02/17 01:09:27 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@localhost:40137/user/Executor#112804417] with ID 0
15/02/17 01:09:27 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:27 INFO storage.BlockManagerMasterActor: Registering block manager localhost:48519 with 267.3 MB RAM, BlockManagerId(0, localhost, 48519)
15/02/17 01:09:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:48519 (size: 1943.0 B, free: 267.3 MB)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2163 ms on localhost (1/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 257 ms on localhost (2/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 256 ms on localhost (3/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 274 ms on localhost (4/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 70 ms on localhost (5/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 45 ms on localhost (6/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 44 ms on localhost (7/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 69 ms on localhost (8/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 58 ms on localhost (9/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 0.0 (TID 10, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 37 ms on localhost (10/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10) in 107 ms on localhost (11/50)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 0.0 (TID 12, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:29 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 0.0 (TID 11) in 75 ms on localhost (12/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 0.0 (TID 13, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 45 ms on localhost (13/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 48 ms on localhost (14/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 15.0 in stage 0.0 (TID 15, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 14.0 in stage 0.0 (TID 14) in 39 ms on localhost (15/50)
15/02/17 01:09:30 INFO scheduler.JobScheduler: Added jobs for time 1424153370000 ms
15/02/17 01:09:30 INFO scheduler.JobScheduler: Starting job streaming job 1424153370000 ms.0 from job set of time 1424153370000 ms
15/02/17 01:09:30 INFO spark.SparkContext: Starting job: foreachRDD at SparkFlumeNGWordCount.scala:20
15/02/17 01:09:30 INFO scheduler.DAGScheduler: Job 1 finished: foreachRDD at SparkFlumeNGWordCount.scala:20, took 0.000868 s
15/02/17 01:09:30 INFO scheduler.JobScheduler: Finished job streaming job 1424153370000 ms.0 from job set of time 1424153370000 ms
15/02/17 01:09:30 INFO scheduler.JobScheduler: Total delay: 0.170 s for time 1424153370000 ms (execution: 0.040 s)
15/02/17 01:09:30 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:09:30 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 98 ms on localhost (16/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 36 ms on localhost (17/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 18.0 in stage 0.0 (TID 18, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 17.0 in stage 0.0 (TID 17) in 37 ms on localhost (18/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 18.0 in stage 0.0 (TID 18) in 47 ms on localhost (19/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 19.0 in stage 0.0 (TID 19) in 38 ms on localhost (20/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 20.0 in stage 0.0 (TID 20) in 37 ms on localhost (21/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 21.0 in stage 0.0 (TID 21) in 53 ms on localhost (22/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 23.0 in stage 0.0 (TID 23, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 22.0 in stage 0.0 (TID 22) in 37 ms on localhost (23/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 24.0 in stage 0.0 (TID 24, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 23.0 in stage 0.0 (TID 23) in 42 ms on localhost (24/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 25.0 in stage 0.0 (TID 25, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 24.0 in stage 0.0 (TID 24) in 63 ms on localhost (25/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 26.0 in stage 0.0 (TID 26, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 25.0 in stage 0.0 (TID 25) in 52 ms on localhost (26/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 27.0 in stage 0.0 (TID 27, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 26.0 in stage 0.0 (TID 26) in 47 ms on localhost (27/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 28.0 in stage 0.0 (TID 28, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 27.0 in stage 0.0 (TID 27) in 28 ms on localhost (28/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 29.0 in stage 0.0 (TID 29, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 28.0 in stage 0.0 (TID 28) in 32 ms on localhost (29/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 30.0 in stage 0.0 (TID 30, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 29.0 in stage 0.0 (TID 29) in 35 ms on localhost (30/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 31.0 in stage 0.0 (TID 31, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 30.0 in stage 0.0 (TID 30) in 36 ms on localhost (31/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 32.0 in stage 0.0 (TID 32, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 31.0 in stage 0.0 (TID 31) in 58 ms on localhost (32/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 33.0 in stage 0.0 (TID 33, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 32.0 in stage 0.0 (TID 32) in 37 ms on localhost (33/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 34.0 in stage 0.0 (TID 34, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 33.0 in stage 0.0 (TID 33) in 33 ms on localhost (34/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 35.0 in stage 0.0 (TID 35, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 34.0 in stage 0.0 (TID 34) in 49 ms on localhost (35/50)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Starting task 36.0 in stage 0.0 (TID 36, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:30 INFO scheduler.TaskSetManager: Finished task 35.0 in stage 0.0 (TID 35) in 37 ms on localhost (36/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 37.0 in stage 0.0 (TID 37, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 36.0 in stage 0.0 (TID 36) in 39 ms on localhost (37/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 38.0 in stage 0.0 (TID 38, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 37.0 in stage 0.0 (TID 37) in 53 ms on localhost (38/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 39.0 in stage 0.0 (TID 39, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 38.0 in stage 0.0 (TID 38) in 37 ms on localhost (39/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 40.0 in stage 0.0 (TID 40, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 39.0 in stage 0.0 (TID 39) in 30 ms on localhost (40/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 41.0 in stage 0.0 (TID 41, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 40.0 in stage 0.0 (TID 40) in 32 ms on localhost (41/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 42.0 in stage 0.0 (TID 42, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 41.0 in stage 0.0 (TID 41) in 44 ms on localhost (42/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 43.0 in stage 0.0 (TID 43, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 42.0 in stage 0.0 (TID 42) in 33 ms on localhost (43/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 44.0 in stage 0.0 (TID 44, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 43.0 in stage 0.0 (TID 43) in 62 ms on localhost (44/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 45.0 in stage 0.0 (TID 45, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 44.0 in stage 0.0 (TID 44) in 31 ms on localhost (45/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 46.0 in stage 0.0 (TID 46, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 45.0 in stage 0.0 (TID 45) in 37 ms on localhost (46/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 47.0 in stage 0.0 (TID 47, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 46.0 in stage 0.0 (TID 46) in 36 ms on localhost (47/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 48.0 in stage 0.0 (TID 48, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 47.0 in stage 0.0 (TID 47) in 42 ms on localhost (48/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 49.0 in stage 0.0 (TID 49, localhost, PROCESS_LOCAL, 1297 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 48.0 in stage 0.0 (TID 48) in 31 ms on localhost (49/50)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 49.0 in stage 0.0 (TID 49) in 33 ms on localhost (50/50)
15/02/17 01:09:31 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/17 01:09:31 INFO scheduler.DAGScheduler: Stage 0 (start at SparkFlumeNGWordCount.scala:22) finished in 7.011 s
15/02/17 01:09:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/02/17 01:09:31 INFO scheduler.DAGScheduler: running: Set()
15/02/17 01:09:31 INFO scheduler.DAGScheduler: waiting: Set(Stage 1)
15/02/17 01:09:31 INFO scheduler.DAGScheduler: failed: Set()
15/02/17 01:09:31 INFO scheduler.DAGScheduler: Missing parents for Stage 1: List()
15/02/17 01:09:31 INFO scheduler.DAGScheduler: Submitting Stage 1 (ShuffledRDD[3] at start at SparkFlumeNGWordCount.scala:22), which is now runnable
15/02/17 01:09:31 INFO storage.MemoryStore: ensureFreeSpace(2232) called with curMem=4663, maxMem=280248975
15/02/17 01:09:31 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.2 KB, free 267.3 MB)
15/02/17 01:09:31 INFO storage.MemoryStore: ensureFreeSpace(1642) called with curMem=6895, maxMem=280248975
15/02/17 01:09:31 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1642.0 B, free 267.3 MB)
15/02/17 01:09:31 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54183 (size: 1642.0 B, free: 267.3 MB)
15/02/17 01:09:31 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/17 01:09:31 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/17 01:09:31 INFO scheduler.DAGScheduler: Submitting 20 missing tasks from Stage 1 (ShuffledRDD[3] at start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:31 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 20 tasks
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 50, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:48519 (size: 1642.0 B, free: 267.3 MB)
15/02/17 01:09:31 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to sparkExecutor@localhost:40137
15/02/17 01:09:31 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 251 bytes
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 51, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 50) in 130 ms on localhost (1/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 1.0 (TID 52, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 51) in 41 ms on localhost (2/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 1.0 (TID 53, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 1.0 (TID 52) in 38 ms on localhost (3/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 1.0 (TID 54, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 1.0 (TID 53) in 51 ms on localhost (4/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 1.0 (TID 55, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 1.0 (TID 54) in 36 ms on localhost (5/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 1.0 (TID 56, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 1.0 (TID 55) in 41 ms on localhost (6/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 1.0 (TID 57, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 1.0 (TID 56) in 41 ms on localhost (7/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 1.0 (TID 58, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 1.0 (TID 57) in 37 ms on localhost (8/20)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 1.0 (TID 59, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:31 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 1.0 (TID 58) in 35 ms on localhost (9/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 10.0 in stage 1.0 (TID 60, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 1.0 (TID 59) in 39 ms on localhost (10/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 11.0 in stage 1.0 (TID 61, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 10.0 in stage 1.0 (TID 60) in 35 ms on localhost (11/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 12.0 in stage 1.0 (TID 62, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 11.0 in stage 1.0 (TID 61) in 35 ms on localhost (12/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 13.0 in stage 1.0 (TID 63, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 1.0 (TID 62) in 35 ms on localhost (13/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 14.0 in stage 1.0 (TID 64, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 13.0 in stage 1.0 (TID 63) in 43 ms on localhost (14/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 15.0 in stage 1.0 (TID 65, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 14.0 in stage 1.0 (TID 64) in 45 ms on localhost (15/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 16.0 in stage 1.0 (TID 66, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 15.0 in stage 1.0 (TID 65) in 31 ms on localhost (16/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 17.0 in stage 1.0 (TID 67, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 16.0 in stage 1.0 (TID 66) in 33 ms on localhost (17/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 18.0 in stage 1.0 (TID 68, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 17.0 in stage 1.0 (TID 67) in 36 ms on localhost (18/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 19.0 in stage 1.0 (TID 69, localhost, PROCESS_LOCAL, 1104 bytes)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 18.0 in stage 1.0 (TID 68) in 32 ms on localhost (19/20)
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Finished task 19.0 in stage 1.0 (TID 69) in 45 ms on localhost (20/20)
15/02/17 01:09:32 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Stage 1 (start at SparkFlumeNGWordCount.scala:22) finished in 0.814 s
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Job 0 finished: start at SparkFlumeNGWordCount.scala:22, took 8.603400 s
15/02/17 01:09:32 INFO scheduler.ReceiverTracker: Starting 1 receivers
15/02/17 01:09:32 INFO spark.SparkContext: Starting job: start at SparkFlumeNGWordCount.scala:22
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Got job 2 (start at SparkFlumeNGWordCount.scala:22) with 1 output partitions (allowLocal=false)
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Final stage: Stage 2(start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Submitting Stage 2 (ParallelCollectionRDD[0] at start at SparkFlumeNGWordCount.scala:22), which has no missing parents
15/02/17 01:09:32 INFO storage.MemoryStore: ensureFreeSpace(55064) called with curMem=8537, maxMem=280248975
15/02/17 01:09:32 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 53.8 KB, free 267.2 MB)
15/02/17 01:09:32 INFO storage.MemoryStore: ensureFreeSpace(32760) called with curMem=63601, maxMem=280248975
15/02/17 01:09:32 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 32.0 KB, free 267.2 MB)
15/02/17 01:09:32 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54183 (size: 32.0 KB, free: 267.2 MB)
15/02/17 01:09:32 INFO storage.BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/17 01:09:32 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/17 01:09:32 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 2 (ParallelCollectionRDD[0] at start at SparkFlumeNGWordCount.scala:22)
15/02/17 01:09:32 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/02/17 01:09:32 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 70, localhost, NODE_LOCAL, 1837 bytes)
15/02/17 01:09:32 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:48519 (size: 32.0 KB, free: 267.2 MB)
15/02/17 01:09:32 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from akka.tcp://sparkExecutor@localhost:40137
15/02/17 01:09:33 INFO scheduler.ReceiverTracker: Registered receiver for stream 0 from akka.tcp://sparkExecutor@localhost:40137
15/02/17 01:09:40 INFO scheduler.JobScheduler: Added jobs for time 1424153380000 ms
15/02/17 01:09:40 INFO spark.SparkContext: Starting job: foreachRDD at SparkFlumeNGWordCount.scala:20
15/02/17 01:09:40 INFO scheduler.DAGScheduler: Job 3 finished: foreachRDD at SparkFlumeNGWordCount.scala:20, took 0.000019 s
15/02/17 01:09:40 INFO scheduler.JobScheduler: Starting job streaming job 1424153380000 ms.0 from job set of time 1424153380000 ms
15/02/17 01:09:40 INFO scheduler.JobScheduler: Finished job streaming job 1424153380000 ms.0 from job set of time 1424153380000 ms
15/02/17 01:09:40 INFO scheduler.JobScheduler: Total delay: 0.004 s for time 1424153380000 ms (execution: 0.000 s)
15/02/17 01:09:40 INFO rdd.BlockRDD: Removing RDD 4 from persistence list
15/02/17 01:09:40 INFO storage.BlockManager: Removing RDD 4
15/02/17 01:09:40 INFO flume.FlumeInputDStream: Removing blocks of RDD BlockRDD[4] at createStream at SparkFlumeNGWordCount.scala:14 of time 1424153380000 ms
15/02/17 01:09:40 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:09:40 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:09:50 INFO spark.SparkContext: Starting job: foreachRDD at SparkFlumeNGWordCount.scala:20
15/02/17 01:09:50 INFO scheduler.DAGScheduler: Job 4 finished: foreachRDD at SparkFlumeNGWordCount.scala:20, took 0.000037 s
15/02/17 01:09:50 INFO scheduler.JobScheduler: Starting job streaming job 1424153390000 ms.0 from job set of time 1424153390000 ms
15/02/17 01:09:50 INFO scheduler.JobScheduler: Finished job streaming job 1424153390000 ms.0 from job set of time 1424153390000 ms
15/02/17 01:09:50 INFO scheduler.JobScheduler: Total delay: 0.012 s for time 1424153390000 ms (execution: 0.000 s)
15/02/17 01:09:50 INFO scheduler.JobScheduler: Added jobs for time 1424153390000 ms
15/02/17 01:09:50 INFO rdd.BlockRDD: Removing RDD 5 from persistence list
15/02/17 01:09:50 INFO storage.BlockManager: Removing RDD 5
15/02/17 01:09:50 INFO flume.FlumeInputDStream: Removing blocks of RDD BlockRDD[5] at createStream at SparkFlumeNGWordCount.scala:14 of time 1424153390000 ms
15/02/17 01:09:50 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer(1424153370000 ms)
15/02/17 01:09:50 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:10:00 INFO scheduler.JobScheduler: Added jobs for time 1424153400000 ms
15/02/17 01:10:00 INFO spark.SparkContext: Starting job: foreachRDD at SparkFlumeNGWordCount.scala:20
15/02/17 01:10:00 INFO scheduler.DAGScheduler: Job 5 finished: foreachRDD at SparkFlumeNGWordCount.scala:20, took 0.000020 s
15/02/17 01:10:00 INFO scheduler.JobScheduler: Starting job streaming job 1424153400000 ms.0 from job set of time 1424153400000 ms
15/02/17 01:10:00 INFO scheduler.JobScheduler: Finished job streaming job 1424153400000 ms.0 from job set of time 1424153400000 ms
15/02/17 01:10:00 INFO scheduler.JobScheduler: Total delay: 0.022 s for time 1424153400000 ms (execution: 0.000 s)
15/02/17 01:10:00 INFO rdd.BlockRDD: Removing RDD 6 from persistence list
15/02/17 01:10:00 INFO storage.BlockManager: Removing RDD 6
15/02/17 01:10:00 INFO flume.FlumeInputDStream: Removing blocks of RDD BlockRDD[6] at createStream at SparkFlumeNGWordCount.scala:14 of time 1424153400000 ms
15/02/17 01:10:00 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer(1424153380000 ms)
15/02/17 01:10:00 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer()
15/02/17 01:10:09 INFO storage.BlockManagerInfo: Added input-0-1424153409400 in memory on localhost:48519 (size: 897.0 B, free: 267.2 MB)
15/02/17 01:10:10 INFO scheduler.JobScheduler: Added jobs for time 1424153410000 ms
15/02/17 01:10:10 INFO spark.SparkContext: Starting job: foreachRDD at SparkFlumeNGWordCount.scala:20
15/02/17 01:10:10 INFO scheduler.JobScheduler: Starting job streaming job 1424153410000 ms.0 from job set of time 1424153410000 ms
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Got job 6 (foreachRDD at SparkFlumeNGWordCount.scala:20) with 1 output partitions (allowLocal=false)
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Final stage: Stage 3(foreachRDD at SparkFlumeNGWordCount.scala:20)
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Missing parents: List()
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Submitting Stage 3 (BlockRDD[8] at createStream at SparkFlumeNGWordCount.scala:14), which has no missing parents
15/02/17 01:10:10 INFO storage.MemoryStore: ensureFreeSpace(1016) called with curMem=96361, maxMem=280248975
15/02/17 01:10:10 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 1016.0 B, free 267.2 MB)
15/02/17 01:10:10 INFO storage.MemoryStore: ensureFreeSpace(758) called with curMem=97377, maxMem=280248975
15/02/17 01:10:10 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 758.0 B, free 267.2 MB)
15/02/17 01:10:10 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:54183 (size: 758.0 B, free: 267.2 MB)
15/02/17 01:10:10 INFO storage.BlockManagerMaster: Updated info of block broadcast_3_piece0
15/02/17 01:10:10 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/02/17 01:10:10 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 3 (BlockRDD[8] at createStream at SparkFlumeNGWordCount.scala:14)
15/02/17 01:10:10 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
15/02/17 01:10:14 INFO storage.BlockManagerInfo: Added input-0-1424153413800 in memory on localhost:48519 (size: 181.0 B, free: 267.2 MB)
15/02/17 01:10:20 INFO scheduler.JobScheduler: Added jobs for time 1424153420000 ms
15/02/17 01:10:30 INFO scheduler.JobScheduler: Added jobs for time 1424153430000 ms
15/02/17 01:10:40 INFO scheduler.JobScheduler: Added jobs for time 1424153440000 ms
15/02/17 01:10:50 INFO scheduler.JobScheduler: Added jobs for time 1424153450000 ms
15/02/17 01:11:00 INFO scheduler.JobScheduler: Added jobs for time 1424153460000 ms
15/02/17 01:11:10 INFO scheduler.JobScheduler: Added jobs for time 1424153470000 ms
15/02/17 01:11:20 INFO scheduler.JobScheduler: Added jobs for time 1424153480000 ms
15/02/17 01:11:30 INFO scheduler.JobScheduler: Added jobs for time 1424153490000 ms
15/02/17 01:11:40 INFO scheduler.JobScheduler: Added jobs for time 1424153500000 ms
15/02/17 01:11:50 INFO scheduler.JobScheduler: Added jobs for time 1424153510000 ms
15/02/17 01:12:00 INFO scheduler.JobScheduler: Added jobs for time 1424153520000 ms
15/02/17 01:12:10 INFO scheduler.JobScheduler: Added jobs for time 1424153530000 ms
15/02/17 01:12:20 INFO scheduler.JobScheduler: Added jobs for time 1424153540000 ms

 

Drawbacks of push-based FlumeNG + Spark Streaming:

Despite its simplicity, the disadvantage of this approach is its lack of transactions. This increases the chance of losing small amounts of data in case of the failure of the worker node running the receiver. Furthermore, if the worker running the receiver fails, the system will try to launch the receiver at a different location, and Flume will need to be reconfigured to send to the new worker. This is often challenging to set up.

 

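This last point makes the push-based approach essentially unusable in production: if the worker node running the receiver goes down, the whole Spark Streaming job stops working.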
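
For comparison, the second integration mode mentioned at the start of this post is pull-based: Flume writes events into a Spark-provided sink, and Spark Streaming polls that sink using transactions, so Flume does not need to know which worker hosts the receiver. The following is only a minimal sketch of what the Spark side might look like, not a tested configuration. The object name SparkFlumeNGPollingCount is hypothetical, and it assumes a Flume agent is already running a sink of type org.apache.spark.streaming.flume.sink.SparkSink on localhost:9999, with the spark-streaming-flume-sink jar on Flume's classpath.

```scala
package spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical counterpart of SparkFlumeNGWordCount that pulls from Flume
// instead of listening for pushed events.
object SparkFlumeNGPollingCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkFlumeNGPollingCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumption: the Flume agent's SparkSink listens on localhost:9999.
    // Here Spark connects to Flume, rather than Flume connecting to Spark.
    val events = FlumeUtils.createPollingStream(ssc, "localhost", 9999)

    // Print the number of events pulled in each 10-second batch.
    events.count()
      .map(cnt => s"Received $cnt flume events at ${System.currentTimeMillis()}")
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the host and port now identify the Flume agent rather than a Spark worker, the receiver can be relaunched on another worker without touching the Flume configuration.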

 

 

References:

http://blog.csdn.net/lskyne/article/details/37561235

http://www.cnblogs.com/lxf20061900/p/3866252.html

http://www.iteblog.com/archives/1063
