res43: Array[String] = Array(demo test can you see the best, walking in the sun, hello world, hello spark, demo test can you see the best, walking in the sun, hello world, hello spark)
flatMap
scala> val wordcount = file.flatMap(line=>line.split(" ")).map(word=>(word,1))
wordcount: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[42] at map at <console>:23
scala> wordcount.toArray
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
15/12/09 14:19:21 INFO SparkContext: Starting job: toArray at <console>:26
15/12/09 14:19:21 INFO DAGScheduler: Got job 36 (toArray at <console>:26) with 2 output partitions (allowLocal=false)
15/12/09 14:19:21 INFO DAGScheduler: Final stage: ResultStage 36(toArray at <console>:26)
15/12/09 14:19:21 INFO DAGScheduler: Parents of final stage: List()
15/12/09 14:19:21 INFO DAGScheduler: Missing parents: List()
15/12/09 14:19:21 INFO DAGScheduler: Submitting ResultStage 36 (MapPartitionsRDD[42] at map at <console>:23), which has no missing parents
15/12/09 14:19:21 INFO MemoryStore: ensureFreeSpace(3488) called with curMem=1564303, maxMem=277877882
15/12/09 14:19:21 INFO MemoryStore: Block broadcast_44 stored as values in memory (estimated size 3.4 KB, free 263.5 MB)
15/12/09 14:19:21 INFO MemoryStore: ensureFreeSpace(1927) called with curMem=1567791, maxMem=277877882
15/12/09 14:19:21 INFO MemoryStore: Block broadcast_44_piece0 stored as bytes in memory (estimated size 1927.0 B, free 263.5 MB)
15/12/09 14:19:21 INFO BlockManagerInfo: Added broadcast_44_piece0 in memory on 10.28.23.201:60179 (size: 1927.0 B, free: 264.9 MB)
15/12/09 14:19:21 INFO SparkContext: Created broadcast 44 from broadcast at DAGScheduler.scala:874
15/12/09 14:19:21 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 36 (MapPartitionsRDD[42] at map at <console>:23)
15/12/09 14:19:21 INFO TaskSchedulerImpl: Adding task set 36.0 with 2 tasks
15/12/09 14:19:21 INFO TaskSetManager: Starting task 0.0 in stage 36.0 (TID 62, 10.28.23.201, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:19:21 INFO TaskSetManager: Starting task 1.0 in stage 36.0 (TID 63, 10.28.23.203, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:19:21 INFO BlockManagerInfo: Added broadcast_44_piece0 in memory on 10.28.23.201:51294 (size: 1927.0 B, free: 264.9 MB)
15/12/09 14:19:21 INFO BlockManagerInfo: Added broadcast_44_piece0 in memory on 10.28.23.203:57813 (size: 1927.0 B, free: 264.9 MB)
15/12/09 14:19:21 INFO TaskSetManager: Finished task 1.0 in stage 36.0 (TID 63) in 107 ms on 10.28.23.203 (1/2)
15/12/09 14:19:21 INFO TaskSetManager: Finished task 0.0 in stage 36.0 (TID 62) in 133 ms on 10.28.23.201 (2/2)
15/12/09 14:19:21 INFO TaskSchedulerImpl: Removed TaskSet 36.0, whose tasks have all completed, from pool
15/12/09 14:19:21 INFO DAGScheduler: ResultStage 36 (toArray at <console>:26) finished in 0.141 s
15/12/09 14:19:21 INFO DAGScheduler: Job 36 finished: toArray at <console>:26, took 0.198194 s
res53: Array[(String, Int)] = Array((demo,1), (test,1), (can,1), (you,1), (see,1), (the,1), (best,1), (walking,1), (in,1), (the,1), (sun,1), (hello,1), (world,1), (hello,1), (spark,1))
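The difference between map and flatMap in the command above can be reproduced without a cluster. This is a minimal sketch that uses ordinary Scala collections in place of the RDD. The object name `FlatMapSketch` is hypothetical, and the four input lines are copied from the transcript output rather than read from the original file.

```scala
object FlatMapSketch {
  // The four distinct input lines, copied from the transcript output.
  val lines: Seq[String] = Seq(
    "demo test can you see the best",
    "walking in the sun",
    "hello world",
    "hello spark"
  )

  // map keeps one output element per input line: a sequence of word arrays.
  val nested: Seq[Array[String]] = lines.map(line => line.split(" "))

  // flatMap flattens those arrays into a single sequence of words;
  // the following map pairs each word with an initial count of 1.
  val pairs: Seq[(String, Int)] =
    lines.flatMap(line => line.split(" ")).map(word => (word, 1))

  def main(args: Array[String]): Unit = {
    println(nested.length) // one entry per line
    println(pairs.length)  // one entry per word
    println(pairs.take(3))
  }
}
```

On local collections the result is computed eagerly. On an RDD the same two calls only describe the computation, and nothing runs until an action such as toArray is invoked, which is why the job log appears only after the second command.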
groupByKey
scala> wordcount.groupByKey().toArray
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
15/12/09 14:20:32 INFO SparkContext: Starting job: toArray at <console>:28
15/12/09 14:20:32 INFO DAGScheduler: Registering RDD 42 (map at <console>:23)
15/12/09 14:20:32 INFO DAGScheduler: Got job 37 (toArray at <console>:28) with 2 output partitions (allowLocal=false)
15/12/09 14:20:32 INFO DAGScheduler: Final stage: ResultStage 38(toArray at <console>:28)
15/12/09 14:20:32 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 37)
15/12/09 14:20:32 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 37)
15/12/09 14:20:32 INFO DAGScheduler: Submitting ShuffleMapStage 37 (MapPartitionsRDD[42] at map at <console>:23), which has no missing parents
15/12/09 14:20:32 INFO MemoryStore: ensureFreeSpace(4864) called with curMem=1569718, maxMem=277877882
15/12/09 14:20:32 INFO MemoryStore: Block broadcast_45 stored as values in memory (estimated size 4.8 KB, free 263.5 MB)
15/12/09 14:20:32 INFO MemoryStore: ensureFreeSpace(2564) called with curMem=1574582, maxMem=277877882
15/12/09 14:20:32 INFO MemoryStore: Block broadcast_45_piece0 stored as bytes in memory (estimated size 2.5 KB, free 263.5 MB)
15/12/09 14:20:32 INFO BlockManagerInfo: Added broadcast_45_piece0 in memory on 10.28.23.201:60179 (size: 2.5 KB, free: 264.8 MB)
15/12/09 14:20:32 INFO SparkContext: Created broadcast 45 from broadcast at DAGScheduler.scala:874
15/12/09 14:20:32 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 37 (MapPartitionsRDD[42] at map at <console>:23)
15/12/09 14:20:32 INFO TaskSchedulerImpl: Adding task set 37.0 with 2 tasks
15/12/09 14:20:32 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 64, 10.28.23.202, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:20:32 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 65, 10.28.23.201, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:20:32 INFO BlockManagerInfo: Added broadcast_45_piece0 in memory on 10.28.23.202:50706 (size: 2.5 KB, free: 264.9 MB)
15/12/09 14:20:32 INFO BlockManagerInfo: Added broadcast_45_piece0 in memory on 10.28.23.201:51294 (size: 2.5 KB, free: 264.9 MB)
15/12/09 14:20:32 INFO TaskSetManager: Finished task 0.0 in stage 37.0 (TID 64) in 274 ms on 10.28.23.202 (1/2)
15/12/09 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 37.0 (TID 65) in 1690 ms on 10.28.23.201 (2/2)
15/12/09 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 37.0, whose tasks have all completed, from pool
15/12/09 14:20:34 INFO DAGScheduler: ShuffleMapStage 37 (map at <console>:23) finished in 1.691 s
15/12/09 14:20:34 INFO DAGScheduler: looking for newly runnable stages
15/12/09 14:20:34 INFO DAGScheduler: running: Set()
15/12/09 14:20:34 INFO DAGScheduler: waiting: Set(ResultStage 38)
15/12/09 14:20:34 INFO DAGScheduler: failed: Set()
15/12/09 14:20:34 INFO DAGScheduler: Missing parents for ResultStage 38: List()
15/12/09 14:20:34 INFO DAGScheduler: Submitting ResultStage 38 (ShuffledRDD[43] at groupByKey at <console>:25), which is now runnable
15/12/09 14:20:34 INFO MemoryStore: ensureFreeSpace(5384) called with curMem=1577146, maxMem=277877882
15/12/09 14:20:34 INFO MemoryStore: Block broadcast_46 stored as values in memory (estimated size 5.3 KB, free 263.5 MB)
15/12/09 14:20:34 INFO MemoryStore: ensureFreeSpace(2765) called with curMem=1582530, maxMem=277877882
15/12/09 14:20:34 INFO MemoryStore: Block broadcast_46_piece0 stored as bytes in memory (estimated size 2.7 KB, free 263.5 MB)
15/12/09 14:20:34 INFO BlockManagerInfo: Added broadcast_46_piece0 in memory on 10.28.23.201:60179 (size: 2.7 KB, free: 264.8 MB)
15/12/09 14:20:34 INFO SparkContext: Created broadcast 46 from broadcast at DAGScheduler.scala:874
15/12/09 14:20:34 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 38 (ShuffledRDD[43] at groupByKey at <console>:25)
15/12/09 14:20:34 INFO TaskSchedulerImpl: Adding task set 38.0 with 2 tasks
15/12/09 14:20:34 INFO TaskSetManager: Starting task 0.0 in stage 38.0 (TID 66, 10.28.23.203, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:20:34 INFO TaskSetManager: Starting task 1.0 in stage 38.0 (TID 67, 10.28.23.202, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:20:34 INFO BlockManagerInfo: Added broadcast_46_piece0 in memory on 10.28.23.202:50706 (size: 2.7 KB, free: 264.9 MB)
15/12/09 14:20:34 INFO BlockManagerInfo: Added broadcast_46_piece0 in memory on 10.28.23.203:57813 (size: 2.7 KB, free: 264.9 MB)
15/12/09 14:20:34 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.28.23.202:53700
15/12/09 14:20:34 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 162 bytes
15/12/09 14:20:34 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.28.23.203:56943
15/12/09 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 38.0 (TID 67) in 552 ms on 10.28.23.202 (1/2)
15/12/09 14:20:34 INFO TaskSetManager: Finished task 0.0 in stage 38.0 (TID 66) in 558 ms on 10.28.23.203 (2/2)
15/12/09 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 38.0, whose tasks have all completed, from pool
15/12/09 14:20:34 INFO DAGScheduler: ResultStage 38 (toArray at <console>:28) finished in 0.558 s
15/12/09 14:20:34 INFO DAGScheduler: Job 37 finished: toArray at <console>:28, took 2.518733 s
res54: Array[(String, Iterable[Int])] = Array((can,CompactBuffer(1)), (best,CompactBuffer(1)), (hello,CompactBuffer(1, 1)), (sun,CompactBuffer(1)), (test,CompactBuffer(1)), (world,CompactBuffer(1)), (walking,CompactBuffer(1)), (spark,CompactBuffer(1)), (you,CompactBuffer(1)), (demo,CompactBuffer(1)), (in,CompactBuffer(1)), (see,CompactBuffer(1)), (the,CompactBuffer(1, 1)))
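The grouping shown in res54 can be sketched locally to make the shape of the result easier to see. This example assumes plain Scala collections rather than an RDD, so there is no shuffle stage; `GroupByKeySketch` is a hypothetical name, and `groupBy` stands in for Spark's groupByKey.

```scala
object GroupByKeySketch {
  // Same (word, 1) pairs as in the transcript, built from the four input lines.
  val pairs: Seq[(String, Int)] = Seq(
    "demo test can you see the best",
    "walking in the sun",
    "hello world",
    "hello spark"
  ).flatMap(line => line.split(" ")).map(word => (word, 1))

  // Local analogue of groupByKey: gather every value that shares a key.
  val grouped: Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, kvs) => (word, kvs.map(_._2)) }

  // Summing each group turns the grouped ones into word counts.
  val counts: Map[String, Int] =
    grouped.map { case (word, ones) => (word, ones.sum) }

  def main(args: Array[String]): Unit = {
    println(grouped("hello")) // the two 1s collected for "hello"
    println(counts("the"))
  }
}
```

On an RDD, the summing step would typically be written as `groupByKey().mapValues(_.sum)` or, more directly, `reduceByKey(_ + _)`. Both need a shuffle, which is what the ShuffleMapStage lines in the log above correspond to.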
distinct
scala> wordcount.count()
15/12/09 14:29:52 INFO SparkContext: Starting job: count at <console>:26
15/12/09 14:29:52 INFO DAGScheduler: Got job 48 (count at <console>:26) with 2 output partitions (allowLocal=false)
15/12/09 14:29:52 INFO DAGScheduler: Final stage: ResultStage 58(count at <console>:26)
15/12/09 14:29:52 INFO DAGScheduler: Parents of final stage: List()
15/12/09 14:29:52 INFO DAGScheduler: Missing parents: List()
15/12/09 14:29:52 INFO DAGScheduler: Submitting ResultStage 58 (MapPartitionsRDD[42] at map at <console>:23), which has no missing parents
15/12/09 14:29:52 INFO MemoryStore: ensureFreeSpace(3344) called with curMem=1579816, maxMem=277877882
15/12/09 14:29:52 INFO MemoryStore: Block broadcast_65 stored as values in memory (estimated size 3.3 KB, free 263.5 MB)
15/12/09 14:29:52 INFO MemoryStore: ensureFreeSpace(1893) called with curMem=1583160, maxMem=277877882
15/12/09 14:29:52 INFO MemoryStore: Block broadcast_65_piece0 stored as bytes in memory (estimated size 1893.0 B, free 263.5 MB)
15/12/09 14:29:52 INFO BlockManagerInfo: Added broadcast_65_piece0 in memory on 10.28.23.201:60179 (size: 1893.0 B, free: 264.8 MB)
15/12/09 14:29:52 INFO SparkContext: Created broadcast 65 from broadcast at DAGScheduler.scala:874
15/12/09 14:29:52 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 58 (MapPartitionsRDD[42] at map at <console>:23)
15/12/09 14:29:52 INFO TaskSchedulerImpl: Adding task set 58.0 with 2 tasks
15/12/09 14:29:52 INFO TaskSetManager: Starting task 0.0 in stage 58.0 (TID 104, 10.28.23.203, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:29:52 INFO TaskSetManager: Starting task 1.0 in stage 58.0 (TID 105, 10.28.23.201, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:29:52 INFO BlockManagerInfo: Added broadcast_65_piece0 in memory on 10.28.23.201:51294 (size: 1893.0 B, free: 264.9 MB)
15/12/09 14:29:52 INFO BlockManagerInfo: Added broadcast_65_piece0 in memory on 10.28.23.203:57813 (size: 1893.0 B, free: 264.9 MB)
15/12/09 14:29:52 INFO TaskSetManager: Finished task 0.0 in stage 58.0 (TID 104) in 67 ms on 10.28.23.203 (1/2)
15/12/09 14:29:52 INFO TaskSetManager: Finished task 1.0 in stage 58.0 (TID 105) in 69 ms on 10.28.23.201 (2/2)
15/12/09 14:29:52 INFO DAGScheduler: ResultStage 58 (count at <console>:26) finished in 0.083 s
15/12/09 14:29:52 INFO TaskSchedulerImpl: Removed TaskSet 58.0, whose tasks have all completed, from pool
15/12/09 14:29:52 INFO DAGScheduler: Job 48 finished: count at <console>:26, took 0.119783 s
res72: Long = 15
scala> wordcount.distinct.count()
15/12/09 14:30:10 INFO SparkContext: Starting job: count at <console>:26
15/12/09 14:30:10 INFO DAGScheduler: Registering RDD 52 (distinct at <console>:26)
15/12/09 14:30:10 INFO DAGScheduler: Got job 50 (count at <console>:26) with 2 output partitions (allowLocal=false)
15/12/09 14:30:10 INFO DAGScheduler: Final stage: ResultStage 61(count at <console>:26)
15/12/09 14:30:10 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 60)
15/12/09 14:30:10 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 60)
15/12/09 14:30:10 INFO DAGScheduler: Submitting ShuffleMapStage 60 (MapPartitionsRDD[52] at distinct at <console>:26), which has no missing parents
15/12/09 14:30:10 INFO MemoryStore: ensureFreeSpace(4232) called with curMem=1564303, maxMem=277877882
15/12/09 14:30:10 INFO MemoryStore: Block broadcast_67 stored as values in memory (estimated size 4.1 KB, free 263.5 MB)
15/12/09 14:30:10 INFO MemoryStore: ensureFreeSpace(2306) called with curMem=1568535, maxMem=277877882
15/12/09 14:30:10 INFO MemoryStore: Block broadcast_67_piece0 stored as bytes in memory (estimated size 2.3 KB, free 263.5 MB)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_67_piece0 in memory on 10.28.23.201:60179 (size: 2.3 KB, free: 264.9 MB)
15/12/09 14:30:10 INFO SparkContext: Created broadcast 67 from broadcast at DAGScheduler.scala:874
15/12/09 14:30:10 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 60 (MapPartitionsRDD[52] at distinct at <console>:26)
15/12/09 14:30:10 INFO TaskSchedulerImpl: Adding task set 60.0 with 2 tasks
15/12/09 14:30:10 INFO TaskSetManager: Starting task 0.0 in stage 60.0 (TID 108, 10.28.23.203, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:30:10 INFO TaskSetManager: Starting task 1.0 in stage 60.0 (TID 109, 10.28.23.201, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_67_piece0 in memory on 10.28.23.201:51294 (size: 2.3 KB, free: 264.9 MB)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_67_piece0 in memory on 10.28.23.203:57813 (size: 2.3 KB, free: 264.9 MB)
15/12/09 14:30:10 INFO TaskSetManager: Finished task 0.0 in stage 60.0 (TID 108) in 75 ms on 10.28.23.203 (1/2)
15/12/09 14:30:10 INFO TaskSetManager: Finished task 1.0 in stage 60.0 (TID 109) in 135 ms on 10.28.23.201 (2/2)
15/12/09 14:30:10 INFO TaskSchedulerImpl: Removed TaskSet 60.0, whose tasks have all completed, from pool
15/12/09 14:30:10 INFO DAGScheduler: ShuffleMapStage 60 (distinct at <console>:26) finished in 0.135 s
15/12/09 14:30:10 INFO DAGScheduler: looking for newly runnable stages
15/12/09 14:30:10 INFO DAGScheduler: running: Set()
15/12/09 14:30:10 INFO DAGScheduler: waiting: Set(ResultStage 61)
15/12/09 14:30:10 INFO DAGScheduler: failed: Set()
15/12/09 14:30:10 INFO DAGScheduler: Missing parents for ResultStage 61: List()
15/12/09 14:30:10 INFO DAGScheduler: Submitting ResultStage 61 (MapPartitionsRDD[54] at distinct at <console>:26), which is now runnable
15/12/09 14:30:10 INFO MemoryStore: ensureFreeSpace(2584) called with curMem=1570841, maxMem=277877882
15/12/09 14:30:10 INFO MemoryStore: Block broadcast_68 stored as values in memory (estimated size 2.5 KB, free 263.5 MB)
15/12/09 14:30:10 INFO MemoryStore: ensureFreeSpace(1530) called with curMem=1573425, maxMem=277877882
15/12/09 14:30:10 INFO MemoryStore: Block broadcast_68_piece0 stored as bytes in memory (estimated size 1530.0 B, free 263.5 MB)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_68_piece0 in memory on 10.28.23.201:60179 (size: 1530.0 B, free: 264.8 MB)
15/12/09 14:30:10 INFO SparkContext: Created broadcast 68 from broadcast at DAGScheduler.scala:874
15/12/09 14:30:10 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 61 (MapPartitionsRDD[54] at distinct at <console>:26)
15/12/09 14:30:10 INFO TaskSchedulerImpl: Adding task set 61.0 with 2 tasks
15/12/09 14:30:10 INFO TaskSetManager: Starting task 0.0 in stage 61.0 (TID 110, 10.28.23.201, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:30:10 INFO TaskSetManager: Starting task 1.0 in stage 61.0 (TID 111, 10.28.23.203, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_68_piece0 in memory on 10.28.23.201:51294 (size: 1530.0 B, free: 264.9 MB)
15/12/09 14:30:10 INFO BlockManagerInfo: Added broadcast_68_piece0 in memory on 10.28.23.203:57813 (size: 1530.0 B, free: 264.9 MB)
15/12/09 14:30:10 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 9 to 10.28.23.201:44784
15/12/09 14:30:10 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 9 is 163 bytes
15/12/09 14:30:10 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 9 to 10.28.23.203:56943
15/12/09 14:30:10 INFO TaskSetManager: Finished task 1.0 in stage 61.0 (TID 111) in 101 ms on 10.28.23.203 (1/2)
15/12/09 14:30:10 INFO TaskSetManager: Finished task 0.0 in stage 61.0 (TID 110) in 103 ms on 10.28.23.201 (2/2)
15/12/09 14:30:10 INFO TaskSchedulerImpl: Removed TaskSet 61.0, whose tasks have all completed, from pool
15/12/09 14:30:10 INFO DAGScheduler: ResultStage 61 (count at <console>:26) finished in 0.103 s
15/12/09 14:30:10 INFO DAGScheduler: Job 50 finished: count at <console>:26, took 0.267361 s
res74: Long = 13
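The drop from 15 to 13 follows from the data: the pairs (hello,1) and (the,1) each occur twice, and distinct keeps one copy of each. A minimal local sketch, assuming plain Scala collections instead of the RDD and using the hypothetical name `DistinctSketch`:

```scala
object DistinctSketch {
  // Same (word, 1) pairs as in the transcript, built from the four input lines.
  val pairs: Seq[(String, Int)] = Seq(
    "demo test can you see the best",
    "walking in the sun",
    "hello world",
    "hello spark"
  ).flatMap(line => line.split(" ")).map(word => (word, 1))

  // distinct compares whole elements, so two pairs are duplicates
  // only when both the word and the count are equal.
  val unique: Seq[(String, Int)] = pairs.distinct

  // The pairs that were removed as duplicates.
  val removed: Seq[(String, Int)] = pairs.diff(unique)

  def main(args: Array[String]): Unit = {
    println(pairs.length)
    println(unique.length)
    println(removed)
  }
}
```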
sortByKey
scala> wordcount.sortByKey(true) take 20
15/12/09 14:31:41 INFO SparkContext: Starting job: sortByKey at <console>:26
15/12/09 14:31:41 INFO DAGScheduler: Got job 53 (sortByKey at <console>:26) with 2 output partitions (allowLocal=false)
15/12/09 14:31:41 INFO DAGScheduler: Final stage: ResultStage 64(sortByKey at <console>:26)
15/12/09 14:31:41 INFO DAGScheduler: Parents of final stage: List()
15/12/09 14:31:41 INFO DAGScheduler: Missing parents: List()
15/12/09 14:31:41 INFO DAGScheduler: Submitting ResultStage 64 (MapPartitionsRDD[56] at sortByKey at <console>:26), which has no missing parents
15/12/09 14:31:41 INFO MemoryStore: ensureFreeSpace(3984) called with curMem=1564303, maxMem=277877882
15/12/09 14:31:41 INFO MemoryStore: Block broadcast_71 stored as values in memory (estimated size 3.9 KB, free 263.5 MB)
15/12/09 14:31:41 INFO MemoryStore: ensureFreeSpace(2127) called with curMem=1568287, maxMem=277877882
15/12/09 14:31:41 INFO MemoryStore: Block broadcast_71_piece0 stored as bytes in memory (estimated size 2.1 KB, free 263.5 MB)
15/12/09 14:31:41 INFO BlockManagerInfo: Added broadcast_71_piece0 in memory on 10.28.23.201:60179 (size: 2.1 KB, free: 264.9 MB)
15/12/09 14:31:41 INFO SparkContext: Created broadcast 71 from broadcast at DAGScheduler.scala:874
15/12/09 14:31:41 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 64 (MapPartitionsRDD[56] at sortByKey at <console>:26)
15/12/09 14:31:41 INFO TaskSchedulerImpl: Adding task set 64.0 with 2 tasks
15/12/09 14:31:41 INFO TaskSetManager: Starting task 0.0 in stage 64.0 (TID 116, 10.28.23.202, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:31:41 INFO TaskSetManager: Starting task 1.0 in stage 64.0 (TID 117, 10.28.23.203, PROCESS_LOCAL, 1397 bytes)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_71_piece0 in memory on 10.28.23.202:50706 (size: 2.1 KB, free: 264.9 MB)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_71_piece0 in memory on 10.28.23.203:57813 (size: 2.1 KB, free: 264.9 MB)
15/12/09 14:31:42 INFO TaskSetManager: Finished task 0.0 in stage 64.0 (TID 116) in 86 ms on 10.28.23.202 (1/2)
15/12/09 14:31:42 INFO TaskSetManager: Finished task 1.0 in stage 64.0 (TID 117) in 83 ms on 10.28.23.203 (2/2)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Removed TaskSet 64.0, whose tasks have all completed, from pool
15/12/09 14:31:42 INFO DAGScheduler: ResultStage 64 (sortByKey at <console>:26) finished in 0.087 s
15/12/09 14:31:42 INFO DAGScheduler: Job 53 finished: sortByKey at <console>:26, took 0.124303 s
15/12/09 14:31:42 INFO SparkContext: Starting job: take at <console>:26
15/12/09 14:31:42 INFO DAGScheduler: Registering RDD 42 (map at <console>:23)
15/12/09 14:31:42 INFO DAGScheduler: Got job 54 (take at <console>:26) with 1 output partitions (allowLocal=true)
15/12/09 14:31:42 INFO DAGScheduler: Final stage: ResultStage 66(take at <console>:26)
15/12/09 14:31:42 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 65)
15/12/09 14:31:42 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 65)
15/12/09 14:31:42 INFO DAGScheduler: Submitting ShuffleMapStage 65 (MapPartitionsRDD[42] at map at <console>:23), which has no missing parents
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(4288) called with curMem=1570414, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_72 stored as values in memory (estimated size 4.2 KB, free 263.5 MB)
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(2366) called with curMem=1574702, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_72_piece0 stored as bytes in memory (estimated size 2.3 KB, free 263.5 MB)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_72_piece0 in memory on 10.28.23.201:60179 (size: 2.3 KB, free: 264.8 MB)
15/12/09 14:31:42 INFO SparkContext: Created broadcast 72 from broadcast at DAGScheduler.scala:874
15/12/09 14:31:42 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 65 (MapPartitionsRDD[42] at map at <console>:23)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Adding task set 65.0 with 2 tasks
15/12/09 14:31:42 INFO TaskSetManager: Starting task 0.0 in stage 65.0 (TID 118, 10.28.23.202, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:31:42 INFO TaskSetManager: Starting task 1.0 in stage 65.0 (TID 119, 10.28.23.203, PROCESS_LOCAL, 1386 bytes)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_72_piece0 in memory on 10.28.23.203:57813 (size: 2.3 KB, free: 264.9 MB)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_72_piece0 in memory on 10.28.23.202:50706 (size: 2.3 KB, free: 264.9 MB)
15/12/09 14:31:42 INFO TaskSetManager: Finished task 1.0 in stage 65.0 (TID 119) in 61 ms on 10.28.23.203 (1/2)
15/12/09 14:31:42 INFO TaskSetManager: Finished task 0.0 in stage 65.0 (TID 118) in 67 ms on 10.28.23.202 (2/2)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Removed TaskSet 65.0, whose tasks have all completed, from pool
15/12/09 14:31:42 INFO DAGScheduler: ShuffleMapStage 65 (map at <console>:23) finished in 0.066 s
15/12/09 14:31:42 INFO DAGScheduler: looking for newly runnable stages
15/12/09 14:31:42 INFO DAGScheduler: running: Set()
15/12/09 14:31:42 INFO DAGScheduler: waiting: Set(ResultStage 66)
15/12/09 14:31:42 INFO DAGScheduler: failed: Set()
15/12/09 14:31:42 INFO DAGScheduler: Missing parents for ResultStage 66: List()
15/12/09 14:31:42 INFO DAGScheduler: Submitting ResultStage 66 (ShuffledRDD[57] at sortByKey at <console>:26), which is now runnable
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(2528) called with curMem=1577068, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_73 stored as values in memory (estimated size 2.5 KB, free 263.5 MB)
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(1512) called with curMem=1579596, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_73_piece0 stored as bytes in memory (estimated size 1512.0 B, free 263.5 MB)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_73_piece0 in memory on 10.28.23.201:60179 (size: 1512.0 B, free: 264.8 MB)
15/12/09 14:31:42 INFO SparkContext: Created broadcast 73 from broadcast at DAGScheduler.scala:874
15/12/09 14:31:42 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 66 (ShuffledRDD[57] at sortByKey at <console>:26)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Adding task set 66.0 with 1 tasks
15/12/09 14:31:42 INFO TaskSetManager: Starting task 0.0 in stage 66.0 (TID 120, 10.28.23.201, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_73_piece0 in memory on 10.28.23.201:51294 (size: 1512.0 B, free: 264.9 MB)
15/12/09 14:31:42 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 10 to 10.28.23.201:44784
15/12/09 14:31:42 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 10 is 163 bytes
15/12/09 14:31:42 INFO TaskSetManager: Finished task 0.0 in stage 66.0 (TID 120) in 122 ms on 10.28.23.201 (1/1)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Removed TaskSet 66.0, whose tasks have all completed, from pool
15/12/09 14:31:42 INFO DAGScheduler: ResultStage 66 (take at <console>:26) finished in 0.123 s
15/12/09 14:31:42 INFO DAGScheduler: Job 54 finished: take at <console>:26, took 0.218343 s
15/12/09 14:31:42 INFO SparkContext: Starting job: take at <console>:26
15/12/09 14:31:42 INFO DAGScheduler: Got job 55 (take at <console>:26) with 1 output partitions (allowLocal=true)
15/12/09 14:31:42 INFO DAGScheduler: Final stage: ResultStage 68(take at <console>:26)
15/12/09 14:31:42 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 67)
15/12/09 14:31:42 INFO DAGScheduler: Missing parents: List()
15/12/09 14:31:42 INFO DAGScheduler: Submitting ResultStage 68 (ShuffledRDD[57] at sortByKey at <console>:26), which has no missing parents
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(2528) called with curMem=1581108, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_74 stored as values in memory (estimated size 2.5 KB, free 263.5 MB)
15/12/09 14:31:42 INFO MemoryStore: ensureFreeSpace(1512) called with curMem=1583636, maxMem=277877882
15/12/09 14:31:42 INFO MemoryStore: Block broadcast_74_piece0 stored as bytes in memory (estimated size 1512.0 B, free 263.5 MB)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_74_piece0 in memory on 10.28.23.201:60179 (size: 1512.0 B, free: 264.8 MB)
15/12/09 14:31:42 INFO SparkContext: Created broadcast 74 from broadcast at DAGScheduler.scala:874
15/12/09 14:31:42 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 68 (ShuffledRDD[57] at sortByKey at <console>:26)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Adding task set 68.0 with 1 tasks
15/12/09 14:31:42 INFO TaskSetManager: Starting task 0.0 in stage 68.0 (TID 121, 10.28.23.202, PROCESS_LOCAL, 1165 bytes)
15/12/09 14:31:42 INFO BlockManagerInfo: Added broadcast_74_piece0 in memory on 10.28.23.202:50706 (size: 1512.0 B, free: 264.9 MB)
15/12/09 14:31:42 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 10 to 10.28.23.202:53700
15/12/09 14:31:42 INFO TaskSetManager: Finished task 0.0 in stage 68.0 (TID 121) in 55 ms on 10.28.23.202 (1/1)
15/12/09 14:31:42 INFO TaskSchedulerImpl: Removed TaskSet 68.0, whose tasks have all completed, from pool
15/12/09 14:31:42 INFO DAGScheduler: ResultStage 68 (take at <console>:26) finished in 0.055 s
15/12/09 14:31:42 INFO DAGScheduler: Job 55 finished: take at <console>:26, took 0.070686 s
res77: Array[(String, Int)] = Array((best,1), (can,1), (demo,1), (hello,1), (hello,1), (in,1), (see,1), (spark,1), (sun,1), (test,1), (the,1), (the,1), (walking,1), (world,1), (you,1))
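The ordering in res77 can be checked with a local sketch. It assumes plain Scala collections, with `sortBy` on the key standing in for sortByKey; `SortByKeySketch` is a hypothetical name. Because the data holds only 15 pairs, take 20 returns all of them.

```scala
object SortByKeySketch {
  // Same (word, 1) pairs as in the transcript, built from the four input lines.
  val pairs: Seq[(String, Int)] = Seq(
    "demo test can you see the best",
    "walking in the sun",
    "hello world",
    "hello spark"
  ).flatMap(line => line.split(" ")).map(word => (word, 1))

  // Ascending by key, the local counterpart of sortByKey(true).
  val ascending: Seq[(String, Int)] = pairs.sortBy(_._1)

  // Descending by key, the local counterpart of sortByKey(false).
  val descending: Seq[(String, Int)] = pairs.sortBy(_._1)(Ordering[String].reverse)

  // take(20) on a 15-element sequence simply returns every element.
  val firstTwenty: Seq[(String, Int)] = ascending.take(20)

  def main(args: Array[String]): Unit = {
    println(firstTwenty.map(_._1).mkString(", "))
    println(descending.head)
  }
}
```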