What broadcast variables do in Spark (entries added to a map inside foreach are missing afterwards)

Consider the following code:

 
  
import org.apache.spark.{SparkConf, SparkContext}

object draft2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("评分").setMaster("local")
    val sc = new SparkContext(conf)

    val maxTime = scala.collection.mutable.Map[String,Long]()
    maxTime += ("chenjie" -> 4)
    maxTime += ("chenjie" -> 5)

    val contentInfo = sc.textFile("file:///home/chenjie/newContentInfo")

    contentInfo.foreach { line =>
      maxTime += (line.split("\\|")(0) -> 1) // key = text before the first '|'; runs inside the task
    }

    for ((k, v) <- maxTime) { // runs on the driver
      println((k, v))
    }
  }
}
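One would expect the map to contain the entries added inside foreach in addition to chenjie -> 5.
In fact it still holds only the original entry. The closure passed to foreach is serialized together with the variables it captures, so the task updates its own copy of maxTime and the driver's map is never touched: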


18/04/03 15:47:49 INFO DAGScheduler: Job 0 finished: foreach at draft2.scala:14, took 2.022240 s
(chenjie,5)
18/04/03 15:47:49 INFO SparkContext: Invoking stop() from shutdown hook
...
Process finished with exit code 0
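
The effect can be reproduced without Spark. The sketch below is a simplified simulation, not Spark's actual task machinery: it assumes that sending a captured variable to a task amounts to a Java serialization round trip. The names ClosureCopyDemo, roundTrip and taskCopy are hypothetical and exist only for this illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable

object ClosureCopyDemo {
  // Serialize a value to bytes and read it back, yielding an independent copy.
  // Assumption: this stands in for what happens to variables captured by a closure.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val maxTime = mutable.Map[String, Long]("chenjie" -> 5L)

    // The "task" receives a deserialized copy, not the driver's object.
    val taskCopy = roundTrip(maxTime)
    taskCopy += ("C38696614" -> 1L)

    println(maxTime.size)  // 1: the original map is unchanged
    println(taskCopy.size) // 2: only the copy received the new entry
  }
}
```

Any update made to the copy is lost once the task finishes, which matches the single (chenjie,5) line printed above.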


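After switching to a broadcast variable, the entries added in foreach appear in the output. Broadcast variables are designed to be read-only; the mutation is visible here because the job runs with setMaster("local"), where the driver and the tasks share one JVM.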


(C38696614,1)
(C40265938,1)
(C40006313,1)
(C40714260,1)
18/04/03 15:49:43 INFO SparkContext: Invoking stop() from shutdown hook
18/04/03 15:49:43 INFO SparkUI: Stopped Spark web UI at http://192.168.1.101:4040
18/04/03 15:49:43 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/04/03 15:49:43 INFO MemoryStore: MemoryStore cleared
18/04/03 15:49:43 INFO BlockManager: BlockManager stopped
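
A run like the one above can be approximated with the following minimal sketch. It assumes the only change from the first listing is wrapping maxTime in sc.broadcast and writing through .value inside foreach. The names Draft2Broadcast, keyOf and maxTimeBc, and the application name string, are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object Draft2Broadcast { // hypothetical name, not from the original listing
  // The key is the text before the first '|' on a line.
  def keyOf(line: String): String = line.split("\\|")(0)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("draft2-broadcast").setMaster("local")
    val sc = new SparkContext(conf)

    val maxTime = mutable.Map[String, Long]("chenjie" -> 5L)
    // Assumption: the map is shared with tasks through a broadcast variable.
    val maxTimeBc = sc.broadcast(maxTime)

    val contentInfo = sc.textFile("file:///home/chenjie/newContentInfo")
    contentInfo.foreach { line =>
      // Mutates the broadcast value inside the task (local mode only, see the text).
      maxTimeBc.value += (keyOf(line) -> 1L)
    }

    // Read the entries back on the driver.
    for ((k, v) <- maxTimeBc.value) println((k, v))
    sc.stop()
  }
}
```

For a job that has to run on a cluster, a more portable pattern is probably to compute the keys with contentInfo.map(keyOf).distinct().collect() and add them to the map on the driver.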

