Spark 0.5.2 Source Code Analysis: collection shuffle

A collection shuffle randomly reorders the elements of a list and returns the result as a new list. In the Spark 0.5.2 source, it is implemented as follows:

  /**
   * Shuffle the elements of a collection into a random order, returning the
   * result in a new collection. Unlike scala.util.Random.shuffle, this method
   * uses a local random number generator, avoiding inter-thread contention.
   *
   * @param seq the elements to shuffle; they are copied into a new array first
   * @tparam T the element type (the ClassManifest is needed to build the array)
   * @return the shuffled elements
   */
  def randomize[T: ClassManifest](seq: TraversableOnce[T]): Seq[T] = {
    randomizeInPlace(seq.toArray)
  }

  /**
   * Shuffle the elements of an array into a random order, modifying the
   * original array. Returns the original array.
   */
  def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    // Walk from the last index down to 1, swapping each slot with a random earlier one.
    for (i <- (arr.length - 1) to 1 by -1) {
      // nextInt(i) returns a value in [0, i), so j is never equal to i.
      val j = rand.nextInt(i)
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

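The point worth noting is that randomizeInPlace takes the random number generator as a parameter of type Random, which defaults to a fresh instance on every call. Each call therefore works with its own generator rather than the shared one behind scala.util.Random.shuffle, so threads that shuffle concurrently do not contend for the same generator state.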
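One detail of the loop is easy to miss: because j comes from rand.nextInt(i), it is always strictly less than i, so no element can stay at its own index. That is the Sattolo variant of the shuffle, which produces only cyclic permutations. For comparison, here is a minimal sketch of the textbook Fisher-Yates shuffle, which draws j from the inclusive range [0, i]. The names ShuffleSketch, fisherYatesInPlace and shuffled are hypothetical and not part of Spark, and the sketch assumes a recent Scala version, so it uses ClassTag and Iterable in place of ClassManifest and TraversableOnce.

```scala
import scala.reflect.ClassTag
import scala.util.Random

// Hypothetical helper object for illustration only; not part of Spark.
object ShuffleSketch {

  /** Textbook Fisher-Yates: j is drawn from [0, i], so an element may stay in place. */
  def fisherYatesInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
    for (i <- (arr.length - 1) to 1 by -1) {
      val j = rand.nextInt(i + 1) // inclusive upper bound i
      val tmp = arr(j)
      arr(j) = arr(i)
      arr(i) = tmp
    }
    arr
  }

  /** Copies the input into a fresh array, then shuffles that copy. */
  def shuffled[T: ClassTag](seq: Iterable[T], rand: Random = new Random): Seq[T] =
    fisherYatesInPlace(seq.toArray, rand).toSeq
}
```

As in the Spark code, the generator is a parameter with a per-call default, so callers can also pass a seeded instance, for example ShuffleSketch.shuffled(1 to 10, new Random(42L)), to get a reproducible order in tests.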
