Black Monkey's Home: sample (random sampling)

1、Code

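// Run in spark-shell, where sc (the SparkContext) is already defined.
val samplerdd = sc.makeRDD(Array(
     "spark1","spark2","spark3","spark4","spark5",
     "hadoop1","hadoop2","hadoop3","java4","java5"
))

// Sample without replacement; each element is kept with probability 0.3.
// No seed is passed, so the output differs from run to run.
samplerdd.sample(false,0.3).foreach(println)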

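2、Result

One run produced the output below. Without a fixed seed the elements differ between runs, and the count is not guaranteed to be exactly 3.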

spark4
hadoop2
java5
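
Why the count can vary: without replacement, each element is kept or dropped independently with probability fraction, so 0.3 on ten elements gives about three results rather than exactly three. The following is a minimal plain-Scala sketch of that idea, not Spark's actual implementation; bernoulliSample and words are hypothetical names introduced only for illustration.

```scala
import scala.util.Random

// Hypothetical helper, not part of Spark: keeps each element independently
// with probability `fraction`, mimicking sample(withReplacement = false, ...).
def bernoulliSample[T](data: Seq[T], fraction: Double, seed: Long): Seq[T] = {
  require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
  val rng = new Random(seed)
  // nextDouble() is uniform in [0, 1), so the test below succeeds with
  // probability `fraction` for every element.
  data.filter(_ => rng.nextDouble() < fraction)
}

val words = Seq(
  "spark1", "spark2", "spark3", "spark4", "spark5",
  "hadoop1", "hadoop2", "hadoop3", "java4", "java5"
)

// Different seeds generally give samples of different sizes.
(1L to 5L).foreach { seed =>
  val picked = bernoulliSample(words, 0.3, seed)
  println(s"seed=$seed size=${picked.size} $picked")
}
```

The same seed always yields the same sample here, which is the property the seed parameter of sample is meant to provide.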

3、sample

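sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.random.nextLong): RDD[T]
     withReplacement  whether to sample with replacement
          true   an element that has been drawn can be drawn again, so it may appear more than once in the result
          false  an element that has been drawn cannot be drawn again, so it appears at most once
     fraction  sampling ratio. Without replacement it is the probability of keeping each element, in [0, 1]; with replacement it is the expected number of times each element is drawn, and must be >= 0. The size of the result is only approximately fraction * count.
     seed  seed for the random number generator; optional, and a random value is used when it is omitted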
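
The three parameters can be tried out with a short standalone program. This is a sketch under assumptions: Spark is on the classpath, a local master is acceptable, and the app name, the seed value 42 and the partition count 2 are arbitrary choices made for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: running locally with two worker threads.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("sample-demo")
  .getOrCreate()
val sc = spark.sparkContext

// Same data as above, split into 2 partitions.
val rdd = sc.makeRDD(Array(
  "spark1", "spark2", "spark3", "spark4", "spark5",
  "hadoop1", "hadoop2", "hadoop3", "java4", "java5"
), 2)

// Fixed seed: sampling the same RDD twice should give the same elements.
val a = rdd.sample(withReplacement = false, fraction = 0.3, seed = 42L).collect()
val b = rdd.sample(withReplacement = false, fraction = 0.3, seed = 42L).collect()
println(s"same sample with same seed: ${a.sameElements(b)}")

// With replacement an element may appear several times, and fraction
// (the expected number of draws per element) may be larger than 1.
val c = rdd.sample(withReplacement = true, fraction = 1.5, seed = 42L).collect()
c.foreach(println)

spark.stop()
```

collect() brings the sampled elements back to the driver before printing, which keeps the output in one place when the job runs on a cluster.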
