京城风四娘

Spark operator examples: Spark operators in detail, an introduction to Action operators

I. Operators that produce no output

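1. The foreach operator

Purpose: applies the function f to every element of the RDD and returns nothing. f runs on the executors, so on a cluster anything it prints goes to the executor output, and with several partitions the order of the output is not guaranteed.

Source:

/**
 * Applies a function f to all elements of this RDD.
 */
def foreach(f: T => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[20] at parallelize at <console>:24

scala> rdd1.foreach(x => printf("%d ", x))
1 2 3 4 5 6 7 8 9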
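Because f runs inside tasks, a plain local variable mutated inside foreach is not a reliable way to get a result back to the driver. A minimal sketch of the usual alternative, an accumulator, assuming Spark 2.x or later; the helper name sumWithForeach is made up for this illustration and can be pasted into spark-shell:

```scala
import org.apache.spark.SparkContext

// Hypothetical helper: sums 1..9 through an accumulator updated inside foreach.
def sumWithForeach(sc: SparkContext): Long = {
  val sum = sc.longAccumulator("sum")            // created on the driver, updated by tasks
  sc.parallelize(1 to 9).foreach(x => sum.add(x.toLong))
  sum.value.longValue()                          // read on the driver after the action finishes
}
```

Calling sumWithForeach(sc) in spark-shell should give the same total as the reduce example later in this article.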

2. The foreachPartition operator

Purpose: similar to foreach. The difference is that foreach calls iterator.foreach inside each partition, so the function you pass in is invoked once per element, whereas foreachPartition hands each partition's iterator to your function and lets the function consume the iterator itself. Elements can then be processed as a stream rather than collected first, which can help avoid running out of memory.

In short, the function passed to foreach receives one element of the RDD at a time, while the function passed to foreachPartition receives the iterator over a whole partition.

Source:

/**
 * Applies a function f to each partition of this RDD.
 */
def foreachPartition(f: Iterator[T] => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9, 2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[23] at parallelize at <console>:24

scala> rdd1.foreachPartition(x => printf("%s ", x.size))
4 5
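A common reason to prefer foreachPartition is that expensive setup can be done once per partition instead of once per element. The sketch below illustrates that pattern under stated assumptions: DemoSink is a hypothetical stand-in for a database or network client (it is not a Spark class), and writePerPartition is a name invented for this example.

```scala
import org.apache.spark.SparkContext

// Hypothetical stand-in for a database or network client; not part of Spark.
class DemoSink {
  private var written = 0L
  def write(x: Int): Unit = written += 1
  def close(): Long = written                    // returns how many records were written
}

// Opens one DemoSink per partition rather than one per element.
// Returns (number of sinks opened, number of records written).
def writePerPartition(sc: SparkContext): (Long, Long) = {
  val opened  = sc.longAccumulator("sinks opened")
  val records = sc.longAccumulator("records written")
  sc.parallelize(1 to 9, 2).foreachPartition { iter =>
    val sink = new DemoSink                      // set up once for the whole partition
    opened.add(1L)
    try iter.foreach(x => sink.write(x))         // stream the elements; nothing is buffered
    finally records.add(sink.close())
  }
  (opened.value.longValue(), records.value.longValue())
}
```

With two partitions, two sinks are opened for the nine records; with foreach, the same setup would sit on the per-element path.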

II. Operators that write to HDFS and other file systems

1. The saveAsTextFile operator

Purpose: writes the data out as text files on the local file system, HDFS, or another supported file system. Spark calls toString on each element and writes the result as one line of the output. The path names an output directory, and each partition is written as one part file inside it. If the target is the local file system, each partition is saved only in the local directory of the machine whose executor wrote it.

Source:

/**
 * Save this RDD as a text file, using string representations of elements.
 */
def saveAsTextFile(path: String): Unit = withScope {
  // https://issues.apache.org/jira/browse/SPARK-2075
  //
  // NullWritable is a `Comparable` in Hadoop 1.+, so the compiler cannot find an implicit
  // Ordering for it and will use the default `null`. However, it's a `Comparable[NullWritable]`
  // in Hadoop 2.+, so the compiler will call the implicit `Ordering.ordered` method to create an
  // Ordering for `NullWritable`. That's why the compiler will generate different anonymous
  // classes for `saveAsTextFile` in Hadoop 1.+ and Hadoop 2.+.
  //
  // Therefore, here we provide an explicit Ordering `null` to make sure the compiler generate
  // same bytecodes for `saveAsTextFile`.
  val nullWritableClassTag = implicitly[ClassTag[NullWritable]]
  val textClassTag = implicitly[ClassTag[Text]]
  val r = this.mapPartitions { iter =>
    val text = new Text()
    iter.map { x =>
      text.set(x.toString)
      (NullWritable.get(), text)
    }
  }
  RDD.rddToPairRDDFunctions(r)(nullWritableClassTag, textClassTag, null)
    .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9, 2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[26] at parallelize at <console>:24

scala> rdd1.saveAsTextFile("file:///opt/app/test/saveAsTextFileTest.txt")
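To see what saveAsTextFile leaves on disk, the following sketch writes to a scratch directory and reads the result back with sc.textFile. It assumes a local-mode SparkContext, so that driver and executors share one file system; the temporary directory and the helper name textRoundTrip are choices made for this illustration only.

```scala
import java.nio.file.Files
import org.apache.spark.SparkContext

// Writes 1..9 in 2 partitions, then returns (number of part files, values read back).
def textRoundTrip(sc: SparkContext): (Int, List[Int]) = {
  val out = Files.createTempDirectory("saveAsTextFile-demo").resolve("out")
  val uri = out.toUri.toString                   // the output directory must not exist yet
  sc.parallelize(1 to 9, 2).saveAsTextFile(uri)
  // One part file per partition; checksum and _SUCCESS files are ignored here.
  val parts = out.toFile.listFiles().count(_.getName.startsWith("part-"))
  val values = sc.textFile(uri).map(_.toInt).collect().sorted.toList
  (parts, values)
}
```

The values come back as strings, one per line, which is why the sketch converts them with toInt.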

2. The saveAsObjectFile operator

Purpose: writes the RDD as an object file, that is, a SequenceFile of serialized objects, to the local file system, HDFS, or another supported file system.

Source:

/**
 * Save this RDD as a SequenceFile of serialized objects.
 */
def saveAsObjectFile(path: String): Unit = withScope {
  this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
    .map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
    .saveAsSequenceFile(path)
}

Example:

scala> val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[40] at parallelize at <console>:24

scala> rdd1.saveAsObjectFile("file:///opt/app/test/saveAsObjectFileTest.txt")
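Unlike a text file, an object file keeps the element type, so it can be loaded back with sc.objectFile without parsing. A small round-trip sketch, again assuming local mode and a scratch directory; objectRoundTrip is a hypothetical helper name.

```scala
import java.nio.file.Files
import org.apache.spark.SparkContext

// Saves (String, Int) pairs as an object file and loads them back with their type intact.
def objectRoundTrip(sc: SparkContext): List[(String, Int)] = {
  val uri = Files.createTempDirectory("saveAsObjectFile-demo").resolve("out").toUri.toString
  val pairs = Seq(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4))
  sc.parallelize(pairs, 2).saveAsObjectFile(uri)
  // The element type must be supplied by the caller; it is not checked against the file.
  sc.objectFile[(String, Int)](uri).collect().sorted.toList
}
```

The result is sorted only to make it easy to compare, since the order in which partitions are read back is not something this sketch relies on.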

3. The saveAsHadoopFile operator

Purpose: writes the RDD to files on HDFS or any other Hadoop-supported file system. You can specify the output key class, the output value class, and a compression codec; each partition produces one output file.

Source:

/**
 * Output the RDD to any Hadoop-supported file system, using a Hadoop `OutputFormat` class
 * supporting the key and value types K and V in this RDD.
 *
 * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do
 * not use output committer that writes data directly.
 * There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
 * result of using direct output committer with speculation enabled.
 */
def saveAsHadoopFile(
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: OutputFormat[_, _]],
    conf: JobConf = new JobConf(self.context.hadoopConfiguration),
    codec: Option[Class[_ <: CompressionCodec]] = None): Unit = self.withScope {
  // Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038).
  val hadoopConf = conf
  hadoopConf.setOutputKeyClass(keyClass)
  hadoopConf.setOutputValueClass(valueClass)
  conf.setOutputFormat(outputFormatClass)
  for (c <- codec) {
    hadoopConf.setCompressMapOutput(true)
    hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true")
    hadoopConf.setMapOutputCompressorClass(c)
    hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec", c.getCanonicalName)
    hadoopConf.set("mapreduce.output.fileoutputformat.compress.type",
      CompressionType.BLOCK.toString)
  }

  // Use configured output committer if already set
  if (conf.getOutputCommitter == null) {
    hadoopConf.setOutputCommitter(classOf[FileOutputCommitter])
  }

  // When speculation is on and output committer class name contains "Direct", we should warn
  // users that they may loss data if they are using a direct output committer.
  val speculationEnabled = self.conf.getBoolean("spark.speculation", false)
  val outputCommitterClass = hadoopConf.get("mapred.output.committer.class", "")
  if (speculationEnabled && outputCommitterClass.contains("Direct")) {
    val warningMessage =
      s"$outputCommitterClass may be an output committer that writes data directly to " +
        "the final location. Because speculation is enabled, this output committer may " +
        "cause data loss (see the case in SPARK-10063). If possible, please use an output " +
        "committer that does not have this behavior (e.g. FileOutputCommitter)."
    logWarning(warningMessage)
  }

  FileOutputFormat.setOutputPath(hadoopConf,
    SparkHadoopWriterUtils.createPathFromString(path, hadoopConf))
  saveAsHadoopDataset(hadoopConf)
}

Example:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.TextOutputFormat

val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
rdd1.saveAsHadoopFile("hdfs://192.168.199.201:8020/test", classOf[Text], classOf[IntWritable], classOf[TextOutputFormat[Text, IntWritable]])

4. The saveAsSequenceFile operator

Purpose: writes the RDD as a Hadoop SequenceFile to the local file system, HDFS, or another supported file system.

Source:

/**
 * Output the RDD as a Hadoop SequenceFile using the Writable types we infer from the RDD's key
 * and value types. If the key or value are Writable, then we use their classes directly;
 * otherwise we map primitive types such as Int and Double to IntWritable, DoubleWritable, etc,
 * byte arrays to BytesWritable, and Strings to Text. The `path` can be on any Hadoop-supported
 * file system.
 */
def saveAsSequenceFile(
    path: String,
    codec: Option[Class[_ <: CompressionCodec]] = None): Unit = self.withScope {
  def anyToWritable[U <% Writable](u: U): Writable = u

  // TODO We cannot force the return type of `anyToWritable` be same as keyWritableClass and
  // valueWritableClass at the compile time. To implement that, we need to add type parameters to
  // SequenceFileRDDFunctions. however, SequenceFileRDDFunctions is a public class so it will be a
  // breaking change.
  val convertKey = self.keyClass != _keyWritableClass
  val convertValue = self.valueClass != _valueWritableClass

  logInfo("Saving as sequence file of type " +
    s"(${_keyWritableClass.getSimpleName},${_valueWritableClass.getSimpleName})" )
  val format = classOf[SequenceFileOutputFormat[Writable, Writable]]
  val jobConf = new JobConf(self.context.hadoopConfiguration)
  if (!convertKey && !convertValue) {
    self.saveAsHadoopFile(path, _keyWritableClass, _valueWritableClass, format, jobConf, codec)
  } else if (!convertKey && convertValue) {
    self.map(x => (x._1, anyToWritable(x._2))).saveAsHadoopFile(
      path, _keyWritableClass, _valueWritableClass, format, jobConf, codec)
  } else if (convertKey && !convertValue) {
    self.map(x => (anyToWritable(x._1), x._2)).saveAsHadoopFile(
      path, _keyWritableClass, _valueWritableClass, format, jobConf, codec)
  } else if (convertKey && convertValue) {
    self.map(x => (anyToWritable(x._1), anyToWritable(x._2))).saveAsHadoopFile(
      path, _keyWritableClass, _valueWritableClass, format, jobConf, codec)
  }
}

Example:

scala> val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[38] at parallelize at <console>:24

scala> rdd1.saveAsSequenceFile("file:///opt/app/test/saveAsSequenceFileTest1.txt")

5. The saveAsHadoopDataset operator

Purpose: uses the old Hadoop API (org.apache.hadoop.mapred) to write the RDD to any Hadoop-supported storage system, such as HBase, configured through a Hadoop JobConf object for that storage system.

Source:

/**
 * Output the RDD to any Hadoop-supported storage system, using a Hadoop JobConf object for
 * that storage system. The JobConf should set an OutputFormat and any output paths required
 * (e.g. a table name to write to) in the same way as it would be configured for a Hadoop
 * MapReduce job.
 */
def saveAsHadoopDataset(conf: JobConf): Unit = self.withScope {
  val config = new HadoopMapRedWriteConfigUtil[K, V](new SerializableJobConf(conf))
  SparkHadoopWriter.write(
    rdd = self,
    config = config)
}

Example:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}

val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
val jobConf = new JobConf()
jobConf.setOutputFormat(classOf[TextOutputFormat[Text, IntWritable]])
jobConf.setOutputKeyClass(classOf[Text])
jobConf.setOutputValueClass(classOf[IntWritable])
jobConf.set("mapred.output.dir", "/test/")
rdd1.saveAsHadoopDataset(jobConf)

6. The saveAsNewAPIHadoopFile operator

Purpose: saves the RDD to HDFS using the new Hadoop API (org.apache.hadoop.mapreduce). Usage is essentially the same as saveAsHadoopFile.

Source:

/**
 * Output the RDD to any Hadoop-supported file system, using a new Hadoop API `OutputFormat`
 * (mapreduce.OutputFormat) object supporting the key and value types K and V in this RDD.
 */
def saveAsNewAPIHadoopFile(
    path: String,
    keyClass: Class[_],
    valueClass: Class[_],
    outputFormatClass: Class[_ <: NewOutputFormat[_, _]],
    conf: Configuration = self.context.hadoopConfiguration): Unit = self.withScope {
  // Rename this as hadoopConf internally to avoid shadowing (see SPARK-2038).
  val hadoopConf = conf
  val job = NewAPIHadoopJob.getInstance(hadoopConf)
  job.setOutputKeyClass(keyClass)
  job.setOutputValueClass(valueClass)
  job.setOutputFormatClass(outputFormatClass)
  val jobConfiguration = job.getConfiguration
  jobConfiguration.set("mapreduce.output.fileoutputformat.outputdir", path)
  saveAsNewAPIHadoopDataset(jobConfiguration)
}

Example:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output

val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
rdd1.saveAsNewAPIHadoopFile("hdfs://192.168.199.201:8020/test", classOf[Text], classOf[IntWritable], classOf[output.TextOutputFormat[Text, IntWritable]])

7. The saveAsNewAPIHadoopDataset operator

Purpose: uses the new Hadoop API to write the RDD to any Hadoop-supported storage system, such as HBase, configured through a Hadoop Configuration object for that storage system. The Configuration should set an OutputFormat and any required output paths (for example, the name of the table to write to), just as it would be configured for a Hadoop MapReduce job.

Source:

/**
 * Output the RDD to any Hadoop-supported storage system with new Hadoop API, using a Hadoop
 * Configuration object for that storage system. The Conf should set an OutputFormat and any
 * output paths required (e.g. a table name to write to) in the same way as it would be
 * configured for a Hadoop MapReduce job.
 *
 * @note We should make sure our tasks are idempotent when speculation is enabled, i.e. do
 * not use output committer that writes data directly.
 * There is an example in https://issues.apache.org/jira/browse/SPARK-10063 to show the bad
 * result of using direct output committer with speculation enabled.
 */
def saveAsNewAPIHadoopDataset(conf: Configuration): Unit = self.withScope {
  val config = new HadoopMapReduceWriteConfigUtil[K, V](new SerializableConfiguration(conf))
  SparkHadoopWriter.write(
    rdd = self,
    config = config)
}

Example:

import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
// Configure the output through a new-API Job, mirroring what saveAsNewAPIHadoopFile does internally.
val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputFormatClass(classOf[TextOutputFormat[Text, IntWritable]])
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[IntWritable])
job.getConfiguration.set("mapreduce.output.fileoutputformat.outputdir", "/test/")
rdd1.saveAsNewAPIHadoopDataset(job.getConfiguration)

III. Operators that return Scala collections and values

1. The first operator

Purpose: returns the first element of the RDD, without sorting.

Source:

/**
 * Return the first element in this RDD.
 */
def first(): T = withScope {
  take(1) match {
    case Array(t) => t
    case _ => throw new UnsupportedOperationException("empty collection")
  }
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

scala> val rdd2 = rdd1.first()
rdd2: Int = 1

scala> print(rdd2)
1

2. The count operator

Purpose: returns the number of elements in the RDD.

Source:

/**
 * Return the number of elements in the RDD.
 */
def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:24

scala> println(rdd1.count())
9

3. The reduce operator

Purpose: passes the elements of the RDD to the input function two at a time to produce a new value, then passes that value together with the next element to the function again, and so on until a single value remains. Because each partition is reduced separately and the partial results are then merged, the function should be commutative and associative.

Source:

/**
 * Reduces the elements of this RDD using the specified commutative and
 * associative binary operator.
 */
def reduce(f: (T, T) => T): T = withScope {
  val cleanF = sc.clean(f)
  val reducePartition: Iterator[T] => Option[T] = iter => {
    if (iter.hasNext) {
      Some(iter.reduceLeft(cleanF))
    } else {
      None
    }
  }
  var jobResult: Option[T] = None
  val mergeResult = (index: Int, taskResult: Option[T]) => {
    if (taskResult.isDefined) {
      jobResult = jobResult match {
        case Some(value) => Some(f(value, taskResult.get))
        case None => taskResult
      }
    }
  }
  sc.runJob(this, reducePartition, mergeResult)
  // Get the final result out of our Option, or throw an exception if the RDD was empty
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:24

scala> val rdd2 = rdd1.reduce((x,y) => x + y)
rdd2: Int = 45
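The last line of the source shows that reduce has no value to return for an empty RDD and throws instead. The sketch below demonstrates that behaviour and one possible way around it, fold with a neutral zero value; reduceOnEmpty is a hypothetical helper name.

```scala
import scala.util.Try
import org.apache.spark.SparkContext

// Returns (did reduce throw UnsupportedOperationException?, result of fold on the same RDD).
def reduceOnEmpty(sc: SparkContext): (Boolean, Int) = {
  val empty = sc.parallelize(Seq.empty[Int])
  val threw = Try(empty.reduce(_ + _)).failed.toOption
    .exists(_.isInstanceOf[UnsupportedOperationException])
  val folded = empty.fold(0)(_ + _)              // fold starts from zeroValue, so it is defined
  (threw, folded)
}
```

Checking rdd.isEmpty() before calling reduce is another option when no natural zero value exists.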

4. The collect operator

Purpose: returns all elements of the RDD as an Array.

Source:

/**
 * Return an array that contains all of the elements in this RDD.
 *
 * @note This method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 */
def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at parallelize at <console>:24

scala> rdd1.collect
res3: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

5. The take operator

Purpose: returns an array containing the first n elements of the dataset (the elements at indices 0 through n-1), without sorting.

Source:

/**
 * Take the first num elements of the RDD. It works by first scanning one partition, and use the
 * results from that partition to estimate the number of additional partitions needed to satisfy
 * the limit.
 *
 * @note This method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 *
 * @note Due to complications in the internal implementation, this method will raise
 * an exception if called on an RDD of `Nothing` or `Null`.
 */
def take(num: Int): Array[T] = withScope {
  val scaleUpFactor = Math.max(conf.getInt("spark.rdd.limit.scaleUpFactor", 4), 2)
  if (num == 0) {
    new Array[T](0)
  } else {
    val buf = new ArrayBuffer[T]
    val totalParts = this.partitions.length
    var partsScanned = 0
    while (buf.size < num && partsScanned < totalParts) {
      // The number of partitions to try in this iteration. It is ok for this number to be
      // greater than totalParts because we actually cap it at totalParts in runJob.
      var numPartsToTry = 1L
      val left = num - buf.size
      if (partsScanned > 0) {
        // If we didn't find any rows after the previous iteration, quadruple and retry.
        // Otherwise, interpolate the number of partitions we need to try, but overestimate
        // it by 50%. We also cap the estimation in the end.
        if (buf.isEmpty) {
          numPartsToTry = partsScanned * scaleUpFactor
        } else {
          // As left > 0, numPartsToTry is always >= 1
          numPartsToTry = Math.ceil(1.5 * left * partsScanned / buf.size).toInt
          numPartsToTry = Math.min(numPartsToTry, partsScanned * scaleUpFactor)
        }
      }

      val p = partsScanned.until(math.min(partsScanned + numPartsToTry, totalParts).toInt)
      val res = sc.runJob(this, (it: Iterator[T]) => it.take(left).toArray, p)

      res.foreach(buf ++= _.take(num - buf.size))
      partsScanned += p.size
    }

    buf.toArray
  }
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:24

scala> val rdd2 = rdd1.take(3)
rdd2: Array[Int] = Array(1, 2, 3)

6. The top operator

Purpose: returns the N largest elements of the RDD as an array, in descending order. The order is defined by the implicit Ordering[T], so passing a custom Ordering changes what counts as largest.

Source:

/**
 * Returns the top k (largest) elements from this RDD as defined by the specified
 * implicit Ordering[T] and maintains the ordering. This does the opposite of
 * [[takeOrdered]]. For example:
 * {{{
 *   sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1)
 *   // returns Array(12)
 *
 *   sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2)
 *   // returns Array(6, 5)
 * }}}
 *
 * @note This method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 *
 * @param num k, the number of top elements to return
 * @param ord the implicit ordering for T
 * @return an array of top elements
 */
def top(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
  takeOrdered(num)(ord.reverse)
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24

scala> val rdd2 = rdd1.top(3)
rdd2: Array[Int] = Array(9, 8, 7)

7. The takeOrdered operator

Purpose: returns the first n elements of the RDD in sorted order, using either the default ordering (ascending) or a custom comparator.

Source:

/**
 * Returns the first k (smallest) elements from this RDD as defined by the specified
 * implicit Ordering[T] and maintains the ordering. This does the opposite of [[top]].
 * For example:
 * {{{
 *   sc.parallelize(Seq(10, 4, 2, 12, 3)).takeOrdered(1)
 *   // returns Array(2)
 *
 *   sc.parallelize(Seq(2, 3, 4, 5, 6)).takeOrdered(2)
 *   // returns Array(2, 3)
 * }}}
 *
 * @note This method should only be used if the resulting array is expected to be small, as
 * all the data is loaded into the driver's memory.
 *
 * @param num k, the number of elements to return
 * @param ord the implicit ordering for T
 * @return an array of top elements
 */
def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
  if (num == 0) {
    Array.empty
  } else {
    val mapRDDs = mapPartitions { items =>
      // Priority keeps the largest elements, so let's reverse the ordering.
      val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
      queue ++= collectionUtils.takeOrdered(items, num)(ord)
      Iterator.single(queue)
    }
    if (mapRDDs.partitions.length == 0) {
      Array.empty
    } else {
      mapRDDs.reduce { (queue1, queue2) =>
        queue1 ++= queue2
        queue1
      }.toArray.sorted(ord)
    }
  }
}

Example:

scala> val rdd1 = sc.makeRDD(Seq(5,4,2,1,3,6))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at makeRDD at <console>:24

scala> val rdd2 = rdd1.takeOrdered(3)
rdd2: Array[Int] = Array(1, 2, 3)
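Both top and takeOrdered take their ordering as a second, implicit parameter list, so a custom Ordering can be passed explicitly. The sketch below shows takeOrdered with a reversed ordering (which matches top) and top on pairs ordered by their value; orderedExamples is a hypothetical helper name.

```scala
import org.apache.spark.SparkContext

// Returns (3 smallest, 3 largest via a reversed ordering, 2 pairs with the largest values).
def orderedExamples(sc: SparkContext): (List[Int], List[Int], List[(String, Int)]) = {
  val nums = sc.makeRDD(Seq(5, 4, 2, 1, 3, 6))
  val smallest = nums.takeOrdered(3).toList
  val largest  = nums.takeOrdered(3)(Ordering[Int].reverse).toList   // same elements as top(3)
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)))
  // Order pairs by their second component only.
  val byValue = pairs.top(2)(Ordering.by[(String, Int), Int](_._2)).toList
  (smallest, largest, byValue)
}
```

In spark-shell, orderedExamples(sc) should return (List(1, 2, 3), List(6, 5, 4), List((d,5), (a,4))).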

8. The aggregate operator

Purpose: aggregate first folds the elements within each partition using seqOp, starting from the initial value zeroValue, and then uses combOp to combine the per-partition results with zeroValue. The result type of this function does not have to match the element type of the RDD.

Source:

/**
 * Aggregate the elements of each partition, and then the results for all the partitions, using
 * given combine functions and a neutral "zero value". This function can return a different result
 * type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U
 * and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are
 * allowed to modify and return their first argument instead of creating a new U to avoid memory
 * allocation.
 *
 * @param zeroValue the initial value for the accumulated result of each partition for the
 *                  `seqOp` operator, and also the initial value for the combine results from
 *                  different partitions for the `combOp` operator - this will typically be the
 *                  neutral element (e.g. `Nil` for list concatenation or `0` for summation)
 * @param seqOp an operator used to accumulate results within a partition
 * @param combOp an associative operator used to combine results from different partitions
 */
def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U = withScope {
  // Clone the zero value since we will also be serializing it as part of tasks
  var jobResult = Utils.clone(zeroValue, sc.env.serializer.newInstance())
  val cleanSeqOp = sc.clean(seqOp)
  val cleanCombOp = sc.clean(combOp)
  val aggregatePartition = (it: Iterator[T]) => it.aggregate(zeroValue)(cleanSeqOp, cleanCombOp)
  val mergeResult = (index: Int, taskResult: U) => jobResult = combOp(jobResult, taskResult)
  sc.runJob(this, aggregatePartition, mergeResult)
  jobResult
}

Example:

scala> val rdd1 = sc.parallelize(1 to 9, 3)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[11] at parallelize at <console>:24

scala> val rdd2 = rdd1.aggregate((0,0))(
     | (acc,number) => (acc._1 + number, acc._2 + 1),
     | (par1,par2) => (par1._1 + par2._1, par1._2 + par2._2)
     | )
rdd2: (Int, Int) = (45,9)
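As a second illustration of a result type that differs from the element type, the sketch below computes the minimum and maximum in a single pass. The zero value (Int.MaxValue, Int.MinValue) is chosen to be neutral for min and max; minMax is a hypothetical helper name.

```scala
import org.apache.spark.SparkContext

// Computes (min, max) of an RDD[Int] in one job.
def minMax(sc: SparkContext): (Int, Int) =
  sc.parallelize(Seq(5, 4, 2, 1, 3, 6), 3).aggregate((Int.MaxValue, Int.MinValue))(
    // seqOp: fold one element into the partition's running (min, max)
    (acc, x) => (math.min(acc._1, x), math.max(acc._2, x)),
    // combOp: merge the (min, max) pairs of two partitions
    (a, b) => (math.min(a._1, b._1), math.max(a._2, b._2)))
```

For the six numbers above this yields (1, 6).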

9. The fold operator

Purpose: uses the function op both to aggregate the elements within each partition and to merge the per-partition results. op takes two arguments, and at the start the first argument passed in is zeroValue. T is the element type of the RDD. fold behaves like aggregate with the same function used for seqOp and combOp.

Source:

/**
 * Aggregate the elements of each partition, and then the results for all the partitions, using a
 * given associative function and a neutral "zero value". The function
 * op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object
 * allocation; however, it should not modify t2.
 *
 * This behaves somewhat differently from fold operations implemented for non-distributed
 * collections in functional languages like Scala. This fold operation may be applied to
 * partitions individually, and then fold those results into the final result, rather than
 * apply the fold to each element sequentially in some defined ordering. For functions
 * that are not commutative, the result may differ from that of a fold applied to a
 * non-distributed collection.
 *
 * @param zeroValue the initial value for the accumulated result of each partition for the `op`
 *                  operator, and also the initial value for the combine results from different
 *                  partitions for the `op` operator - this will typically be the neutral
 *                  element (e.g. `Nil` for list concatenation or `0` for summation)
 * @param op an operator used to both accumulate results within a partition and combine results
 *           from different partitions
 */
def fold(zeroValue: T)(op: (T, T) => T): T = withScope {
  // Clone the zero value since we will also be serializing it as part of tasks
  var jobResult = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
  val cleanOp = sc.clean(op)
  val foldPartition = (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)
  val mergeResult = (index: Int, taskResult: T) => jobResult = op(jobResult, taskResult)
  sc.runJob(this, foldPartition, mergeResult)
  jobResult
}

Example:

scala> val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 5), ("a", 4)), 2)
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[13] at parallelize at <console>:24

scala> val rdd2 = rdd1.fold(("e", 0))((val1, val2) => { if (val1._2 >= val2._2) val1 else val2})
rdd2: (String, Int) = (d,5)

scala> println(rdd2)
(d,5)
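The source seeds every partition with zeroValue and also starts the final merge from zeroValue, so a zero value that is not neutral is applied once per partition plus once more on the driver. A small sketch of that effect, with foldZero as a hypothetical helper name:

```scala
import org.apache.spark.SparkContext

// Sums 1..9 in 3 partitions, first with the neutral zero value 0, then with 1.
def foldZero(sc: SparkContext): (Int, Int) = {
  val rdd = sc.parallelize(1 to 9, 3)
  val neutral    = rdd.fold(0)(_ + _)   // 45
  val nonNeutral = rdd.fold(1)(_ + _)   // 45 + 3 (one per partition) + 1 (final merge) = 49
  (neutral, nonNeutral)
}
```

The same reasoning applies to aggregate, which is why the zero value is normally the identity element of the operation.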

10. The lookup operator

Purpose: operates on (Key, Value) RDDs and returns a Seq of the values associated with the given key. The optimization is that if the RDD has a partitioner, only the partition that the key maps to is processed, and the values found there are returned as a Seq[V]. If the RDD has no partitioner, every element of the RDD has to be scanned by brute force to find those with the given key.

Source:

/**
 * Return the list of values in the RDD for key `key`. This operation is done efficiently if the
 * RDD has a known partitioner by only searching the partition that the key maps to.
 */
def lookup(key: K): Seq[V] = self.withScope {
  self.partitioner match {
    case Some(p) =>
      val index = p.getPartition(key)
      val process = (it: Iterator[(K, V)]) => {
        val buf = new ArrayBuffer[V]
        for (pair <- it if pair._1 == key) {
          buf += pair._2
        }
        buf
      } : Seq[V]
      val res = self.context.runJob(self, process, Array(index))
      res(0)
    case None =>
      self.filter(_._1 == key).map(_._2).collect()
  }
}

Example:

scala> val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)), 2)
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[14] at parallelize at <console>:24

scala> val rdd2 = rdd1.lookup("a")
rdd2: Seq[Int] = WrappedArray(1, 5)
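An RDD created with parallelize has no partitioner, so the lookup in the example above takes the full-scan branch. The sketch below gives the RDD a HashPartitioner first, so that the single-partition branch of the source is taken; lookupBoth is a hypothetical helper name, and caching the partitioned RDD is a choice made here so that repeated lookups do not redo the shuffle.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

// Returns (values found by full scan, values found via the partitioner, partitioner defined?).
def lookupBoth(sc: SparkContext): (List[Int], List[Int], Boolean) = {
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)), 2)
  val partitioned = pairs.partitionBy(new HashPartitioner(2)).cache()
  val scanned  = pairs.lookup("a").sorted.toList        // no partitioner: filter over all partitions
  val targeted = partitioned.lookup("a").sorted.toList  // partitioner: one partition is searched
  (scanned, targeted, partitioned.partitioner.isDefined)
}
```

Both calls return the same values; only the amount of data scanned differs.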

11. The countByKey operator

Purpose: counts the number of elements for each key K in an RDD[(K, V)] and returns a local Map from each key to its count, as (K, Long) pairs.

Source:

/**
 * Count the number of elements for each key, collecting the results to a local Map.
 *
 * @note This method should only be used if the resulting map is expected to be small, as
 * the whole thing is loaded into the driver's memory.
 * To handle very large results, consider using rdd.mapValues(_ => 1L).reduceByKey(_ + _), which
 * returns an RDD[T, Long] instead of a map.
 */
def countByKey(): Map[K, Long] = self.withScope {
  self.mapValues(_ => 1L).reduceByKey(_ + _).collect().toMap
}

Example:

scala> val rdd1 = sc.parallelize(Array(("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)), 2)
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[17] at parallelize at <console>:24

scala> val rdd2 = rdd1.countByKey()
rdd2: scala.collection.Map[String,Long] = Map(d -> 1, b -> 1, a -> 2, c -> 1)
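The note in the source suggests keeping the counts distributed when there are many distinct keys. The sketch below places both variants side by side; countsBothWays is a hypothetical helper name, and the final collect() is there only so the two results can be compared in a small demo.

```scala
import org.apache.spark.SparkContext

// Returns the per-key counts computed locally (countByKey) and as an RDD (reduceByKey).
def countsBothWays(sc: SparkContext): (Map[String, Long], Map[String, Long]) = {
  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 5)), 2)
  val local = pairs.countByKey().toMap                             // whole result on the driver
  val distributed = pairs.mapValues(_ => 1L).reduceByKey(_ + _)    // stays an RDD[(String, Long)]
  (local, distributed.collect().toMap)                             // collect only for this comparison
}
```

In a real job with many keys, the distributed RDD would typically be written out with one of the save operators above rather than collected.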

你可能感兴趣的:(spark,算子例子)

学点心理知识，呵护孩子健康静候花开_7090
昨天听了华中师范大学教育管理学系副教授张玲老师的《哪里才是学生心理健康的最后庇护所，超越教育与技术的思考》的讲座。今天又重新学习了一遍，收获匪浅。张玲博士也注意到了当今社会上的孩子由于心理问题导致的自残、自杀及伤害他人等恶性事件。她向我们普及了一个重要的命题，她说心理健康的一些基本命题，我们与我们通常的一些教育命题是不同的，她还举了几个例子，让我们明白我们原来以为的健康并非心理学上的健康。比如如果
每日一题——第九十题互联网打工人no1 C语言程序设计每日一练 c语言
题目：判断子串是否与主串匹配#include#include#include//////判断子串是否在主串中匹配//////主串///子串///boolisSubstring(constchar*str,constchar*substr){intlenstr=strlen(str);//计算主串的长度intlenSub=strlen(substr);//计算子串的长度//遍历主字符串，对每个可能得
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
社保应该缴15年还是25年？那种方式最划算？袋鼠观保保险规划师
社保无论是缴费15年还是25年，影响最大的就是养老保险和医疗保险，缴费时间越长越有利！1.养老保险真的交满15年就够了吗？要知道，社保缴费时长，直接影响到退休后能拿多少养老金，而且交得越久，退休领得越多。我拿深圳作为例子，想拿到养老金必须满足两个条件：只要达到一定的退休年龄，养老保险累计交满15年就可以拿到养老金了。那如果多缴了20年、25年甚至30年，是不是浪费了？实际上，缴满15年只是刚好可以
作业是家庭关系的枢纽潘海松
回想一下，当孩子做作业的时候，我们不断地在和孩子聊天、沟通，互相提出一些要求，也不可避免地，会产生分歧。举个最常见的例子，我们告诉孩子：「该写作业了。」娃是什么反应？好的亲子关系，孩子会乖乖停掉手里的事马上去写作业，或者好声好气地和家长商量，能不能在半个小时（或某个时间）开始。而不如意的亲子关系，孩子听到这句话的瞬间，就是各种不情愿，敷衍、拖延甚至于撒谎、撒泼打滚。最后，成为当天家庭里坏情绪的引爆
C++ lambda闭包消除类成员变量 barbyQAQ c++c++java 算法
原文链接：https://blog.csdn.net/qq_51470638/article/details/142151502一、背景在面向对象编程时，常常要添加类成员变量。然而类成员一旦多了之后，也会带来干扰。拿到一个类，一看成员变量好几十个，就问你怕不怕？二、解决思路可以借助函数式编程思想，来消除一些不必要的类成员变量。三、实例举个例子：classClassA{public:...intfu
Python实现关联规则推荐这孩子谁懂哈 Python Machine Learning python 关联规则机器学习
1.什么关联规则关联规则（AssociationRules）是反映一个事物与其他事物之间的相互依存性和关联性，如果两个或多个事物之间存在一定的关联关系，那么，其中一个事物就能通过其他事物预测到。关联规则是数据挖掘的一个重要技术，用于从大量数据中挖掘出有价值的数据项之间的相关关系。关联规则挖掘的最经典的例子就是沃尔玛的啤酒与尿布的故事，通过对超市购物篮数据进行分析，即顾客放入购物篮中不同商品之间的关
3.增删改查--连接查询问女何所忆
关系型数据库的一个特点就是，多张表之间存在关系，以致于我们可以连接多张表进行查询操作，所以连接查询会是关系型数据库中最常见的操作。连接查询主要分为三种，交叉连接、内连接和外连接，我们一个个说。1、交叉连接交叉连接其实连接查询的第一个阶段，它简单表现为两张表的笛卡尔积形式，具体例子：如果你没学过数学中的笛卡尔积概念，你可以这样简单的理解这里的交叉连接：两张表的交叉连接就是一个连接合并的过程，T1表中
python实现规则引擎_规则引擎python weixin_39601511 python实现规则引擎
广告关闭回望2020，你在技术之路上，有什么收获和成长么？对于未来，你有什么期待么？云+社区年度征文，各种定制好礼等你！我正在用python编写日志收集分析应用程序，我需要编写一个“规则引擎”来匹配和处理日志消息。它需要具有以下特点：正则表达式匹配消息本身消息严重性优先级的算术比较布尔运算符我设想一个例子规则可能是这样的：(message~program:messageandseverity>=h
metaRTC5.0 API编程指南(一) metaRTC metaRTC c++c语言 webrtc
概述metaRTC5.0版本API进行了重构，本篇文章将介绍webrtc传输调用流程和例子。metaRTC5.0版本提供了C++和纯C两种接口。纯C接口YangPeerConnection头文件:include/yangrtc/YangPeerConnection.htypedefstruct{void*conn;YangAVInfo*avinfo;YangStreamConfigstreamco
教师资格证常考的5个知识点 a3cb74a20840
知识点1：教育与人的发展(5规律、4因素、3动因)五大规律：顺序性—循序渐进阶段性—不搞“一刀切”不平衡性—抓关键期互补性—扬长避短个别差异性—因材施教考点精华：1.举例子对应五大规律;2.每个规律的教学启示;3规律特点。四大因素：遗传(地位：物质前提、可能性)环境(地位：多种可能、现实性)学校教育(主导)个人主观能动性(动力、决定)三大动因：内发论(1.孟子：性善论;2.弗洛伊德：性本能)外铄论
自定义分区我的K8409 Hadoop hdfs hadoop 大数据
通过简单例子了解partition分区类的重写方法分区是在MR的过程中进行的，属于Shuffle阶段但是在Job端不要忘记进行调用：job.setPartitionerClass(xxx.class)按照年龄分区：classAgePartitionerextendsPartitioner{@OverridepublicintgetPartition(MyComparablekey,NullWrit
Scanpy源码浅析之pp.normalize_total 何物昂
版本导入Scanpy,其版本为'1.9.1'，如果你看到的源码和下文有差异，其可能是由于版本差异。importscanpyasscsc.__version__#'1.9.1'例子函数pp.normalize_total用于Normalizecountspercell，其源代码在scanpy/preprocessing/_normalization.py我们通过一个简单例子来了解该函数主要功能:将一
python画出分子化学空间分布（UMAP） Sakaiay python
利用umap画出分子化学空间分布图安装pipinstallumap-learn下面是用一个数据集举的例子importtorchimportumapimportpandasaspdimportnumpyasnpimportmatplotlib.pyplotaspltimportseabornassnsfromsklearn.manifoldimportTSNEfromrdkit.Chemimport
Linux使用mjpg-streamer进行图像传输 —你的鼬先生 Linux驱动 linux 树莓派图像传输
图像传输是一项在Linux操作系统中比较常见的一个操作，在视频图传时，一般是采用MJPG-streamer来进行图像传输，本文就以树莓派为例子，来示范一个图像传输。1.树莓派的摄像头激活首先更新树莓派sudoapt-getupdatesudoapt-getupgrade随后打开树莓派的配置界面，选择InterfaceOptionsudoraspi-config在InterfaceOption选择C
分享一个基于python的电子书数据采集与可视化分析 hadoop电子书数据分析与推荐系统 spark大数据毕设项目（源码、调试、LW、开题、PPT) 计算机源码社 Python项目大数据大数据 python hadoop 计算机毕业设计选题计算机毕业设计源码数据分析 spark毕设
作者：计算机源码社个人简介：本人八年开发经验，擅长Java、Python、PHP、.NET、Node.js、Android、微信小程序、爬虫、大数据、机器学习等，大家有这一块的问题可以一起交流！学习资料、程序开发、技术解答、文档报告如需要源码，可以扫取文章下方二维码联系咨询Java项目微信小程序项目Android项目Python项目PHP项目ASP.NET项目Node.js项目选题推荐项目实战|p
感赏107（2019.3.7）能量放任让孩子行为更规范张天艳
能量上的放任，逐渐让孩子的行为变得规范。我从点滴上举个小例子。女儿刚放寒假就剪了短头发，女孩子爱美的天性就完全展现了出来。女儿要剪短发，我支持了她，夸她这个头型很漂亮，她就很开心。而不是我一味的去阻挠。女儿自从剪了头发，隔一天就洗一次头发，浪费的时间相对很多。以往我会抱怨浪费时间啊，唠叨她快点啊，都是碎碎念。现在女儿洗头发我会说：“你爱清洁这个习惯很好，干干净净的，我女儿漂漂亮亮的，真是好！”由衷
小说《101所》09：官司（中）一言莫辩
经过合同、沙盘和现场对比，李天明觉得外部环境的变化，可以打打官司，至少还有沙盘模型作为证据，虽然合同里声明不能作为的合同的条款，但外部环境足以影响到是否购买底楼的房子，而且这是开发商提供的格式合同，该条款明显规避了开发商的责任，签订合同时没有特别的提示，李天明记得当初自学法律时，记得特别清楚，书上举的例子是保险合同的免责条款。慎重起见，李天明专门咨询了法院和律师朋友，虽然没有得到确切的答复，但是找
Python静态方法@staticmethod和类方法@classmethod 西北小生_
Python静态方法@staticmethod和类方法@classmethod经常出现在类的定义中，二者和常规实例方法之间有什么区别呢？先看例子：classA:cnt=0val=1def__init__(self,cnt=0,val=1):self.val=valA.cnt+=1defnormal_fun(self,x):print(x+self.val)@classmethoddefget_cn
Python: round函数湫兮之风 python python 开发语言 numpy 人工智能