Most of this article is based on http://blog.csdn.net/legotime/article/details/51836036, with some additions and modifications of my own.
Environment:
Redhat 5.5, 64-bit (the Linux version here is a bit old; a newer one works just as well)
spark-1.6.3-bin-hadoop2.6
scala-2.10.6
jdk-8u91-linux-x64
IDE: scala-SDK-4.6.1 (download: http://scala-ide.org/)
Input DStreams represent the raw data streams received from data sources. Spark Streaming has two categories of sources:
(1) Basic sources: available directly in the StreamingContext API, for example file systems, socket connections,
and Akka actors.
(2) Advanced sources: Kafka, Flume, Kinesis, Twitter, and so on.
1. Source code of the basic input sources
Spark Streaming provides the following built-in external input sources:
(1) user-defined data source: receiverStream
(2) TCP-based data sources: socketTextStream, socketStream
(3) raw network data source (serialized blocks): rawSocketStream
(4) Hadoop file-system input sources: fileStream, textFileStream, binaryRecordsStream
(5) other input source (a queue of RDDs): queueStream
A small overview sketch of how these are called follows the source path below.
Source path: spark-1.6.3\streaming\src\main\scala\org\apache\spark\streaming
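All of these methods are defined on StreamingContext, so they are invoked as ssc.xxx(...). The following minimal sketch (my own, not from the original post) wires two of them into one application; the host, port and HDFS path are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object InputSourcesOverview {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("InputSourcesOverview").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // TCP text source: one String per '\n'-delimited line received on the socket
    val fromSocket = ssc.socketTextStream("localhost", 9999)
    // File source: every new file moved into the directory becomes part of a batch
    val fromFiles = ssc.textFileStream("hdfs://h71:9000/in/")
    fromSocket.union(fromFiles).print()
    ssc.start()
    ssc.awaitTermination()
  }
}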
/**
* Create an input stream with any arbitrary user implemented receiver.
* Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
* @param receiver Custom implementation of Receiver
*/
def receiverStream[T: ClassTag](receiver: Receiver[T]): ReceiverInputDStream[T] = {
withNamedScope("receiver stream") {
new PluggableInputDStream[T](this, receiver)
}
}
/**
* Create an input stream with any arbitrary user implemented actor receiver.
* Find more details at: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
* @param props Props object defining creation of the actor
* @param name Name of the actor
* @param storageLevel RDD storage level (default: StorageLevel.MEMORY_AND_DISK_SER_2)
*
* @note An important point to note:
* Since Actor may exist outside the spark framework, It is thus user's responsibility
* to ensure the type safety, i.e parametrized type of data received and actorStream
* should be same.
*/
def actorStream[T: ClassTag](
props: Props,
name: String,
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2,
supervisorStrategy: SupervisorStrategy = ActorSupervisorStrategy.defaultStrategy
): ReceiverInputDStream[T] = withNamedScope("actor stream") {
receiverStream(new ActorReceiver[T](props, name, storageLevel, supervisorStrategy))
}
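As a rough illustration of how actorStream is used (Spark 1.x only; the actor APIs were removed in Spark 2.0): an Akka actor mixes in ActorHelper and calls store() for every message it wants to push into the stream. The actor below and the commented driver line are a sketch of my own, not code from the original post:

import akka.actor.{Actor, Props}
import org.apache.spark.streaming.receiver.ActorHelper

// Forwards every String message it receives into Spark Streaming.
class LineForwarder extends Actor with ActorHelper {
  def receive = {
    case line: String => store(line)
  }
}

// In a driver that already has a StreamingContext `ssc`:
// val lines = ssc.actorStream[String](Props[LineForwarder], "LineForwarder")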
/**
* Create a input stream from TCP source hostname:port. Data is received using
* a TCP socket and the receive bytes is interpreted as UTF8 encoded `\n` delimited
* lines.
* @param hostname Hostname to connect to for receiving data
* @param port Port to connect to for receiving data
* @param storageLevel Storage level to use for storing the received objects
* (default: StorageLevel.MEMORY_AND_DISK_SER_2)
*/
def socketTextStream(
hostname: String,
port: Int,
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
): ReceiverInputDStream[String] = withNamedScope("socket text stream") {
socketStream[String](hostname, port, SocketReceiver.bytesToLines, storageLevel)
}
/**
* Create a input stream from TCP source hostname:port. Data is received using
* a TCP socket and the receive bytes it interepreted as object using the given
* converter.
* @param hostname Hostname to connect to for receiving data
* @param port Port to connect to for receiving data
* @param converter Function to convert the byte stream to objects
* @param storageLevel Storage level to use for storing the received objects
* @tparam T Type of the objects received (after converting bytes to objects)
*/
def socketStream[T: ClassTag](
hostname: String,
port: Int,
converter: (InputStream) => Iterator[T],
storageLevel: StorageLevel
): ReceiverInputDStream[T] = {
new SocketInputDStream[T](this, hostname, port, converter, storageLevel)
}
/**
* Create a input stream from network source hostname:port, where data is received
* as serialized blocks (serialized using the Spark's serializer) that can be directly
* pushed into the block manager without deserializing them. This is the most efficient
* way to receive data.
* @param hostname Hostname to connect to for receiving data
* @param port Port to connect to for receiving data
* @param storageLevel Storage level to use for storing the received objects
* (default: StorageLevel.MEMORY_AND_DISK_SER_2)
* @tparam T Type of the objects in the received blocks
*/
def rawSocketStream[T: ClassTag](
hostname: String,
port: Int,
storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
): ReceiverInputDStream[T] = withNamedScope("raw socket stream") {
new RawInputDStream[T](this, hostname, port, storageLevel)
}
/**
* Create a input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them using the given key-value types and input format.
* Files must be written to the monitored directory by "moving" them from another
* location within the same file system. File names starting with . are ignored.
* @param directory HDFS directory to monitor for new file
* @tparam K Key type for reading HDFS file
* @tparam V Value type for reading HDFS file
* @tparam F Input format for reading HDFS file
*/
def fileStream[
K: ClassTag,
V: ClassTag,
F <: NewInputFormat[K, V]: ClassTag
] (directory: String): InputDStream[(K, V)] = {
new FileInputDStream[K, V, F](this, directory)
}
/**
* Create a input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them using the given key-value types and input format.
* Files must be written to the monitored directory by "moving" them from another
* location within the same file system.
* @param directory HDFS directory to monitor for new file
* @param filter Function to filter paths to process
* @param newFilesOnly Should process only new files and ignore existing files in the directory
* @tparam K Key type for reading HDFS file
* @tparam V Value type for reading HDFS file
* @tparam F Input format for reading HDFS file
*/
def fileStream[
K: ClassTag,
V: ClassTag,
F <: NewInputFormat[K, V]: ClassTag
] (directory: String, filter: Path => Boolean, newFilesOnly: Boolean): InputDStream[(K, V)] = {
new FileInputDStream[K, V, F](this, directory, filter, newFilesOnly)
}
/**
* Create a input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them using the given key-value types and input format.
* Files must be written to the monitored directory by "moving" them from another
* location within the same file system. File names starting with . are ignored.
* @param directory HDFS directory to monitor for new file
* @param filter Function to filter paths to process
* @param newFilesOnly Should process only new files and ignore existing files in the directory
* @param conf Hadoop configuration
* @tparam K Key type for reading HDFS file
* @tparam V Value type for reading HDFS file
* @tparam F Input format for reading HDFS file
*/
def fileStream[
K: ClassTag,
V: ClassTag,
F <: NewInputFormat[K, V]: ClassTag
] (directory: String,
filter: Path => Boolean,
newFilesOnly: Boolean,
conf: Configuration): InputDStream[(K, V)] = {
new FileInputDStream[K, V, F](this, directory, filter, newFilesOnly, Option(conf))
}
/**
* Create a input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them as text files (using key as LongWritable, value
* as Text and input format as TextInputFormat). Files must be written to the
* monitored directory by "moving" them from another location within the same
* file system. File names starting with . are ignored.
* @param directory HDFS directory to monitor for new file
*/
def textFileStream(directory: String): DStream[String] = withNamedScope("text file stream") {
fileStream[LongWritable, Text, TextInputFormat](directory).map(_._2.toString)
}
/**
* Create an input stream that monitors a Hadoop-compatible filesystem
* for new files and reads them as flat binary files, assuming a fixed length per record,
* generating one byte array per record. Files must be written to the monitored directory
* by "moving" them from another location within the same file system. File names
* starting with . are ignored.
*
* '''Note:''' We ensure that the byte array for each record in the
* resulting RDDs of the DStream has the provided record length.
*
* @param directory HDFS directory to monitor for new file
* @param recordLength length of each record in bytes
*/
def binaryRecordsStream(
directory: String,
recordLength: Int): DStream[Array[Byte]] = withNamedScope("binary records stream") {
val conf = sc_.hadoopConfiguration
conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, recordLength)
val br = fileStream[LongWritable, BytesWritable, FixedLengthBinaryInputFormat](
directory, FileInputDStream.defaultFilter: Path => Boolean, newFilesOnly = true, conf)
val data = br.map { case (k, v) =>
val bytes = v.getBytes
require(bytes.length == recordLength, "Byte array does not have correct length. " +
s"${bytes.length} did not equal recordLength: $recordLength")
bytes
}
data
}
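A hedged usage sketch for binaryRecordsStream (the directory and record length are placeholders I chose for illustration): every file moved into the monitored directory is cut into fixed-length records, each delivered as one Array[Byte]:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BinaryRecordsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("BinaryRecordsSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // Each element of `records` is an Array[Byte] of exactly 16 bytes.
    val records = ssc.binaryRecordsStream("hdfs://h71:9000/binIn/", 16)
    records.map(_.length).print()
    ssc.start()
    ssc.awaitTermination()
  }
}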
/**
* Create an input stream from a queue of RDDs. In each batch,
* it will process either one or all of the RDDs returned by the queue.
*
* NOTE: Arbitrary RDDs can be added to `queueStream`, there is no way to recover data of
* those RDDs, so `queueStream` doesn't support checkpointing.
*
* @param queue Queue of RDDs
* @param oneAtATime Whether only one RDD should be consumed from the queue in every interval
* @tparam T Type of objects in the RDD
*/
def queueStream[T: ClassTag](
queue: Queue[RDD[T]],
oneAtATime: Boolean = true
): InputDStream[T] = {
queueStream(queue, oneAtATime, sc.makeRDD(Seq[T](), 1))
}
/**
* Create an input stream from a queue of RDDs. In each batch,
* it will process either one or all of the RDDs returned by the queue.
*
* NOTE: Arbitrary RDDs can be added to `queueStream`, there is no way to recover data of
* those RDDs, so `queueStream` doesn't support checkpointing.
*
* @param queue Queue of RDDs
* @param oneAtATime Whether only one RDD should be consumed from the queue in every interval
* @param defaultRDD Default RDD is returned by the DStream when the queue is empty.
* Set as null if no RDD should be returned when empty
* @tparam T Type of objects in the RDD
*/
def queueStream[T: ClassTag](
queue: Queue[RDD[T]],
oneAtATime: Boolean,
defaultRDD: RDD[T]
): InputDStream[T] = {
new QueueInputDStream(this, queue, oneAtATime, defaultRDD)
}
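The three-argument overload above adds a defaultRDD that the DStream returns for batches in which the queue is empty. A complete QueueStream example appears at the end of this article; the short sketch here (my own, with placeholder values) only shows the defaultRDD variant:

import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamWithDefault {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("QueueStreamWithDefault").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val rddQueue = new Queue[RDD[Int]]()
    // Returned whenever the queue has nothing to offer for a batch.
    val defaultRDD = ssc.sparkContext.makeRDD(Seq(-1), 1)
    val stream = ssc.queueStream(rddQueue, oneAtATime = true, defaultRDD)
    stream.count().print()
    ssc.start()
    rddQueue += ssc.sparkContext.makeRDD(1 to 100, 2)
    Thread.sleep(5000)
    ssc.stop()
  }
}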
2.1 User-defined data source: receiverStream
First, write a data simulator that picks random lines from a text file and writes one line to a TCP socket at a fixed interval; it acts as the external data source for the custom receiver further below.
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source
object streamingSimulation {
def index(n: Int) = scala.util.Random.nextInt(n)
def main(args: Array[String]) {
// The simulator takes three arguments: the file path, the port number, and the send interval (in milliseconds)
if (args.length != 3) {
System.err.println("Usage: <filename> <port> <millisecond>")
System.exit(1)
}
// Count the total number of lines in the given file
val filename = args(0)
val lines = Source.fromFile(filename).getLines.toList
val filerow = lines.length
// Listen on the given port and accept connections from external programs
val listener = new ServerSocket(args(1).toInt)
while (true) {
val socket = listener.accept()
new Thread() {
override def run = {
println("Got client connected from: " + socket.getInetAddress)
val out = new PrintWriter(socket.getOutputStream(), true)
while (true) {
Thread.sleep(args(2).toLong)
// While a client is connected, send it a randomly chosen line at every interval
val content = lines(index(filerow))
println("-------------------------------------------")
println(s"Time: ${System.currentTimeMillis()}")
println("-------------------------------------------")
println(content)
out.write(content + '\n')
out.flush()
}
socket.close()
}
}.start()
}
}
}
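The original post does not show how the simulator is launched. One hypothetical way, assuming it has been compiled into a jar named simulation.jar and that the sample text file is /home/hadoop/data.txt (both names are made up here for illustration), would be:

scala -cp simulation.jar streamingSimulation /home/hadoop/data.txt 9999 1000

The three arguments are the file to read from, the port to listen on, and the send interval in milliseconds.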
Next, write your own receiver and the Spark Streaming program in Eclipse. The code is as follows:
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets
import org.apache.spark.{Logging, SparkConf}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver
/**
* Custom Receiver that receives data over a socket. Received bytes are interpreted as
* text and \n delimited lines are considered as records. They are then counted and printed.
*
* To run this on your local machine, you need to first run a Netcat server
* `$ nc -lk 9999`
* and then run the example
* `$ bin/run-example org.apache.spark.examples.streaming.CustomReceiver localhost 9999`
*/
object CustomReceiver {
def main(args: Array[String]) {
if (args.length < 2) {
System.err.println("Usage: CustomReceiver ")
System.exit(1)
}
// Create the context with a 1 second batch size
val sparkConf = new SparkConf().setAppName("CustomReceiver").setMaster("local[4]")
val ssc = new StreamingContext(sparkConf, Seconds(1))
// Create an input stream with the custom receiver on target ip:port and count the
// words in input stream of \n delimited text (eg. generated by 'nc')
val lines = ssc.receiverStream(new CustomReceiver(args(0), args(1).toInt))
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
}
}
class CustomReceiver(host: String, port: Int)
extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {
def onStart() {
// Start the thread that receives data over a connection
new Thread("Socket Receiver") {
override def run() { receive() }
}.start()
}
def onStop() {
// There is nothing much to do as the thread calling receive()
// is designed to stop by itself if isStopped() returns false
}
/** Create a socket connection and receive data until receiver is stopped */
private def receive() {
var socket: Socket = null
var userInput: String = null
try {
logInfo("Connecting to " + host + ":" + port)
socket = new Socket(host, port)
logInfo("Connected to " + host + ":" + port)
val reader = new BufferedReader(
new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))
userInput = reader.readLine()
while(!isStopped && userInput != null) {
store(userInput)
userInput = reader.readLine()
}
reader.close()
socket.close()
logInfo("Stopped receiving")
restart("Trying to connect again")
} catch {
case e: java.net.ConnectException =>
restart("Error connecting to " + host + ":" + port, e)
case t: Throwable =>
restart("Error receiving data", t)
}
}
}
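To reproduce the output below, the program can be submitted much like the later examples. Assuming it is packaged into the same haha.jar used for those examples and the simulator is already listening on h71:9999 (both are assumptions on my part), the launch would look like:

bin/spark-submit --class CustomReceiver haha.jar h71 9999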
Got client connected from: /192.168.8.71
Simulator output                              Spark Streaming output
-------------------------------------------   -------------------------------------------
Time: 1489751287491                           Time: 1489751428000 ms
-------------------------------------------   -------------------------------------------
hello world                            ---->  (hello,1)
                                              (world,1)
-------------------------------------------   -------------------------------------------
Time: 1489751288494                           Time: 1489751429000 ms
-------------------------------------------   -------------------------------------------
hello hadoop                           ---->  (hadoop,1)
                                              (hello,1)
-------------------------------------------   -------------------------------------------
Time: 1489751289496                           Time: 1489751430000 ms
-------------------------------------------   -------------------------------------------
hello world                            ---->  (hello,1)
                                              (world,1)
2.2 TCP-based data sources: socketTextStream and socketStream
(1) socketTextStream
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
object TCPOnStreaming {
def main(args: Array[String]) {
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.Server").setLevel(Level.OFF)
val conf = new SparkConf().setAppName("TCPOnStreaming example").setMaster("local[4]")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc,Seconds(2))
//set the Checkpoint directory
ssc.checkpoint("/Res")
//get the socket Streaming data
val socketStreaming = ssc.socketTextStream("h71",9999)
val data = socketStreaming.map(x =>(x,1))
data.print()
ssc.start()
ssc.awaitTermination()
}
}
[hadoop@h71 spark-1.6.3-bin-hadoop2.6]$ bin/spark-submit --class TCPOnStreaming haha.jar
-------------------------------------------
Time: 1489751944000 ms
-------------------------------------------
(hello hadoop,1)
(hello hive,1)
17/03/17 19:59:04 INFO WriteAheadLogManager for Thread: Attempting to clear 0 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489751942000:
-------------------------------------------
Time: 1489751946000 ms
-------------------------------------------
(hello hadoop,1)
(hello hive,1)
17/03/17 19:59:06 INFO WriteAheadLogManager for Thread: Attempting to clear 0 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489751944000:
-------------------------------------------
Time: 1489751948000 ms
-------------------------------------------
(hello world,1)
(hello world,1)
17/03/17 19:59:08 INFO WriteAheadLogManager for Thread: Attempting to clear 0 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489751946000:
-------------------------------------------
Time: 1489751950000 ms
-------------------------------------------
(hello world,1)
(hello world,1)
Output on the data (simulator) side:
Got client connected from: /192.168.8.71
-------------------------------------------
Time: 1489752463871
-------------------------------------------
hello hadoop
-------------------------------------------
Time: 1489752464873
-------------------------------------------
hello hive
-------------------------------------------
Time: 1489752465875
-------------------------------------------
hello hadoop
-------------------------------------------
Time: 1489752466880
-------------------------------------------
hello hive
-------------------------------------------
Time: 1489752467883
-------------------------------------------
hello world
-------------------------------------------
Time: 1489752468884
-------------------------------------------
hello world
-------------------------------------------
Time: 1489752469887
-------------------------------------------
hello world
-------------------------------------------
Time: 1489752470890
-------------------------------------------
hello world
(2) socketStream
import java.io.InputStream
import org.apache.log4j.{Level, Logger}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
object socketStreamData {
def main(args: Array[String]) {
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.Server").setLevel(Level.OFF)
val conf = new SparkConf().setAppName("socketStreamData example").setMaster("local[4]")
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc,Seconds(2))
//set the Checkpoint directory
ssc.checkpoint("/Res")
val SocketData = ssc.socketStream[String]("h71",9999,myDeserialize,StorageLevel.MEMORY_AND_DISK_SER )
//val data = SocketData.map(x =>(x,1))
//data.print()
SocketData.print()
ssc.start()
ssc.awaitTermination()
}
def myDeserialize(data:InputStream): Iterator[String]={
data.read().toString.map( x => x.hashCode().toString).iterator
}
}
[hadoop@h71 spark-1.6.3-bin-hadoop2.6]$ bin/spark-submit --class socketStreamData haha.jar
-------------------------------------------
Time: 1489752826000 ms
-------------------------------------------
49
48
52
17/03/17 20:13:46 INFO WriteAheadLogManager for Thread: Attempting to clear 1 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489752824000: hdfs://h71:9000/Res/receivedBlockMetadata/log-1489752464020-1489752524020
17/03/17 20:13:46 INFO WriteAheadLogManager for Thread: Cleared log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489752824000
-------------------------------------------
Time: 1489752828000 ms
-------------------------------------------
17/03/17 20:13:48 INFO WriteAheadLogManager for Thread: Attempting to clear 0 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489752826000:
17/03/17 20:13:48 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Socket data stream had no more data
-------------------------------------------
Time: 1489752830000 ms
-------------------------------------------
49
48
52
17/03/17 20:13:50 INFO WriteAheadLogManager for Thread: Attempting to clear 0 old log files in hdfs://h71:9000/Res/receivedBlockMetadata older than 1489752828000:
17/03/17 20:13:51 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Socket data stream had no more data
-------------------------------------------
Time: 1489752832000 ms
-------------------------------------------
49
48
52
(At first I did not understand where these numbers come from, or why it is 49, 48, 52 every time, while in the original post there are sometimes three numbers and sometimes two. The explanation is in myDeserialize: data.read() reads only a single byte from the stream. Every line the simulator sends here starts with 'h', whose byte value is 104, so toString produces the string "104"; mapping hashCode over that string yields the character codes of '1', '0' and '4', namely 49, 48 and 52. If the first byte's decimal value has only two digits, only two numbers appear, which is why the original post sometimes shows two. After that single read the receiver finds no more data and keeps restarting, as the "Restarting receiver" ERROR lines above show. A more line-oriented converter is sketched after the simulator output below.)
Output on the data (simulator) side:
-------------------------------------------
Time: 1489752834427
-------------------------------------------
hello hive
-------------------------------------------
Time: 1489752834476
-------------------------------------------
hello world
-------------------------------------------
Time: 1489752834478
-------------------------------------------
hello hive
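If the goal is to receive the simulator's text line by line, a converter along the lines of the one below could replace myDeserialize. This is a sketch of my own that mimics what SocketReceiver.bytesToLines does for socketTextStream, not the exact Spark implementation:

import java.io.{BufferedReader, InputStream, InputStreamReader}
import java.nio.charset.StandardCharsets

def lineDeserialize(in: InputStream): Iterator[String] = {
  val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
  // One element per received line; stops when the connection is closed.
  Iterator.continually(reader.readLine()).takeWhile(_ != null)
}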
2.3 Raw network data source: rawSocketStream
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.util.IntParam
/**
* Receives text from multiple rawNetworkStreams and counts how many '\n' delimited
* lines have the word 'the' in them. This is useful for benchmarking purposes. This
* will only work with spark.streaming.util.RawTextSender running on all worker nodes
* and with Spark using Kryo serialization (set Java property "spark.serializer" to
* "org.apache.spark.serializer.KryoSerializer").
* Usage: RawNetworkGrep <numStreams> <host> <port> <batchMillis>
* <numStreams> is the number of rawNetworkStreams, which should be same as number
* of work nodes in the cluster
* <host> is "localhost".
* <port> is the port on which RawTextSender is running in the worker nodes.
* <batchMillis> is the Spark Streaming batch duration in milliseconds.
*/
object RawNetworkGrep {
def main(args: Array[String]) {
if (args.length != 4) {
System.err.println("Usage: RawNetworkGrep ")
System.exit(1)
}
StreamingExamples.setStreamingLogLevels()
val Array(IntParam(numStreams), host, IntParam(port), IntParam(batchMillis)) = args
val sparkConf = new SparkConf().setAppName("RawNetworkGrep")
// Create the context
val ssc = new StreamingContext(sparkConf, Duration(batchMillis))
val rawStreams = (1 to numStreams).map(_ =>
ssc.rawSocketStream[String](host, port, StorageLevel.MEMORY_ONLY_SER_2)).toArray
val union = ssc.union(rawStreams)
union.filter(_.contains("the")).count().foreachRDD(r =>
println("Grep count: " + r.collect().mkString))
ssc.start()
ssc.awaitTermination()
}
}
Note: when I imported this code into Eclipse it did not compile, and the original post gives no further detail, so I did not manage to run this example. The likely reason is that StreamingExamples and IntParam are helpers that live inside the Spark examples project (IntParam is private[spark]), so they are not available to a standalone application. A trimmed-down variant without those helpers is sketched below.
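This is my own untested sketch of the same program with the examples-only helpers removed; it still requires a RawTextSender feeding the given host/port and Kryo serialization, as the class comment above explains:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Duration, StreamingContext}

object RawNetworkGrepLite {
  def main(args: Array[String]): Unit = {
    if (args.length != 4) {
      System.err.println("Usage: RawNetworkGrepLite <numStreams> <host> <port> <batchMillis>")
      System.exit(1)
    }
    val Array(numStreams, host, port, batchMillis) = args
    val sparkConf = new SparkConf()
      .setAppName("RawNetworkGrepLite")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val ssc = new StreamingContext(sparkConf, Duration(batchMillis.toLong))
    // One raw stream per worker node, each reading serialized blocks from RawTextSender.
    val rawStreams = (1 to numStreams.toInt).map(_ =>
      ssc.rawSocketStream[String](host, port.toInt, StorageLevel.MEMORY_ONLY_SER_2)).toArray
    val union = ssc.union(rawStreams)
    union.filter(_.contains("the")).count().foreachRDD(r =>
      println("Grep count: " + r.collect().mkString))
    ssc.start()
    ssc.awaitTermination()
  }
}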
2.4 Hadoop file-system input sources: fileStream, textFileStream, binaryRecordsStream
(1) The fileStream function
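The original post gives no runnable example for fileStream itself, so the following is a sketch of my own (the HDFS path is a placeholder). textFileStream in the next subsection is simply this call specialised to LongWritable/Text/TextInputFormat:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FileStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))
    // New files "moved" into the monitored directory are read as (LongWritable, Text) pairs.
    val lines = ssc
      .fileStream[LongWritable, Text, TextInputFormat]("hdfs://h71:9000/in/")
      .map(_._2.toString)
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}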
(2) The textFileStream function
Step 1: write the Spark Streaming program and start it:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
object fileStreamData {
def main(args: Array[String]) {
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.eclipse.jetty.Server").setLevel(Level.OFF)
val conf = new SparkConf().setAppName("fileStreamData").setMaster("local[2]")
val sc =new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(2))
// To read a local Linux directory instead, write: val lines = ssc.textFileStream("file:///in/")
val lines = ssc.textFileStream("hdfs://h71:9000/in/")
val wordCount = lines.flatMap(_.split(" ")).map(x => (x,1)).reduceByKey(_+_)
wordCount.print()
ssc.start()
ssc.awaitTermination()
}
}
Note: the data source here is supposed to be Hadoop, so it is unclear why the original author used a local Linux directory in the code (val lines = ssc.textFileStream("/root/application/dataDir/")); following the original post it did not work for me either. That is also why the code above carries the comment about reading a local directory with the file:// prefix.
Step 2: run the program; the result is as follows:
[hadoop@h71 spark-1.6.3-bin-hadoop2.6]$ bin/spark-submit --class fileStreamData haha.jar
Then upload a data file to the corresponding HDFS directory:
[hadoop@h71 q1]$ hadoop fs -put Information /in
The console prints:
-------------------------------------------
Time: 1489753630000 ms
-------------------------------------------
17/03/17 20:27:12 INFO input.FileInputFormat: Total input paths to process : 1
-------------------------------------------
Time: 1489753632000 ms
-------------------------------------------
(hive,1)
(hello,3)
(world,1)
(hadoop,1)
-------------------------------------------
Time: 1489753634000 ms
-------------------------------------------
2.5 Other input source (a queue of RDDs): queueStream
import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
object QueueStream {
def main(args: Array[String]) {
val sparkConf = new SparkConf().setAppName("QueueStream").setMaster("local[4]")
// Create the context
val ssc = new StreamingContext(sparkConf, Seconds(1))
// Create the queue through which RDDs can be pushed to
// a QueueInputDStream
val rddQueue = new Queue[RDD[Int]]()
// Create the QueueInputDStream and use it do some processing
val inputStream = ssc.queueStream(rddQueue)
val mappedStream = inputStream.map(x => (x % 10, 1))
val reducedStream = mappedStream.reduceByKey(_ + _)
reducedStream.print()
ssc.start()
// Create and push some RDDs into rddQueue
for (i <- 1 to 30) {
rddQueue.synchronized {
rddQueue += ssc.sparkContext.makeRDD(1 to 1000, 10)
}
Thread.sleep(1000)
}
ssc.stop()
}
}
Result:
-------------------------------------------
Time: 1489753774000 ms
-------------------------------------------
(4,100)
(0,100)
(8,100)
(1,100)
(9,100)
(5,100)
(6,100)
(2,100)
(3,100)
(7,100)