Flink, as a third-generation stream-computing engine, also splits jobs into a DAG of stages, but on top of that it builds a pure stream-processing framework; it is therefore commonly called the third-generation big-data processing solution. Its design philosophy starts from the exact opposite point of Spark's: Spark treats streaming as micro-batches on a batch engine, while Flink treats batch as a bounded special case of streaming.
First generation: 2006, Hadoop (HDFS, MapReduce); in September 2014 Storm became an Apache top-level project.
Second generation: February 2014, Spark (Spark RDD / DStream) became an Apache top-level project.
Third generation: December 2014, Flink became an Apache top-level project.
The reason for this ordering is that, early on, most people's understanding of big-data analytics and most business scenarios stayed in the batch-processing domain, which is why Flink's adoption grew more slowly than Spark's; only around 2017 did the industry gradually start shifting from batch processing to stream processing.
Typical stream-computing scenarios: real-time analytics, system monitoring, public-opinion monitoring, traffic prediction, the national power grid, disease prediction, and risk control in banking/finance.
Spark architecture vs. Flink architecture
**Summary:** It is easy to see that Flink's architecture is designed just as elegantly as Spark's. For resource management, Flink can likewise run on Standalone, YARN, Kubernetes, and so on. On top of that it abstracts two processing dimensions, stream processing and batch processing, handling unbounded and bounded data respectively. Above the DataStream and DataSet APIs it provides corresponding libraries such as SQL, CEP (Complex Event Processing) and Machine Learning, which is why it is naturally referred to as the third-generation big-data processing solution.
Reference: https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html
Flink uses operator chaining to merge several operations into a single subtask, and each subtask runs as one thread. This chaining is similar to how Spark splits its DAG into stages; it optimizes the computation by reducing thread-to-thread hand-over costs. The figure below illustrates Flink's chaining of streaming operators.
Flink architecture roles: JobManager (similar to the Spark Master), TaskManager (similar to a Spark Worker), Client (similar to the Spark Driver).
There is always at least one JobManager. A high-availability setup will have multiple JobManagers, one of which is always the leader, and the others are standby.
There must always be at least one TaskManager.
Each TaskManager is a JVM process that executes one or more subtasks (each subtask runs in its own thread). Task slots control how many tasks a TaskManager JVM accepts, so every TaskManager has at least one task slot.
Each task slot represents a fixed subset of the TaskManager's resources. For example, a TaskManager with 3 slots dedicates 1/3 of its managed memory to each slot. Because a task slot can only be assigned to one job, the slot mechanism isolates the computations of different jobs from each other. Continuing the example above, if a job is given 6 slots but its subtasks only need 5 of them, the assignment looks as follows:
Each thread occupies one slot, and the remaining slot is wasted. When writing Flink programs you therefore need to know fairly precisely how many slots a job requires and what parallelism its operators use, because Flink shares task slots within the same job.
By default, the number of task slots a Flink job needs equals the maximum parallelism among its tasks.
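The following sketch (not taken from the original notes; the class name and numbers are only illustrative) shows how operator parallelism drives the number of slots a job asks for:
import org.apache.flink.streaming.api.scala._

object ParallelismDemo {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4)                 // default parallelism for every operator
    env.socketTextStream("CentOS", 9999)  // socket sources always run with parallelism 1
      .flatMap(_.split("\\s+"))
      .map((_, 1)).setParallelism(6)      // this operator now has the job's largest parallelism
      .keyBy(0)
      .sum(1)
      .print()
    env.execute("parallelism demo")       // the job therefore requests 6 task slots
  }
}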
Prerequisites
Upload and extract Flink
[root@CentOS ~]# tar -zxf flink-1.8.1-bin-scala_2.11.tgz -C /usr/
[root@CentOS ~]# cd /usr/flink-1.8.1/
[root@CentOS flink-1.8.1]# vi conf/flink-conf.yaml
jobmanager.rpc.address: CentOS
taskmanager.numberOfTaskSlots: 4
[root@CentOS flink-1.8.1]# vi conf/slaves
CentOS
[root@CentOS flink-1.8.1]# ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host CentOS.
Starting taskexecutor daemon on host CentOS.
[root@CentOS flink-1.8.1]# jps
4721 SecondaryNameNode
4420 DataNode
36311 TaskManagerRunner
35850 StandaloneSessionClusterEntrypoint
2730 QuorumPeerMain
3963 Kafka
36350 Jps
4287 NameNode
If Flink needs to write data to HDFS, pay attention to matching the Flink build with your Hadoop version. In general you download flink-shaded-hadoop-2-uber-xxxx.jar and place it in Flink's lib directory, so that Flink can talk to HBase, HDFS and YARN directly. The alternative is to configure HADOOP_CLASSPATH as an environment variable.
Open http://centos:8081/#/overview to view the Flink web UI.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.streaming.api.scala._
object FlinkStreamWordCount {
def main(args: Array[String]): Unit = {
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val lines:DataStream[String]=env.socketTextStream("CentOS",9999)
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map(WordPair(_,1))
.keyBy("word")
.sum("count")
.print()
//4. Execute the job
env.execute("wordcount")
}
}
case class WordPair(word:String,count:Int)
[root@CentOS flink-1.8.1]# ./bin/flink run --class com.baizhi.demo01.FlinkStreamWordCount -p 3 /root/flink-1.0-SNAPSHOT.jar
[root@CentOS flink-1.8.1]# ./bin/flink list
Waiting for response...
------------------ Running/Restarting Jobs -------------------
26.08.2019 04:21:26 : 8b03648cbd94c37a200349ccf3ff0331 : wordcount (RUNNING)
--------------------------------------------------------------
No scheduled jobs.
[root@CentOS flink-1.8.1]# ./bin/flink cancel 8b03648cbd94c37a200349ccf3ff0331
1) Create the execution environment (StreamExecutionEnvironment)
2) Build a DataStream
3) Apply transformation operators on the DataStream (lazy)
4) Specify how the computed results are output
5) Launch the job with env.execute("job name")
val env=StreamExecutionEnvironment.getExecutionEnvironment
This variant detects the environment the program is deployed in and picks the right execution context automatically; it works both for local execution and in a distributed cluster.
val env=StreamExecutionEnvironment.createLocalEnvironment(4)
Explicitly creates a local test environment (here with parallelism 4).
val jarFiles="D:\\IDEA_WorkSpace\\BigDataProject\\20190813\\FlinkDataStream\\target\\flink-1.0-SNAPSHOT.jar"
val env=StreamExecutionEnvironment.createRemoteEnvironment("CentOS",8081,jarFiles)
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val lines:DataStream[String]=env.socketTextStream("CentOS",9999)
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map(WordPair(_,1))
.keyBy("word")
.sum("count")
.print()
println(env.getExecutionPlan)
{"nodes":[{"id":1,"type":"Source: Socket Stream","pact":"Data Source","contents":"Source: Socket Stream","parallelism":1},{"id":2,"type":"Flat Map","pact":"Operator","contents":"Flat Map","parallelism":16,"predecessors":[{"id":1,"ship_strategy":"REBALANCE","side":"second"}]},{"id":3,"type":"Map","pact":"Operator","contents":"Map","parallelism":16,"predecessors":[{"id":2,"ship_strategy":"FORWARD","side":"second"}]},{"id":5,"type":"aggregation","pact":"Operator","contents":"aggregation","parallelism":16,"predecessors":[{"id":3,"ship_strategy":"HASH","side":"second"}]},{"id":6,"type":"Sink: Print to Std. Out","pact":"Data Sink","contents":"Sink: Print to Std. Out","parallelism":16,"predecessors":[{"id":5,"ship_strategy":"FORWARD","side":"second"}]}]}
Open https://flink.apache.org/visualizer/ and paste the JSON above into the page.
A source is the input of a streaming application. You register one with `StreamExecutionEnvironment.addSource(sourceFunction)`, where sourceFunction implements `SourceFunction`, or `ParallelSourceFunction` / `RichParallelSourceFunction` for a parallel custom source. Flink also ships several built-in sources that are convenient for testing:
readTextFile uses TextInputFormat underneath and reads the file exactly once.
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val lines:DataStream[String]=env.readTextFile("hdfs://CentOS:9000/demo/words")
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
//4. Execute the job
env.execute("wordcount")
Note: reading from HDFS requires these additional dependencies:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.9.2</version>
</dependency>
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val p="hdfs://CentOS:9000/demo/words"
val inputFormat=new TextInputFormat(new Path(p))//the path p can be omitted here
val lines:DataStream[String]=env.readFile(inputFormat,p)
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
//4. Execute the job
env.execute("wordcount")
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val inputFormat=new TextInputFormat(new Path())
val lines:DataStream[String]=env.readFile(inputFormat,"file:///D:/demo/words",
FileProcessingMode.PROCESS_CONTINUOUSLY,1000)
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
//4. Execute the job
env.execute("wordcount")
If a monitored file is modified, the whole file is re-read, which leads to duplicate results. For this reason, streaming jobs normally do not modify files in place; new files are added instead.
val env=StreamExecutionEnvironment.getExecutionEnvironment
//2. Configure the source
val lines:DataStream[String]=env.fromElements("this is a demo","hello flink")
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
//4. Execute the job
env.execute("wordcount")
Custom sources are registered with the addSource method; for example, data can be read from Apache Kafka, as shown below.
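Before the Kafka example, here is a minimal hand-written source as a sketch (illustrative only; the class name is made up): it implements SourceFunction and emits an incrementing counter once per second until the job is cancelled.
import org.apache.flink.streaming.api.functions.source.SourceFunction

class CounterSource extends SourceFunction[Long] {
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    var counter = 0L
    while (running) {
      ctx.collect(counter) // emit one element downstream
      counter += 1
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = running = false
}
// registered like any other source: env.addSource(new CounterSource)
The Kafka connector used in the rest of this section requires the following dependency: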
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
//3. Apply the usual transformations to the lines stream
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
//4. Execute the job
env.execute("wordcount")
If you use SimpleStringSchema you only get the record value. If you also need the key, partition and offset, you can implement a subclass of KafkaDeserializationSchema and customize the deserialization:
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.flink.api.scala._
class CustomKafkaDeserializationSchema extends KafkaDeserializationSchema[(String,String,Int,Long)]{
//this method always returns false (the stream never ends)
override def isEndOfStream(t: (String, String, Int, Long)): Boolean = false
//decode the fields the user needs
override def deserialize(record: ConsumerRecord[Array[Byte], Array[Byte]]): (String, String, Int, Long) = {
var key=""
if(record.key()!=null && record.key().size!=0){
key=new String(record.key())
}
val value=new String(record.value())
(key,value,record.partition(),record.offset())
}
//the produced result type
override def getProducedType: TypeInformation[(String, String, Int, Long)] = {
createTypeInformation[(String, String, Int, Long)]
}
}
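A sketch of how this schema could be wired into a consumer (it reuses env and props from the earlier Kafka example and assumes the usual org.apache.flink.api.scala._ import is in scope; the mapping is only illustrative):
val kafkaConsumer = new FlinkKafkaConsumer[(String, String, Int, Long)](
  "topic01", new CustomKafkaDeserializationSchema, props)

env.addSource(kafkaConsumer)
  .map(t => s"key=${t._1} value=${t._2} partition=${t._3} offset=${t._4}")
  .print()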
If Kafka stores JSON strings, you can use one of the JSON-aware schemas that ship with Flink; JSONKeyValueDeserializationSchema is the recommended one.
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
//{"name":"zs","age":18}
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new JSONKeyValueDeserializationSchema(true),props)
val lines:DataStream[ObjectNode]=env.addSource(kafkaConsumer)
lines.print()
//4. Execute the job
env.execute("wordcount")
}
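A sketch of how the resulting ObjectNode could be unpacked (the field names name/age are assumptions based on the sample record above; with includeMetadata = true the node also carries a metadata object with topic, partition and offset):
lines.map(node => {
    val value  = node.get("value")
    val name   = value.get("name").asText()              // assumed field
    val age    = value.get("age").asInt()                // assumed field
    val offset = node.get("metadata").get("offset").asLong()
    (name, age, offset)
  })
  .print()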
Sinks write a DataStream to file systems, sockets, standard output or external systems (Kafka, Redis, ...).
writeAsText/writeAsCsv do not participate in Flink's checkpointing, so they can only guarantee at-least-once semantics; moreover the data is not flushed to the external system immediately, so a failure in the meantime may lose records.
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.writeAsText("file:///D:/results/words",WriteMode.OVERWRITE)
//4. Execute the job
env.execute("wordcount")
If you need reliable, exactly-once delivery of a DataStream to an external system, use flink-connector-filesystem to write the data out.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.9.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.9.2</version>
</dependency>
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
val bucketingSink = new BucketingSink[(String,Int)]("hdfs://CentOS:9000/BucketSink")
bucketingSink.setBucketer(new DateTimeBucketer[(String, Int)]("yyyy-MM-dd-HH",ZoneId.of("Asia/Shanghai")))
bucketingSink.setBatchSize(1024)//1KB
bucketingSink.setBatchRolloverInterval(20 * 60 * 1000) // this is 20 mins
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.addSink(bucketingSink)
//4. Execute the job
env.execute("wordcount")
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print("debug")//prefix for the output; if omitted, the default prefix is the task id
//4. Execute the job
env.execute("wordcount")
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>flink-connector-redis_2.11</artifactId>
    <version>1.0</version>
</dependency>
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props=new Properties()
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props)
val redisConfig=new FlinkJedisPoolConfig.Builder()
.setHost("CentOS")
.setPort(6379)
.build()
val redisSink= new RedisSink(redisConfig,new WordPairRedisMapper)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.addSink(redisSink)
//4. Execute the job
env.execute("wordcount")
If you connect to a Redis cluster, use FlinkJedisClusterConfig; for sentinel mode, use FlinkJedisSentinelConfig.
Cluster:
FlinkJedisClusterConfig conf = new FlinkJedisClusterConfig.Builder()
.setNodes(new HashSet<InetSocketAddress>(Arrays.asList(new InetSocketAddress(5601)))).build();
Sentinel:
val conf = new FlinkJedisSentinelConfig.Builder()
.setMasterName("master")
.setSentinels(...)
.build()
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.8.1</version>
</dependency>
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props1=new Properties()
props1.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props1.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val props2=new Properties()
props2.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props1)
val kafkaProducer=new FlinkKafkaProducer[(String,Int)]("topic02",new CustomKeyedSerializationSchema,props2)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.addSink(kafkaProducer)
//4. Execute the job
env.execute("wordcount")
[root@CentOS kafka_2.11-0.11.0.0]# ./bin/kafka-console-consumer.sh --bootstrap-server CentOS:9092
--topic topic02
--key-deserializer org.apache.kafka.common.serialization.StringDeserializer
--value-deserializer org.apache.kafka.common.serialization.StringDeserializer
--property print.key=true
class CustomKeyedSerializationSchema extends KeyedSerializationSchema[(String,Int)]{
override def serializeKey(t: (String, Int)): Array[Byte] = {
t._1.getBytes()
}
override def serializeValue(t: (String, Int)): Array[Byte] = {
t._2.toString.getBytes()
}
override def getTargetTopic(t: (String, Int)): String = {
null
}
}
Depending on your needs you can implement SinkFunction (no recovery support) or use RichSinkFunction, which provides the hooks needed for failure recovery (covered in later sections).
class CustomSinkFunction extends SinkFunction[(String,Int)] {
override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
println(value)
}
}
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val props1=new Properties()
props1.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,"CentOS:9092")
props1.put(ConsumerConfig.GROUP_ID_CONFIG,"g1")
val kafkaConsumer=new FlinkKafkaConsumer("topic01",new SimpleStringSchema(),props1)
//2. Configure the source
val lines:DataStream[String]=env.addSource[String](kafkaConsumer)
lines.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.addSink(new CustomSinkFunction)//
//4. Execute the job
env.execute("wordcount")
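If the sink needs lifecycle hooks (for example to open and close a connection to an external system), a RichSinkFunction can be used instead. The sketch below is illustrative only and simply logs instead of talking to a real system:
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.{RichSinkFunction, SinkFunction}

class CustomRichSinkFunction extends RichSinkFunction[(String,Int)] {
  override def open(parameters: Configuration): Unit = {
    // acquire resources here (connection pool, client, ...)
    println("open sink on subtask " + getRuntimeContext.getIndexOfThisSubtask)
  }
  override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
    // write a single record to the external system
    println(value)
  }
  override def close(): Unit = {
    // release resources here
    println("close sink")
  }
}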
Takes one element and produces one element. A map function that doubles the values of the input stream:
dataStream.map(x=>x*2)
Takes one element and produces zero, one, or more elements. A flatmap function that splits sentences to words:
dataStream.flatMap(str => str.split("\\s+"))
Evaluates a boolean function for each element and retains those for which the function returns true.
dataStream.filter(item => !item.contains("error"))
Union of two or more data streams creating a new stream containing all the elements from all the streams. Note: If you union a data stream with itself you will get each element twice in the resulting stream.
val stream1= env.socketTextStream("CentOS",9999)
val stream2= env.socketTextStream("CentOS",8888)
stream1
.union(stream2)
.print()
“Connects” two data streams retaining their types, allowing for shared state between the two streams.
val stream1= env.socketTextStream("CentOS",9999)
val stream2= env.socketTextStream("CentOS",8888)
stream1.connect(stream2)
.flatMap(
(line:String)=>line.split("\\s+"),//stream1
(line:String)=>line.split("\\s+") //stream2
)
.map((_,1))
.keyBy(0)
.sum(1)
.print()
split: Split the stream into two or more streams according to some criterion.
select: Select one or more streams from a split stream.
var splitStream:SplitStream[String]= env.socketTextStream("CentOS",9999)
.split((line:String)=>{
if(line.contains("error")){
List("error")
}else{
List("info")
}
})
splitStream.select("error").print("error:")
splitStream.select("info").print("info:")
The split/select operators above are deprecated; side outputs are now the recommended replacement:
val outputTag = new OutputTag[String]("error") {}
var stream=env.socketTextStream("CentOS",9999)
.process(new ProcessFunction[String,String] {
override def processElement(value: String, ctx: ProcessFunction[String, String]#Context, out: Collector[String]): Unit = {
if(value.contains("error")){
ctx.output(outputTag,value)
}else{
out.collect(value)
}
}
})
stream.print("info:")
stream.getSideOutput(outputTag).print("error")
Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
dataStream.keyBy("someKey") // Key by field "someKey"
dataStream.keyBy(0) // Key by the first element of a Tuple
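keyBy also accepts a key-selector function, which is usually clearer than positional or string keys; a small sketch (the streams textStream and wordStream are assumed for illustration):
case class Word(word: String, count: Int)
textStream.keyBy(line => line.toLowerCase)  // key a DataStream[String] by its lowercase value
wordStream.keyBy(_.word)                    // key a DataStream[Word] by its "word" field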
A “rolling” reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.reduce((v1,v2)=>(v1._1,v1._2+v2._2))
.print()
A “rolling” fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.fold(("",0))((r,v)=>(v._1,v._2+r._2))
.print()
max / maxBy, min / minBy, sum
//1 zs 10000 1
//2 ls 15000 1
//3 ww 8000 1
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>Employee(tokens(0).toInt,tokens(1),tokens(2).toDouble,tokens(3).toInt))
.keyBy("dept")
.minBy("salary")
.print()
11> Employee(1,zs,10000.0,1)
11> Employee(1,zs,10000.0,1)
11> Employee(3,ww,8000.0,1)
//1 zs 10000 1
//2 ls 15000 1
//3 ww 8000 1
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>Employee(tokens(0).toInt,tokens(1),tokens(2).toDouble,tokens(3).toInt))
.keyBy("dept")
.min("salary")
.print()
11> Employee(1,zs,10000.0,1)
11> Employee(1,zs,10000.0,1)
11> Employee(1,zs,8000.0,1)
Apache Flink is built as stateful computation over data streams, so stateful computation is at the heart of Flink, and state management is an important building block. Typical use cases for stateful computation: 1. state lookups, 2. window aggregation and statistics, 3. storing trained models (formulas) in machine learning, 4. querying historical data. Flink uses checkpoints to make state fault tolerant and savepoints to restore stateful computations. Flink programs can be rescaled arbitrarily, and when they are, Flink redistributes the internal state; while a job is running, its state can also be queried from the outside. Flink offers several storage options for state, for example the in-memory MemoryStateBackend, the FsStateBackend and the RocksDBStateBackend.
Flink distinguishes two kinds of state: keyed state, a set of state primitives designed specifically for KeyedStream, and operator state, which covers every kind of state that is not bound to a KeyedStream.
Keyed State
Keyed State is always relative to keys and can only be used in functions and operators on a KeyedStream. You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key "belongs" to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>. Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.
This kind of state must be bound to a key and can only be used in operators on a KeyedStream. Each piece of state is bound to a <parallel-operator-instance, key> pair; because each key belongs to exactly one parallel instance (the shuffle guarantees that identical keys land in the same instance), you can think of keyed state as being bound to <operator, key>.
All keyed state is ultimately organized into Key Groups, which are the unit Flink uses when redistributing state; the number of key groups equals the configured maximum parallelism. Each parallel instance of a keyed operator therefore works with the keys of one or more key groups.
Operator State
With Operator State (or non-keyed state), each operator state is bound to one parallel operator instance. The Kafka Connector is a good motivating example for the use of Operator State in Flink. Each parallel instance of the Kafka consumer maintains a map of topic partitions and offsets as its Operator State.
The Operator State interfaces support redistributing state among parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.
All state that is not tied to a KeyedStream is operator state. Unlike keyed state, operator state is bound to a parallel operator instance; it is obtained through the operator state store (for example in CheckpointedFunction#initializeState), and it can be redistributed across operator instances using different redistribution schemes.
Whether keyed or operator state, state in Flink exists in only two forms: managed state and raw state. Managed state is supported by all operators; Flink provides rich state types and APIs for it, manages its data structures, and can use its own serializers to store it in the configured backend during checkpoints. Raw state is only used inside custom operators: the user has to handle serialization, and during a checkpoint Flink just stores the state as bytes without understanding its structure. In practice you should almost always use managed state, because Flink can redistribute it and optimize how it is stored.
In general, only managed state is used.
All keyed state is bound to a key; different keys never see each other's state. Flink currently provides the state types below. A key can be associated with several states, but each state instance belongs to exactly one key.
State type | Description |
---|---|
ValueState<T> | Stores a single value T; read it with T value(), update it with update(T) |
ListState<T> | Stores a list of T elements; add(T), addAll(List<T>), Iterable<T> get(), update(List<T>) |
ReducingState<T> | Stores a single value and folds every add(T) into it automatically; add(T), T get(); requires a ReduceFunction |
AggregatingState<IN, OUT> | Stores a single aggregated value, where the IN and OUT types may differ; add(IN), OUT get(); requires an AggregateFunction |
FoldingState<T, ACC> | Equivalent to ReducingState but driven by a FoldFunction (deprecated since Flink 1.4) |
MapState<UK, UV> | Stores a map; put(UK, UV), putAll(Map<UK, UV>), get(UK), entries(), keys(), values() |
Every one of these states also has a clear() method that removes the state for the current key.
Using these state types requires two things:
- Obtain the state from the RuntimeContext, which means the function must be a rich function (for example RichMapFunction instead of MapFunction):
class MyMapFunction implements MapFunction<String, Integer> {
  public Integer map(String value) { return Integer.parseInt(value); }
};
class MyMapFunction extends RichMapFunction<String, Integer> {
  public Integer map(String value) { return Integer.parseInt(value); }
};
The RuntimeContext exposes one accessor per state type:
- ValueState<T> getState(ValueStateDescriptor<T>)
- ReducingState<T> getReducingState(ReducingStateDescriptor<T>)
- ListState<T> getListState(ListStateDescriptor<T>)
- AggregatingState<IN, OUT> getAggregatingState(AggregatingStateDescriptor<IN, ACC, OUT>)
- FoldingState<T, ACC> getFoldingState(FoldingStateDescriptor<T, ACC>)
- MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV>)
- Create the matching StateDescriptor: ValueStateDescriptor, ListStateDescriptor, ReducingStateDescriptor, FoldingStateDescriptor, AggregatingStateDescriptor, or MapStateDescriptor.
class CountMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
var state:ValueState[Int]=_
override def map(value: (String, Int)): (String,Int) = {
var history:Int= state.value()
if(history==null){
  history=0
}
state.update(history+value._2)
(value._1,history+value._2)
}
override def open(parameters: Configuration): Unit = {
val vsd = new ValueStateDescriptor[Int]("count",createTypeInformation[Int])
val runtimeContext = getRuntimeContext()
state = runtimeContext.getState(vsd)
}
}
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(line=>line.split("\\s+"))
.map((_,1))
.keyBy(0)
.map(new CountMapFunction)
.print()
//4. Execute the job
env.execute("wordcount")
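The same pattern carries over to the other state types. As an illustrative sketch (class and state names are made up), a keyed ListState that remembers every value seen for the current key could look like this; it would be used just like the ValueState example above, i.e. keyBy(0) followed by flatMap(new HistoryFlatMapFunction):
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.Collector
import org.apache.flink.api.scala._
import scala.collection.JavaConverters._

class HistoryFlatMapFunction extends RichFlatMapFunction[(String, Int), (String, String)] {
  @transient private var historyState: ListState[Int] = _

  override def open(parameters: Configuration): Unit = {
    val lsd = new ListStateDescriptor[Int]("history", createTypeInformation[Int])
    historyState = getRuntimeContext.getListState(lsd)
  }

  override def flatMap(value: (String, Int), out: Collector[(String, String)]): Unit = {
    historyState.add(value._2)                          // append the new value for this key
    val all = historyState.get().asScala.mkString(",")  // read everything stored so far
    out.collect((value._1, all))
  }
}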
import org.apache.flink.api.common.functions.AggregateFunction
// IN = input type, ACC = accumulator type, OUT = output type
class AvgAggFunction extends AggregateFunction[(Int,Double),(Double,Int,Int),(Int,Double)]{
override def createAccumulator(): (Double, Int, Int) = {
(0.0,0,-1)
}
override def add(value: (Int, Double), accumulator: (Double, Int, Int)): (Double, Int, Int) = {
var total=accumulator._1 + value._2
var count=accumulator._2+1
(total,count,value._1)
}
override def getResult(accumulator: (Double, Int, Int)): (Int, Double) = {
(accumulator._3,accumulator._1/accumulator._2)
}
override def merge(a: (Double, Int, Int), b: (Double, Int, Int)): (Double, Int, Int) = {
(a._1+b._1,a._2+b._2,a._3)
}
}
import org.apache.flink.api.common.functions.{AggregateFunction, RichMapFunction}
import org.apache.flink.api.common.state.{AggregatingState, AggregatingStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.api.scala._
class AvgMapFunction extends RichMapFunction[(Int,Double),(Int,Double)]{
var avgState:AggregatingState[(Int,Double),(Int,Double)]=_
override def map(value: (Int, Double)): (Int, Double) = {
avgState.add(value)
avgState.get()
}
override def open(parameters: Configuration): Unit = {
var asd=new AggregatingStateDescriptor("avgcost", new AvgAggFunction, createTypeInformation[(Double,Int,Int)])
avgState=getRuntimeContext.getAggregatingState(asd)
}
}
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//1 zhansan 2 4.5
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0).toInt,tokens(2).toInt * tokens(3).toDouble))
.keyBy(0)
.map(new AvgMapFunction)
.print()
//4. Execute the job
env.execute("wordcount")
Keyed state can be given a time-to-live (TTL). Once TTL is configured, Flink makes a best effort to remove expired state, reducing the memory footprint of the state.
class CountMapFunction extends RichMapFunction[(String,Int),(String,Int)]{
var state:ValueState[Int]=_
override def map(value: (String, Int)): (String,Int) = {
var history:Int= state.value()
if(history==null){
  history=0
}
state.update(history+value._2)
(value._1,history+value._2)
}
//create the state (with TTL enabled)
override def open(parameters: Configuration): Unit = {
val vsd = new ValueStateDescriptor[Int]("count",createTypeInformation[Int])
//1. Build the TTL configuration
val ttlConfig = StateTtlConfig
  .newBuilder(Time.seconds(5)) //the state stays alive for 5 s
  .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//when the TTL timestamp is refreshed
  .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//never return expired state
  .build
//2. Enable TTL on the descriptor
vsd.enableTimeToLive(ttlConfig)
val runtimeContext = getRuntimeContext()
state = runtimeContext.getState(vsd)
}
}
Let's break down what these parameters mean:
val ttlConfig = StateTtlConfig
.newBuilder(Time.seconds(5)) //①
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//②
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//③
.build
① is mandatory: it sets the state's time-to-live.
② sets the update policy, i.e. when the TTL timestamp is refreshed (default OnCreateAndWrite):
- OnCreateAndWrite: creating or writing the state refreshes the timestamp.
- OnReadAndWrite: reading or writing the state refreshes the timestamp.
③ controls whether expired state may still be returned (default NeverReturnExpired):
- NeverReturnExpired: an expired value is never returned.
- ReturnExpiredIfNotCleanedUp: an expired value may still be returned if it has not been cleaned up yet.
Notes:
- Enabling TTL means the system stores a timestamp with every state entry, which increases the memory cost of the state.
- The TTL clock is the (processing) time of the compute node.
- If the state you are restoring was written without TTL enabled and you then change the TTL configuration, the state cannot be restored (see ①) and the system throws a StateMigrationException.
- Enabling TTL has no effect on checkpoints and savepoints themselves; it only tells Flink how to treat the state.
Expired state is removed when it is read. If some keyed state has expired but is never accessed again, the system keeps it.
val ttlConfig = StateTtlConfig
.newBuilder(Time.seconds(5)) //the state stays alive for 5 s
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//when the TTL timestamp is refreshed
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//never return expired state
.cleanupFullSnapshot()
.build
val ttlConfig = StateTtlConfig
.newBuilder(Time.seconds(5)) //the state stays alive for 5 s
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//when the TTL timestamp is refreshed
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//never return expired state
.cleanupInBackground()
.build
For in-memory (heap) state backends:
val ttlConfig = StateTtlConfig
.newBuilder(Time.seconds(5)) //the state stays alive for 5 s
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//when the TTL timestamp is refreshed
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//never return expired state
.cleanupIncrementally(5,true)
.build
In cleanupIncrementally(5, true), the first argument is how many state entries are checked for expiration per cleanup pass (expired entries are deleted); true means a cleanup pass is additionally triggered for every processed record, while false means the check only happens lazily, when the state itself is accessed or updated.
Cleanup with the RocksDB state backend
RocksDB (a key-value store) compacts its state asynchronously in the background, merging entries with the same key to keep the state files small, but compaction does not remove expired state by itself. You can therefore enable a compaction filter so that RocksDB drops expired state while compacting. This filter is disabled by default; enable it in flink-conf.yaml with state.backend.rocksdb.ttl.compaction.filter.enabled: true
or through the API:
RocksDBStateBackend::enableTtlCompactionFilter
val ttlConfig = StateTtlConfig
.newBuilder(Time.seconds(5)) //the state stays alive for 5 s
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)//when the TTL timestamp is refreshed
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)//never return expired state
.cleanupInRocksdbCompactFilter(1000)//re-check the current time after every 1000 state entries processed during compaction
.build
The 1000 here means that the filter re-reads the current timestamp after every 1000 state entries it processes during compaction and uses it to decide which entries are expired.
If you want to use operator state, the RichFunction you define needs to implement CheckpointedFunction or ListCheckpointed:
//take a snapshot of the current state so it can be persisted
void snapshotState(FunctionSnapshotContext context) throws Exception;
//initialize the state, or restore it after a failure
void initializeState(FunctionInitializationContext context) throws Exception;
snapshotState is called whenever the system takes a checkpoint or savepoint. initializeState is called when the operator is first initialized and again when it is restored after a failure, so it usually contains two pieces of logic: 1. initialization, 2. recovery.
Note that operator state currently only supports list-style managed state: the state is a list of elements that are independent of each other, so Flink can hand different list elements to different operator instances when redistributing state.
Flink currently supports two redistribution schemes for operator state (even-split and union, see below):
A buffering sink example:
class BufferSink(threshold: Int = 0) extends SinkFunction[(String,Int)] with CheckpointedFunction {
@transient
private var checkpointedState: ListState[(String, Int)] = _
private val bufferedElements = ListBuffer[(String, Int)]()
//buffer the record and write the results out once the threshold is reached
override def invoke(value: (String, Int), context: SinkFunction.Context[_]): Unit = {
bufferedElements += value
if (bufferedElements.size == threshold) {
for (element <- bufferedElements) {
println(element)
}
bufferedElements.clear()
}
}
//snapshot logic
override def snapshotState(context: FunctionSnapshotContext): Unit = {
checkpointedState.clear()//drop the previously snapshotted content
for (element <- bufferedElements) {
checkpointedState.add(element)
}
}
//initialize / restore the state
override def initializeState(context: FunctionInitializationContext): Unit = {
val descriptor = new ListStateDescriptor[(String, Int)]("buffered-elements", createTypeInformation[(String, Int)])
checkpointedState = context.getOperatorStateStore.getListState(descriptor)
if(context.isRestored) {//we are recovering from a previous snapshot
for(element <- checkpointedState.get().asScala) {
bufferedElements += element
}
}
}
}
If you obtain the state with context.getOperatorStateStore.getListState, the list elements are split evenly across the parallel instances on redistribution; if you want every instance to receive the full copy, use context.getOperatorStateStore.getUnionListState, as in the sketch below.
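A sketch of the only change needed in the BufferSink above to switch redistribution modes (illustrative):
override def initializeState(context: FunctionInitializationContext): Unit = {
  val descriptor = new ListStateDescriptor[(String, Int)]("buffered-elements", createTypeInformation[(String, Int)])
  // even-split redistribution: each parallel instance receives a slice of the list
  // checkpointedState = context.getOperatorStateStore.getListState(descriptor)
  // union redistribution: each parallel instance receives the complete list on restore
  checkpointedState = context.getOperatorStateStore.getUnionListState(descriptor)
  if (context.isRestored) {
    for (element <- checkpointedState.get().asScala) {
      bufferedElements += element
    }
  }
}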
Test steps
#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================
# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled.
#
# Supported backends are 'jobmanager', 'filesystem', 'rocksdb', or the
# .
#
state.backend: rocksdb
# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
state.checkpoints.dir: hdfs:///flink-checkpoints
# Default target directory for savepoints, optional.
#
state.savepoints.dir: hdfs:///flink-savepoints
# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
#
state.backend.incremental: true
state.backend.rocksdb.ttl.compaction.filter.enabled: true
#==============================================================================
# HistoryServer
#==============================================================================
# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop)
# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
jobmanager.archive.fs.dir: hdfs:///completed-jobs/
# The address under which the web-based HistoryServer listens.
historyserver.web.address: CentOS
# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082
# Comma separated list of directories to monitor for completed jobs.
historyserver.archive.fs.dir: hdfs:///completed-jobs/
# Interval in milliseconds for refreshing the monitored directories.
historyserver.archive.fs.refresh-interval: 10000
[root@CentOS flink-1.8.1]# ./bin/flink list -m CentOS:8081
------------------ Running/Restarting Jobs -------------------
27.08.2019 21:07:34 : a623ae600438c52010e73b6f808af8a6 : wordcount (RUNNING)
--------------------------------------------------------------
[root@CentOS flink-1.8.1]# ./bin/flink cancel -s a623ae600438c52010e73b6f808af8a6
Cancelling job a623ae600438c52010e73b6f808af8a6 with savepoint to default savepoint directory.
Cancelled job a623ae600438c52010e73b6f808af8a6. Savepoint stored in hdfs://CentOS:9000/flink-savepoints/savepoint-a623ae-0f339a1004f0.
This interface is a variant of CheckpointedFunction with more restrictions: the state must be a list, and on recovery it only supports even-split redistribution of that list.
//return the state that the system should persist
List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
//receives the previously persisted state
void restoreState(List<T> state) throws Exception;
When a checkpoint/savepoint is taken, the system calls snapshotState and persists the returned List<T>; during recovery it calls restoreState with that list.
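A minimal ListCheckpointed sketch (illustrative; the names are made up): a counter source whose current offset is snapshotted as a one-element list and restored after a failure.
import java.lang.{Long => JLong}
import java.util.Collections
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.collection.JavaConverters._

class CheckpointedCounterSource extends SourceFunction[Long] with ListCheckpointed[JLong] {
  @volatile private var running = true
  private var offset = 0L

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    while (running) {
      // emit and advance the offset under the checkpoint lock so snapshots are consistent
      ctx.getCheckpointLock.synchronized {
        ctx.collect(offset)
        offset += 1
      }
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = running = false

  // called on checkpoint/savepoint: return the state to persist
  override def snapshotState(checkpointId: Long, timestamp: Long): java.util.List[JLong] =
    Collections.singletonList(JLong.valueOf(offset))

  // called on recovery: restore the previously persisted state
  override def restoreState(state: java.util.List[JLong]): Unit =
    for (s <- state.asScala) offset = s.longValue()
}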
Besides keyed state and operator state, Flink has a third kind of state: broadcast state, which is a special form of operator state. With broadcast state you take the records of one stream and broadcast them to every downstream task, so that all tasks share them; typically these are small configuration or rule records. Each task can then consult this broadcast data while processing the records of its own stream.
To use it, create a DataStream (keyed or non-keyed) and a broadcast stream, then connect the DataStream to the broadcast stream with connect(); every downstream task of the DataStream can then access the state carried by the broadcast stream.
class UserOrderKeyedBroadcastProcessFunction(msd:MapStateDescriptor[String,String])
extends KeyedBroadcastProcessFunction[String,(String,String,Double),(String,String),(String,String,Double)]{
//processes elements from the keyed-stream side
override def processElement(value: (String, String, Double), ctx: KeyedBroadcastProcessFunction[String, (String, String, Double), (String, String), (String, String, Double)]#ReadOnlyContext, out: Collector[(String, String, Double)]): Unit = {
val broadcastState = ctx.getBroadcastState(msd)
println("=================")
for(i <- broadcastState.immutableEntries().asScala){
  println(i.getKey+"\t"+i.getValue)
}
var name=broadcastState.get(value._1)//look up the user name by id
// user name, item, price
out.collect((name,value._2,value._3))
}
//processes elements from the broadcast-stream side
override def processBroadcastElement(value: (String, String), ctx: KeyedBroadcastProcessFunction[String, (String, String, Double), (String, String), (String, String, Double)]#Context, out: Collector[(String, String, Double)]): Unit = {
val state: BroadcastState[String, String] = ctx.getBroadcastState(msd)
// user id, user name
state.put(value._1,value._2) //store the user info in the broadcast map state
}
}
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
//1 apple 10
val keyedStream=env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1),tokens(2).toDouble))
.keyBy(0)
//descriptor for the state carried by the broadcast stream
val msd=new MapStateDescriptor[String,String]("user-state",createTypeInformation[String],createTypeInformation[String])
//1 zhansan
val broadcast = env.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(toknes=>(toknes(0), toknes(1)))
.broadcast(msd)
keyedStream.connect(broadcast)
.process(new UserOrderKeyedBroadcastProcessFunction(msd))
.print()
//4. Execute the job
env.execute("counter")
class UserLevelBroadcastProcessFunction(msd:MapStateDescriptor[String,Int]) extends BroadcastProcessFunction[(String,String),(String,Double),(String,String,Int)] {
override def processElement(value: (String, String),
ctx: BroadcastProcessFunction[(String, String), (String, Double), (String, String, Int)]#ReadOnlyContext,
out: Collector[(String, String, Int)]): Unit = {
out.collect(value._1,value._2,ctx.getBroadcastState(msd).get(value._1))
}
//level:0 1 2 3
override def processBroadcastElement(value: (String, Double),
ctx: BroadcastProcessFunction[(String, String), (String, Double), (String, String, Int)]#Context,
out: Collector[(String, String, Int)]): Unit = {
val state = ctx.getBroadcastState(msd)
if(value._2<100){
state.put(value._1,0)
}else if(value._2 < 1000){
state.put(value._1,1)
}else if(value._2 < 5000){
state.put(value._1,2)
}else{
state.put(value._1,3)
}
}
}
object FlinkStreamBroadcaststate {
def main(args: Array[String]): Unit = {
//1. Create the StreamExecutionEnvironment
val env=StreamExecutionEnvironment.getExecutionEnvironment
val msd=new MapStateDescriptor[String,Int]("user-level",createTypeInformation[String],createTypeInformation[Int])
//1 apple 10
val broadcaststream=env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(2).toDouble))
.keyBy(0)
.sum(1)
.broadcast(msd)
//1 zhansan
val userstream = env.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(toknes=>(toknes(0), toknes(1)))
userstream.connect(broadcaststream).process(new UserLevelBroadcastProcessFunction(msd))
.print()
//4. Execute the job
env.execute("counter")
}
}
val env=StreamExecutionEnvironment.getExecutionEnvironment
//enable checkpointing every 7 s with exactly-once semantics
env.enableCheckpointing(7000,CheckpointingMode.EXACTLY_ONCE)
//a checkpoint must complete within 4 s, otherwise it is aborted
env.getCheckpointConfig.setCheckpointTimeout(4000)
//wait at least 5 s after the previous checkpoint completes before starting the next one
env.getCheckpointConfig.setMinPauseBetweenCheckpoints(5000)
env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)//at most one checkpoint in flight at a time
//keep the checkpoint data when the job is cancelled
env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
//fail the job if taking a checkpoint fails
env.getCheckpointConfig.setFailOnCheckpointingErrors(true)
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.sum(1)
.print()
env.execute("wordcount")
A checkpoint is a recovery point that the system creates automatically so that the computation can be recovered after a failure. In addition, Flink provides savepoints: manually triggered state backups that let you roll the system back to a chosen state later.
[root@CentOS flink-1.7.2]# ./bin/flink savepoint 7c46aa11163ecd995c81f12ff92c14cc hdfs://CentOS:9000/2019-08-29
[root@CentOS flink-1.7.2]# ./bin/flink run -s <savepoint-directory> -c <fully-qualified-class-name> <path-to-jar>
Flink provides different state backends that determine how and where state is stored. Depending on the backend, state lives on the Java heap or off-heap. Flink manages the application's state, which means it handles the memory management (spilling to disk if necessary) so that applications can hold very large state. By default the flink-conf.yaml configuration file determines the state backend for all Flink jobs, but the default can be overridden per job, as shown below.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.setStateBackend(... )
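For example (a sketch; the HDFS path is only an assumption, and the RocksDB variant additionally needs the flink-statebackend-rocksdb_2.11 dependency):
import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend

val env = StreamExecutionEnvironment.getExecutionEnvironment
// keep working state on the TaskManager heap, checkpoints on HDFS
env.setStateBackend(new FsStateBackend("hdfs://CentOS:9000/flink-checkpoints"))
// or: keep working state in RocksDB on local disk, with incremental checkpoints enabled
env.setStateBackend(new RocksDBStateBackend("hdfs://CentOS:9000/flink-checkpoints", true))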
Window computation is at the heart of stream processing: it slices an unbounded stream into chunks of finite size, usually bounded by time or by element count, and ordinary computations are then run over these finite windows. Let's first look at the basic code skeleton of Flink's window API:
keyed streams
stream
.keyBy(...)                    <- turn the non-keyed stream into a keyed stream
.window(...)                   <- required: the window "assigner" (window type)
[.trigger(...)]                <- optional: "trigger" (every window type has a default trigger)
[.evictor(...)]                <- optional: "evictor" (no window has an evictor by default) - removes elements
[.allowedLateness(...)]        <- optional: "lateness" (late data is dropped by default)
[.sideOutputLateData(...)]     <- optional: "output tag" (route late data to a side output)
.reduce/aggregate/fold/apply() <- required: the window "function"
[.getSideOutput(...)]          <- optional: "output tag" - retrieve the late data
non-keyed
stream
.windowAll(...)                <- required: the window "assigner" (window type)
[.trigger(...)]                <- optional: "trigger" (every window type has a default trigger)
[.evictor(...)]                <- optional: "evictor" (no window has an evictor by default) - removes elements
[.allowedLateness(...)]        <- optional: "lateness" (late data is dropped by default)
[.sideOutputLateData(...)]     <- optional: "output tag" (route late data to a side output)
.reduce/aggregate/fold/apply() <- required: the window "function"
[.getSideOutput(...)]          <- optional: "output tag" - retrieve the late data
A window is created when the first element that falls into its time range arrives, and it is removed automatically once the watermark passes the window's end time.
Flink only guarantees this removal for time-based windows: sliding, tumbling and session windows. It does not apply to global windows, because global windows partition elements by count rather than by time.
Every window has a trigger and a window function: the trigger decides when the window fires, the window function does the computation. All window types have a default trigger except global windows. A window can additionally be given an evictor that removes elements before or after the trigger fires.
The window assigner defines how elements are assigned to windows (i.e. the window type). Flink already defines the common assigners: tumbling windows, sliding windows, session windows and global windows. All of them except global windows are time based and can work on event time or processing time. A time window has a start time and an end time marking its range; the interval is closed at the start and open at the end, and the window's maxTimestamp method returns the largest timestamp an element of that window may carry.
Tumbling windows have a fixed length, and the slide equals the window length, so windows do not overlap.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.reduce((v1,v2)=>(v1._1,v1._2+v2._2))
.print()
env.execute("wordcount")
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(SlidingProcessingTimeWindows.of(Time.seconds(10),Time.seconds(5)))
.reduce((v1,v2)=>(v1._1,v1._2+v2._2))
.print()
env.execute("wordcount")
Session windows look at the time gap between elements: if the gap is smaller than the session gap, the element is merged into the current window; if it is larger, the current window closes and subsequent elements belong to a new window. Unlike tumbling and sliding windows, session windows have no fixed size; under the hood they work by merging windows.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(ProcessingTimeSessionWindows.withGap(Time.seconds(5)))
.reduce((v1,v2)=>(v1._1,v1._2+v2._2))
.print()
env.execute("wordcount")
val env=StreamExecutionEnvironment.getExecutionEnvironment
//001 5000 100
//002 10000 10
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(ts=>(ts(0),ts(1).toLong,ts(2).toDouble))
.keyBy(0)
.window(ProcessingTimeSessionWindows.withDynamicGap[(String,Long,Double)](new SessionWindowTimeGapExtractor[(String,Long,Double)]{
override def extract(element: (String,Long,Double)): Long = {
println("element:"+element)
element._2 //the dynamic gap, in milliseconds
}
}))
.reduce((v1,v2)=>(v1._1,v1._2,v1._3+v2._3))
.print()
env.execute("wordcount")
A global window puts all elements with the same key into one global window that by default never closes (it never fires), because it has no default trigger; the user therefore has to provide a Trigger.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(line=>line.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(GlobalWindows.create())
.trigger(CountTrigger.of(3)) //fire the window once 3 elements with the same key have accumulated
.reduce((v1,v2)=>(v1._1,v1._2+v2._2))
.print()
env.execute("wordcount")
Once the window assigner is chosen, you specify how the elements of each window are aggregated or computed; that is the job of the window function, which operates on the window's elements.
A window function can be a ReduceFunction, AggregateFunction, FoldFunction (not usable with session windows) or ProcessWindowFunction. ReduceFunction and AggregateFunction are the most efficient because they aggregate incrementally. A ProcessWindowFunction receives all elements of the window, so it computes over the full contents and is less efficient than the first two, but it also gets access to the window's metadata. Because the system has to buffer every element until the window fires, a ProcessWindowFunction uses more memory; it can, however, be combined with a ReduceFunction, AggregateFunction or FoldFunction to reduce that cost.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.reduce(new ReduceFunction[(String,Int)]{
override def reduce(v1: (String, Int), v2: (String, Int)): (String, Int) = {
(v1._1,v1._2+v2._2)
}
})
.print()
env.execute("wordcount")
An AggregateFunction
is a generalized version of a ReduceFunction
that has three types: an input type (IN
), accumulator type (ACC
), and an output type (OUT
). The input type is the type of elements in the input stream and the AggregateFunction
has a method for adding one input element to an accumulator. The interface also has methods for creating an initial accumulator, for merging two accumulators into one accumulator and for extracting an output (of type OUT
) from an accumulator.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(0)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.aggregate(new AggregateFunction[(String,Int),(String,Int),(String,Int)]{
override def createAccumulator(): (String, Int) = {
("",0)
}
override def add(value: (String, Int), accumulator: (String, Int)): (String, Int) = {
(value._1,value._2+accumulator._2)
}
override def getResult(accumulator: (String, Int)): (String, Int) = {
accumulator
}
override def merge(a: (String, Int), b: (String, Int)): (String, Int) = {
(a._1,a._2+b._2)
}
})
.print()
env.execute("wordcount")
A ProcessWindowFunction gets an Iterable containing all the elements of the window, and a Context object with access to time and state information, which enables it to provide more flexibility than other window functions. This comes at the cost of performance and resource consumption, because elements cannot be incrementally aggregated but instead need to be buffered internally until the window is considered ready for processing.
In short: maximum flexibility, at the cost of buffering every element of the window internally until the window is considered ready for processing.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(_._1)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.process(new ProcessWindowFunction[(String,Int),(String,Int,Int),String,TimeWindow]{
override def process(key: String,
context: Context,
elements: Iterable[(String, Int)],
out: Collector[(String, Int,Int)]): Unit = {
var total=0
for(i<- elements){
total += i._2
}
//per-window state: its lifecycle is tied to the window
val windowState= context.windowState.getState[Int](new ValueStateDescriptor[Int](key+"windowCount",createTypeInformation[Int]))
var currentCount=windowState.value()+total
windowState.update(currentCount)
//global (per-key) state: independent of any window
val globalState= context.globalState.getState[Int](new ValueStateDescriptor[Int](key+"globalcount",createTypeInformation[Int]))
val globalCount=globalState.value()+total
globalState.update(globalCount)
out.collect((key,currentCount,globalCount))
}
})
.print()
env.execute("wordcount")
A ProcessWindowFunction
can be combined with either a ReduceFunction
, an AggregateFunction
, or a FoldFunction
to incrementally aggregate elements as they arrive in the window. When the window is closed, the ProcessWindowFunction
will be provided with the aggregated result. This allows it to incrementally compute windows while having access to the additional window meta information of the ProcessWindowFunction
.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(_._1)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.fold(
("",0),
(acc:(String,Int),v:(String,Int))=>(v._1,acc._2+v._2),
new ProcessWindowFunction[(String,Int),(String,Int,Int),String,TimeWindow] {
override def process(key: String, context: Context, elements: Iterable[(String, Int)], out: Collector[(String, Int,Int)]): Unit = {
var total=0
for(i<- elements){
total += i._2
}
//per-window state: its lifecycle is tied to the window
val windowState= context.windowState.getState[Int](new ValueStateDescriptor[Int](key+"windowCount",createTypeInformation[Int]))
var currentCount=windowState.value()+total
windowState.update(currentCount)
//global (per-key) state: independent of any window
val globalState= context.globalState.getState[Int](new ValueStateDescriptor[Int](key+"globalcount",createTypeInformation[Int]))
val globalCount=globalState.value()+total
globalState.update(globalCount)
out.collect((key,currentCount,globalCount))
}
}
)
.print()
env.execute("wordcount")
In some places where a ProcessWindowFunction
can be used you can also use a WindowFunction
. This is an older version of ProcessWindowFunction
that provides less contextual information and does not have some advanced features, such as per-window keyed state.
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.socketTextStream("CentOS",9999)
.flatMap(_.split("\\s+"))
.map((_,1))
.keyBy(_._1)
.window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
.apply(new WindowFunction[(String,Int),(String,Int),String,TimeWindow]{
override def apply(key: String, window: TimeWindow,
input: Iterable[(String, Int)],
out: Collector[(String, Int)]): Unit = {
out.collect((key,input.map(_._2).sum))
}
})
.print()
env.execute("wordcount")
The trigger decides when a window is ready for its window function to run. Every WindowAssigner comes with a default trigger; only when the default trigger does not meet your needs do you implement your own.
A Trigger defines five callback methods that react to events:
- onElement(): called when an element falls into the window.
- onEventTime(): called when a registered event-time timer fires.
- onProcessingTime(): called when a registered processing-time timer fires.
- onMerge(): called when session windows are merged; the triggers' state is merged along with the windows.
- clear(): called when the window is removed, so the trigger can clean up its state.
Note: the first three methods return a TriggerResult, which decides whether the window fires:
- CONTINUE: keep the window and do not fire.
- FIRE: the window is ready; invoke the window function.
- PURGE: clear the window's elements and discard the window.
- FIRE_AND_PURGE: fire the window, then clear its contents (rarely used).
Default triggers of the built-in WindowAssigners:
WindowAssigner type | Default trigger |
---|---|
event-time window | EventTimeTrigger |
processing-time window | ProcessingTimeTrigger |
GlobalWindow | NeverTrigger |
public static class NeverTrigger extends Trigger<Object, GlobalWindow>
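If none of the default triggers fit, you can implement your own. The sketch below is illustrative only (class name and firing rule are made up): it fires as soon as an element containing "error" arrives and otherwise falls back to an end-of-window processing-time timer. The example that follows uses the built-in DeltaTrigger instead.
import org.apache.flink.streaming.api.windowing.triggers.{Trigger, TriggerResult}
import org.apache.flink.streaming.api.windowing.windows.TimeWindow

class ErrorOrTimeTrigger extends Trigger[String, TimeWindow] {

  override def onElement(element: String, timestamp: Long, window: TimeWindow,
                         ctx: Trigger.TriggerContext): TriggerResult = {
    // register the normal end-of-window timer
    ctx.registerProcessingTimeTimer(window.maxTimestamp())
    if (element.contains("error")) TriggerResult.FIRE else TriggerResult.CONTINUE
  }

  override def onProcessingTime(time: Long, window: TimeWindow,
                                ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.FIRE_AND_PURGE

  override def onEventTime(time: Long, window: TimeWindow,
                           ctx: Trigger.TriggerContext): TriggerResult =
    TriggerResult.CONTINUE

  override def clear(window: TimeWindow, ctx: Trigger.TriggerContext): Unit =
    ctx.deleteProcessingTimeTimer(window.maxTimestamp())
}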
val env=StreamExecutionEnvironment.getExecutionEnvironment
//001 70
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1).toDouble))
.keyBy(_._1)
.window(GlobalWindows.create())
.trigger(DeltaTrigger.of[(String,Double),GlobalWindow](20.0,new DeltaFunction[(String,Double)] {//fire the window when the delta exceeds 20
override def getDelta(oldDataPoint: (String, Double), newDataPoint: (String, Double)): Double = {
println(oldDataPoint+"\t"+newDataPoint)
newDataPoint._2-oldDataPoint._2
}
},createTypeInformation[(String,Double)].createSerializer(env.getConfig)))
.process(new ProcessWindowFunction[(String,Double),(String,Int,Int),String,GlobalWindow]{
override def process(key: String,
context: Context,
elements: Iterable[(String, Double)],
out: Collector[(String, Int,Int)]): Unit = {
elements.foreach(println)
}
})
.print()
env.execute("wordcount")
Flink’s windowing model allows specifying an optional Evictor
in addition to the WindowAssigner
and the Trigger
. This can be done using the evictor(...)
method (shown in the beginning of this document). The evictor has the ability to remove elements from a window after the trigger fires and before and/or after the window function is applied.
void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);
void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);
val env=StreamExecutionEnvironment.getExecutionEnvironment
//001 70
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1).toDouble))
.keyBy(_._1)
.window(ProcessingTimeSessionWindows.withGap(Time.seconds(10)))
.evictor(CountEvictor.of(3))
.process(new ProcessWindowFunction[(String,Double),(String,Int,Int),String,TimeWindow]{
override def process(key: String,
context: Context,
elements: Iterable[(String, Double)],
out: Collector[(String, Int,Int)]): Unit = {
elements.foreach(println)
}
})
.print()
env.execute("wordcount")
Flink supports several notions of time:
- Processing time: the system clock of the node executing the operator (the default).
- Event time: the time at which the event occurred, usually embedded in the event itself.
- Ingestion time: the time at which the event enters the Flink cluster (at the source).
Of the three, ingestion time and processing time cannot deal with late or out-of-order data. Event time can, but it is somewhat more involved: the user has to specify a watermark-generation strategy that is used to decide when windows fire.
val env=StreamExecutionEnvironment.getExecutionEnvironment
//choose the time characteristic (pick one of the three)
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime|ProcessingTime|EventTime)
The first two strategies do not require the system to maintain watermarks, so they are used in essentially the same way. If you choose event time, you must specify a watermark-generation strategy in the pipeline.
Specifying the watermark-generation strategy (periodic watermarks, AssignerWithPeriodicWatermarks):
//compute the current watermark; called periodically by the system
Watermark getCurrentWatermark();
//extract the event time from an element
long extractTimestamp(T element, long previousElementTimestamp);
val maxOrderness:Long=2000 //maximum allowed out-of-orderness: 2 s
var maxCurrentTimestamp:Long=0
val sdf=new SimpleDateFormat("HH:mm:ss")
//periodically recompute the latest watermark
override def getCurrentWatermark: Watermark = {
val w=maxCurrentTimestamp-maxOrderness
new Watermark(w)
}
//extract the current event's timestamp
override def extractTimestamp(element: (String, String, Double, Long), previousElementTimestamp: Long): Long = {
maxCurrentTimestamp=Math.max(maxCurrentTimestamp,element._4)
println("currentwatermarker:"+ sdf.format(maxCurrentTimestamp-maxOrderness)+",crentEventTime:"+sdf.format(element._4))
element._4
}
By setting env.getConfig.setAutoWatermarkInterval(1000) you control how often the watermark is recomputed (this periodic approach is the recommended one).
//a watermark is computed each time an event arrives
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
//extract the event time from an element
long extractTimestamp(T element, long previousElementTimestamp);
val maxOrderness:Long=2000 //maximum allowed lateness: 2 s
var maxCurrentTimestamp:Long=0
val sdf=new SimpleDateFormat("HH:mm:ss")
//periodically recompute the latest watermark
override def getCurrentWatermark: Watermark = {
val w=maxCurrentTimestamp-maxOrderness
new Watermark(w)
}
//extract the current event's timestamp
override def extractTimestamp(element: (String, String, Double, Long), previousElementTimestamp: Long): Long = {
maxCurrentTimestamp=Math.max(maxCurrentTimestamp,element._4)
println("Thread:"+Thread.currentThread().getId+"\tW:"+ sdf.format(maxCurrentTimestamp-maxOrderness)+",crentEventTime:"+sdf.format(element._4))
element._4
}
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
//choose the time characteristic
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.getConfig.setAutoWatermarkInterval(1000)
//001 zs 4.5 <timestamp>
env.socketTextStream("CentOS",9999)
.map(line=>line.split("\\s+"))
.map(tokens=>(tokens(0),tokens(1),tokens(2).toDouble,tokens(3).toLong))
.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks[(String, String, Double, Long)] {
val maxOrderness:Long=2000 //maximum allowed out-of-orderness: 2 s
var maxCurrentTimestamp:Long=0
val sdf=new SimpleDateFormat("HH:mm:ss")
//periodically recompute the latest watermark
override def getCurrentWatermark: Watermark = {
val w=maxCurrentTimestamp-maxOrderness
new Watermark(w)
}
//extract the current event's timestamp
override def extractTimestamp(element: (String, String, Double, Long), previousElementTimestamp: Long): Long = {
maxCurrentTimestamp=Math.max(maxCurrentTimestamp,element._4)
println("currentwatermarker:"+ sdf.format(maxCurrentTimestamp-maxOrderness)+",crentEventTime:"+sdf.format(element._4))
element._4
}
})
.keyBy(_._1)
.timeWindow(Time.seconds(5))
.process(new ProcessWindowFunction[(String,String,Double,Long),(String,String,Double,Long),String,TimeWindow]{
override def process(key: String,
context: Context,
elements: Iterable[(String,String,Double,Long)],
out: Collector[(String,String,Double,Long)]): Unit = {
val sdf=new SimpleDateFormat("HH:mm:ss")
val start=sdf.format(context.window.getStart)
val end=sdf.format(context.window.getEnd)
val waterMarker=sdf.format(context.currentWatermark)
println(s"=========${start} \tw:${waterMarker}=========")
elements.foreach(println)
println(s"=========${end} \tw:${waterMarker}=========")
println()
println()
}
})
.print()
env.execute("wordcount")
Note: to make the effect easier to observe, the parallelism is set to 1 here.
Watermarks in Parallel Streams
When watermarks arrive from multiple parallel input streams, the operator uses the smallest of them for its time calculations.
By default Flink drops late data: once the watermark w(T) has passed a window w1's end time T', any further element that falls into w1 is discarded.
w = max(EventTime) - allowed out-of-orderness
Flink can also handle late data: as long as w(T) - T' (the window end) is smaller than the allowed lateness, the element is still added to the window.
.timeWindow(Time.seconds(5))
.allowedLateness(Time.seconds(2))//maximum allowed lateness
.timeWindow(Time.seconds(5))
.allowedLateness(Time.seconds(2))
.sideOutputLateData(lateTag)
.reduce/flod/aggreate/apply
.getSideOutput
val env=StreamExecutionEnvironment.getExecutionEnvironment
env.setParallelism(1)
//choose the time characteristic
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.getConfig.setAutoWatermarkInterval(1000)
val lateTag=new OutputTag[(String,String,Double,Long)]("late")
//001 zs 4.5 <timestamp>
val windowStream = env.socketTextStream("CentOS", 9999)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1), tokens(2).toDouble, tokens(3).toLong))
.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks[(String, String, Double, Long)] {
val maxOrderness: Long = 2000 //maximum allowed lateness: 2 s
var maxCurrentTimestamp: Long = 0
val sdf = new SimpleDateFormat("HH:mm:ss")
//periodically recompute the latest watermark
override def getCurrentWatermark: Watermark = {
val w = maxCurrentTimestamp - maxOrderness
new Watermark(w)
}
//extract the current event's timestamp
override def extractTimestamp(element: (String, String, Double, Long), previousElementTimestamp: Long): Long = {
maxCurrentTimestamp = Math.max(maxCurrentTimestamp, element._4)
println("Thread:" + Thread.currentThread().getId + "\tW:" + sdf.format(maxCurrentTimestamp - maxOrderness) + ",crentEventTime:" + sdf.format(element._4))
element._4
}
})
.keyBy(_._1)
.timeWindow(Time.seconds(5))
.allowedLateness(Time.seconds(2))
.sideOutputLateData(lateTag)
.process(new ProcessWindowFunction[(String, String, Double, Long), (String, String, Double, Long), String, TimeWindow] {
override def process(key: String,
context: Context,
elements: Iterable[(String, String, Double, Long)],
out: Collector[(String, String, Double, Long)]): Unit = {
val sdf = new SimpleDateFormat("HH:mm:ss")
val start = sdf.format(context.window.getStart)
val end = sdf.format(context.window.getEnd)
val waterMarker = sdf.format(context.currentWatermark)
println(s"=========${start} \tw:${waterMarker}=========")
elements.foreach(println)
println(s"=========${end} \tw:${waterMarker}=========")
println()
println()
}
})
windowStream.print()
windowStream.getSideOutput(lateTag).print("late:")
env.execute("wordcount")
A window join only joins elements of two streams that ① share a common key (the join condition) and ② fall into the same time window. The joined pairs are then passed to a `JoinFunction` or `FlatJoinFunction`. The usual structure of a window join:
stream.join(otherStream)
.where(<KeySelector>)
.equalTo(<KeySelector>)
.window(<WindowAssigner>)
.apply(<JoinFunction>)
Points to note:
When performing a tumbling window join, all elements with a common key and a common tumbling window are joined as pairwise combinations and passed on to a JoinFunction
or FlatJoinFunction
. Because this behaves like an inner join, elements of one stream that do not have elements from another stream in their tumbling window are not emitted!
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setParallelism(1)
//set the time characteristic
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
//compute the watermark every 1 s
fsEnv.getConfig.setAutoWatermarkInterval(1000)
//1 zhansan 1567392721000
val userDataStream = fsEnv.socketTextStream("CentOS", 9999)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1),tokens(2).toLong))
.assignTimestampsAndWatermarks(new UserWaterMarker)
//1 apple 4.5 1567392721000
val orderDataStream = fsEnv.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1), tokens(2).toDouble,tokens(3).toLong))
.assignTimestampsAndWatermarks(new OrderWaterMarker)
userDataStream.join(orderDataStream)
.where(user=>user._1)
.equalTo(order=>order._1)
.window(TumblingEventTimeWindows.of(Time.seconds(5)))
.apply((user,order)=>{
(user._1,user._2,order._2,order._3)
})
.print()
fsEnv.execute("UserOrderJoin")
class OrderWaterMarker extends AssignerWithPeriodicWatermarks[(String,String,Double,Long)]{
val maxOrderness:Long=2000
var currentMaxTimestamp:Long=0
var sdf=new SimpleDateFormat("HH:mm:ss")
override def getCurrentWatermark: Watermark = {
return new Watermark(currentMaxTimestamp-maxOrderness)
}
override def extractTimestamp(element: (String,String,Double,Long), previousElementTimestamp: Long): Long = {
currentMaxTimestamp=Math.max(currentMaxTimestamp,element._4)
println(s"Watermark:${sdf.format(currentMaxTimestamp-maxOrderness)}\tEventTime:${sdf.format(element._4)}")
element._4
}
}
class UserWaterMarker extends AssignerWithPeriodicWatermarks[(String,String,Long)]{
val maxOrderness:Long=2000
var currentMaxTimestamp:Long=0
var sdf=new SimpleDateFormat("HH:mm:ss")
override def getCurrentWatermark: Watermark = {
return new Watermark(currentMaxTimestamp-maxOrderness)
}
override def extractTimestamp(element: (String, String, Long), previousElementTimestamp: Long): Long = {
currentMaxTimestamp=Math.max(currentMaxTimestamp,element._3)
println(s"Watermark:${sdf.format(currentMaxTimestamp-maxOrderness)}\tEventTime:${sdf.format(element._3)}")
element._3
}
}
When performing a sliding window join, all elements with a common key and common sliding window are joined as pairwise combinations and passed on to the JoinFunction
or FlatJoinFunction
. Elements of one stream that do not have elements from the other stream in the current sliding window are not emitted! Note that some elements might be joined in one sliding window but not in another!
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setParallelism(1)
//set the time characteristic
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
//compute the watermark every 1 s
fsEnv.getConfig.setAutoWatermarkInterval(1000)
//1 zhansan 1567392721000
val userDataStream = fsEnv.socketTextStream("CentOS", 9999)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1),tokens(2).toLong))
.assignTimestampsAndWatermarks(new UserWaterMarker)
//1 apple 4.5 1567392721000
val orderDataStream = fsEnv.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1), tokens(2).toDouble,tokens(3).toLong))
.assignTimestampsAndWatermarks(new OrderWaterMarker)
userDataStream.join(orderDataStream)
.where(user=>user._1)
.equalTo(order=>order._1)
.window(SlidingEventTimeWindows.of(Time.seconds(4),Time.seconds(2)))
.apply((user,order)=>{
(user._1,user._2,order._2,order._3)
})
.print()
fsEnv.execute("UserOrderJoin")
When performing a session window join, all elements with the same key that when “combined” fulfill the session criteria are joined in pairwise combinations and passed on to the JoinFunction
or FlatJoinFunction
. Again this performs an inner join, so if there is a session window that only contains elements from one stream, no output will be emitted!
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setParallelism(1)
//set the time characteristic
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
//compute the watermark every 1 s
fsEnv.getConfig.setAutoWatermarkInterval(1000)
//1 zhansan 1567392721000
val userDataStream = fsEnv.socketTextStream("CentOS", 9999)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1),tokens(2).toLong))
.assignTimestampsAndWatermarks(new UserWaterMarker)
//1 apple 4.5 1567392721000
val orderDataStream = fsEnv.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1), tokens(2).toDouble,tokens(3).toLong))
.assignTimestampsAndWatermarks(new OrderWaterMarker)
userDataStream.join(orderDataStream)
.where(user=>user._1)
.equalTo(order=>order._1)
.window(EventTimeSessionWindows.withGap(Time.seconds(2)))
.apply((user,order)=>{
(user._1,user._2,order._2,order._3)
})
.print()
fsEnv.execute("UserOrderJoin")
The interval join joins elements of two streams (we’ll call them A & B for now) with a common key and where elements of stream B have timestamps that lie in a relative time interval to timestamps of elements in stream A.
This can also be expressed more formally as b.timestamp ∈ [a.timestamp + lowerBound; a.timestamp + upperBound]
ora.timestamp + lowerBound <= b.timestamp <= a.timestamp + upperBound
If the current watermark has already passed the time interval of an element from the orange stream, that element can no longer be joined with any future element; any data that later falls into such an interval is dropped by default.
val fsEnv = StreamExecutionEnvironment.getExecutionEnvironment
fsEnv.setParallelism(1)
//set the time characteristic
fsEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
//compute the watermark every 1 s
fsEnv.getConfig.setAutoWatermarkInterval(1000)
//1 zhansan 1567392721000
val userDataStream = fsEnv.socketTextStream("CentOS", 9999)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1),tokens(2).toLong))
.assignTimestampsAndWatermarks(new UserWaterMarker)
.keyBy(_._1)
//1 apple 4.5 1567392721000
val orderDataStream = fsEnv.socketTextStream("CentOS", 8888)
.map(line => line.split("\\s+"))
.map(tokens => (tokens(0), tokens(1), tokens(2).toDouble,tokens(3).toLong))
.assignTimestampsAndWatermarks(new OrderWaterMarker)
.keyBy(_._1)
userDataStream.intervalJoin(orderDataStream)
.between(Time.seconds(-2),Time.seconds(2))
.process(new ProcessJoinFunction[(String,String,Long),(String,String,Double,Long),(String,String,String,Double)] {
override def processElement(left: (String, String, Long),
right: (String, String, Double, Long),
ctx: ProcessJoinFunction[(String, String, Long), (String, String, Double, Long), (String, String, String, Double)]#Context,
out: Collector[(String, String, String, Double)]): Unit = {
val timestamp = ctx.getTimestamp
val userts = ctx.getLeftTimestamp
val orderts = ctx.getRightTimestamp
println(s"${timestamp},${userts},${orderts},${left.toString()},${right.toString()}")
println()
}
}).print()
fsEnv.execute("IntervalJoin")