【Contents】

Flink Application Development
1.1 Basic Processing Semantics
1.1.1 Streams
1.1.2 State
1.1.3 Time
1.2 Layered APIs
1.2.1 ProcessFunction
1.2.2 DataStream API
1.2.3 SQL/Table API

Flink Architecture
2.3.1 Stateful

Flink Operation

Study Methods & Homework

Apache Flink Definition

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams.

Key points: distributed, stateful, computation over both bounded and unbounded data.

Flink Application (application development essentials)

Basic Processing Semantics

Streams

Unbounded streams: have a start but no defined end
Bounded streams: have a defined start and end
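A minimal sketch in plain Python (not the Flink API) of the bounded/unbounded distinction, assuming a hypothetical `running_sum` operator. The same operator runs over a finite list, which ends and yields a final result, and over `itertools.count()`, which never ends, so only a prefix of its output can ever be observed.

```python
import itertools
from typing import Iterable, Iterator

def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Emit the running total after each record; works on bounded and unbounded input."""
    total = 0
    for value in stream:
        total += value
        yield total

# Bounded stream: defined start and end, so the computation finishes
bounded = [3, 1, 4, 1, 5]
print(list(running_sum(bounded))[-1])  # final result: 14

# Unbounded stream: a start (1) but no defined end; we can only look at a prefix
unbounded = itertools.count(1)  # 1, 2, 3, ...
print(list(itertools.islice(running_sum(unbounded), 5)))  # [1, 3, 6, 10, 15]
```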

State

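Stateful: an aggregation such as sum or count has to keep information from earlier records while it runs. For example, counting PV and UV requires holding earlier data in the system (a running count, or the set of user IDs already seen). That data is state, so the whole computation is stateful.
Stateless: a select or where operation does not need to keep any data in the system. It is like ordinary computation code that carries no state and no data of its own; any data or state lives externally, e.g. in a DB.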
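A minimal plain-Python sketch of the contrast above, assuming a hypothetical `(user_id, page)` record shape. `stateless_filter` behaves like a `where` clause and remembers nothing, while the hypothetical `PvUvCounter` must keep a per-page count (PV) and the set of users already seen (UV) across records. Neither is a Flink API.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, List, Set, Tuple

Event = Tuple[str, str]  # hypothetical record shape: (user_id, page)

def stateless_filter(events: Iterable[Event], page: str) -> Iterator[Event]:
    """Like a `where` clause: each record is judged on its own, nothing is remembered."""
    for user, p in events:
        if p == page:
            yield (user, p)

class PvUvCounter:
    """Stateful operator: keeps a per-page count (PV) and the users seen per page (UV)."""

    def __init__(self) -> None:
        self.pv: DefaultDict[str, int] = defaultdict(int)
        self.users: DefaultDict[str, Set[str]] = defaultdict(set)

    def process(self, event: Event) -> Tuple[str, int, int]:
        user, page = event
        self.pv[page] += 1          # depends on how many records came before
        self.users[page].add(user)  # UV must remember *which* users were seen
        return page, self.pv[page], len(self.users[page])

events: List[Event] = [("u1", "home"), ("u2", "home"), ("u1", "home"), ("u1", "cart")]
print(list(stateless_filter(events, "cart")))  # [('u1', 'cart')]

counter = PvUvCounter()
for e in events:
    print(counter.process(e))
# ('home', 1, 1)
# ('home', 2, 2)
# ('home', 3, 2)
# ('cart', 1, 1)
```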

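Core value:

Incremental processing: a state mechanism is needed to keep intermediate results (for example PV/UV over a 5-minute window); this is what makes Flink's incremental processing possible.
Exactly-once Semantics: when a job fails or something unexpected happens, state (together with checkpointing) lets Flink achieve exactly-once semantics; this is covered in detail in the architecture section.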
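A sketch of incremental processing for the 5-minute PV/UV example, assuming `(timestamp_ms, user_id)` records and a hypothetical `incremental_pv_uv` helper. Each record is folded into its window's accumulator as it arrives instead of re-scanning earlier data. Window triggering, late data and watermarks, which Flink handles for real, are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling windows

def window_start(ts_ms: int) -> int:
    """Align a timestamp to the start of its 5-minute window."""
    return ts_ms - ts_ms % WINDOW_MS

def incremental_pv_uv(events: Iterable[Tuple[int, str]]) -> Dict[int, Tuple[int, int]]:
    """Fold each (timestamp_ms, user_id) into its window's accumulator as it arrives.

    Only an accumulator (a count and a user set) is kept per window, so each new
    record costs O(1) amortized work instead of re-scanning all earlier records.
    """
    pv: Dict[int, int] = defaultdict(int)
    uv: Dict[int, Set[str]] = defaultdict(set)
    for ts, user in events:
        w = window_start(ts)
        pv[w] += 1
        uv[w].add(user)
    return {w: (pv[w], len(uv[w])) for w in sorted(pv)}

events = [(0, "u1"), (60_000, "u2"), (120_000, "u1"), (300_000, "u3"), (310_000, "u3")]
print(incremental_pv_uv(events))  # {0: (3, 2), 300000: (2, 1)}
```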

Time

When referring to time in a streaming program (for example to define windows), one can refer to different notions of time:

Event Time is the time when an event was created. It is usually described by a timestamp in the events.
Ingestion Time is the time when an event enters the Flink dataflow at the source operator.
Processing Time is the local time at each operator that performs a time-based operation.
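A small sketch of why the choice of time matters, using hypothetical 1-minute tumbling windows and made-up records that carry both an event time and a processing time. A record created at t=55 that only arrives at t=70 lands in a different window depending on which notion of time is used. Ingestion time would behave like processing time measured at the source and is omitted here.

```python
from typing import Dict, List, Tuple

WINDOW_S = 60  # hypothetical 1-minute tumbling windows

# Each record: (event_time_s, processing_time_s, payload). Record "c" was created
# at t=55 but only reached the operator at t=70 (e.g. a delayed upload).
records: List[Tuple[int, int, str]] = [
    (10, 12, "a"),
    (50, 52, "b"),
    (55, 70, "c"),
    (65, 66, "d"),
]

def assign_windows(records: List[Tuple[int, int, str]], use_event_time: bool) -> Dict[int, List[str]]:
    """Group payloads by window start, using either event time or processing time."""
    windows: Dict[int, List[str]] = {}
    for event_t, proc_t, payload in records:
        t = event_t if use_event_time else proc_t
        start = t - t % WINDOW_S
        windows.setdefault(start, []).append(payload)
    return windows

print(assign_windows(records, use_event_time=True))   # {0: ['a', 'b', 'c'], 60: ['d']}
print(assign_windows(records, use_event_time=False))  # {0: ['a', 'b'], 60: ['c', 'd']}
```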

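Note: why time is discussed separately:

In batch processing, unless the business logic is closely tied to time, we rarely pay much attention to it: a batch job computes its result and then ends. A streaming job is different. Because the data is an unbounded stream, the computation runs continuously and, in principle, never finishes. So to measure how far a streaming job has progressed we need time: comparing the event time of the latest records processed with the current processing time shows how far the real-time computation lags behind. This lag is a key indicator of whether a real-time job is healthy.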
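A minimal sketch of the lag metric described above, assuming a hypothetical `LagTracker` helper (not a Flink API). Lag is the current processing (wall-clock) time minus the newest event time processed so far, compared against an assumed alert threshold.

```python
import time
from typing import Optional

class LagTracker:
    """Tracks how far a streaming job is behind real time (hypothetical helper)."""

    def __init__(self, alert_threshold_ms: int) -> None:
        self.alert_threshold_ms = alert_threshold_ms
        self.max_event_time_ms: Optional[int] = None

    def on_record(self, event_time_ms: int) -> None:
        # Remember the newest business (event) time seen so far
        if self.max_event_time_ms is None or event_time_ms > self.max_event_time_ms:
            self.max_event_time_ms = event_time_ms

    def lag_ms(self, now_ms: Optional[int] = None) -> Optional[int]:
        # Processing time defaults to the local wall clock
        if self.max_event_time_ms is None:
            return None
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        return now_ms - self.max_event_time_ms

    def healthy(self, now_ms: Optional[int] = None) -> bool:
        # A job that has not processed any record yet is reported as not healthy
        lag = self.lag_ms(now_ms)
        return lag is not None and lag <= self.alert_threshold_ms

tracker = LagTracker(alert_threshold_ms=60_000)  # alert if more than 1 minute behind
for t in (1_000_000, 1_030_000, 1_020_000):      # out-of-order event times
    tracker.on_record(t)
print(tracker.lag_ms(now_ms=1_045_000))   # 15000
print(tracker.healthy(now_ms=1_045_000))  # True
print(tracker.healthy(now_ms=1_200_000))  # False
```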

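Layered APIs

From the bottom layer to the top, each API becomes more concise (more abstract) and less expressive; each layer wraps the underlying logic in a more abstract interface.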

ProcessFunction (finest-grained): the most expressive function interfaces that Flink offers
DataStream API (common streaming operations, such as windowing): provides primitives for many common stream processing operations, such as windowing, record-at-a-time transformations, and enriching events by querying an external data store
SQL/Table API (run SQL over tables; unified batch and streaming; the best fit for data-analysis users): relational APIs
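As a rough analogy for the abstraction trade-off (not Flink code), the same PV count is written twice below: once at a low level, handling every record and managing state by hand in the spirit of ProcessFunction, and once declaratively in SQL. Python's built-in `sqlite3` stands in for a SQL engine here; it is a batch database, not Flink SQL.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

clicks: List[Tuple[str, str]] = [("u1", "home"), ("u2", "home"), ("u1", "cart")]

# Low level (ProcessFunction-like): we touch every record and manage state ourselves
def pv_low_level(rows: List[Tuple[str, str]]) -> Dict[str, int]:
    state: Dict[str, int] = defaultdict(int)
    for _user, page in rows:
        state[page] += 1
    return dict(state)

# High level (SQL-like): we only declare *what* we want; the engine decides *how*
def pv_sql(rows: List[Tuple[str, str]]) -> Dict[str, int]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
        result = conn.execute("SELECT page, COUNT(*) FROM clicks GROUP BY page")
        return dict(result.fetchall())
    finally:
        conn.close()

print(pv_low_level(clicks))  # {'home': 2, 'cart': 1}
print(pv_sql(clicks))        # same counts; row order is up to the engine
```

The low-level version could easily be extended with custom logic per record (timers, side outputs, arbitrary state), which is what "more expressive" means; the SQL version is shorter but limited to what the query language can say.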

Flink Architecture (basic architecture and core logic)

Bounded and unbounded data streams

Flink processes both kinds of data set with a single framework.

Flexible deployment

Flink supports multiple deployment modes (for example standalone clusters, YARN and Kubernetes).

Very high scalability

Peak throughput of 1.7 billion records per second, without any change to business semantics.

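Top-tier streaming performance

Local state access with extensive performance optimization: state is abstracted into the framework and accessed locally, unlike frameworks such as Storm, where state is read from and written to third-party stores such as HBase or Redis. Local access speeds up processing and reduces latency.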

Stateful

Stateful Flink applications are optimized for local state access. Flink guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing the local state to durable storage.
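A heavily simplified, single-operator sketch of checkpoint-based recovery, assuming a replayable source addressed by offsets and an in-memory dict standing in for durable storage (the `CountingJob` class is hypothetical). The source offset and the operator state are snapshotted together. After a simulated failure both are rolled back to the last checkpoint and the records since then are replayed, so the final state matches a failure-free run. Flink's actual mechanism takes asynchronous, barrier-aligned snapshots across all operators, which this sketch does not attempt.

```python
import copy
from typing import Dict, List, Optional

class CountingJob:
    """Hypothetical single-operator job: reads a replayable source and counts records per key."""

    def __init__(self, source: List[str], checkpoint_every: int) -> None:
        self.source = source                  # replayable input, e.g. a log with offsets
        self.checkpoint_every = checkpoint_every
        self.offset = 0                       # position of the next record to read
        self.counts: Dict[str, int] = {}      # local operator state
        self.durable: Optional[dict] = None   # stands in for durable checkpoint storage

    def checkpoint(self) -> None:
        # Snapshot source position and state together, so they stay consistent
        self.durable = {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self) -> None:
        # Roll state back to the last checkpoint and rewind the source to match
        snap = self.durable or {"offset": 0, "counts": {}}
        self.offset = snap["offset"]
        self.counts = copy.deepcopy(snap["counts"])

    def run(self, fail_at: Optional[int] = None) -> None:
        while self.offset < len(self.source):
            if fail_at is not None and self.offset == fail_at:
                fail_at = None                # fail only once
                self.restore()                # simulate crash + recovery
                continue
            key = self.source[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                self.checkpoint()

data = ["a", "b", "a", "c", "a", "b", "c"]
clean = CountingJob(data, checkpoint_every=3)
clean.run()
crashed = CountingJob(data, checkpoint_every=3)
crashed.run(fail_at=5)                  # crash just before reading offset 5
print(clean.counts)                     # {'a': 3, 'b': 2, 'c': 2}
print(crashed.counts == clean.counts)   # True: no record counted twice or lost
```

Records at offsets 3 and 4 are processed twice in the failing run, but because the state is rolled back to the checkpoint taken at offset 3, their effect on the counts is applied exactly once.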

Flink Operation (operations and management)

24/7 high availability

Consistent checkpoints

Application monitoring and operations

Web UI, metrics

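Study methods:

Practice first, then theory. Start with applications and try to build a complex Flink application, such as a risk-control system or an online machine-learning system.
Broaden horizontally. After building complex Flink production workloads, also study systems such as Storm, Spark and Dataflow. Knowledge evolves and every system builds on predecessors that laid the groundwork, so looking across systems broadens your view.
Follow Apache Flink and the Flink China community; discuss, ask questions and share often.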

Homework:

Draw a mind map of Flink.

【Notes】1. Flink Basics