Chapter 2 Data Processing Using the DataStream API Event time and watermarks

Event time and watermarks

The Flink Streaming API takes its inspiration from Google's Dataflow model and supports different notions of time. In general, there are three places where time can be captured in a streaming environment. They are as follows:

Event time

Event time is the time at which an event occurred on the device that produced it. For example, in an IoT project it is the time at which a sensor captures a reading. These event times generally need to be embedded in the records before they enter Flink. During processing, the timestamps are extracted and used for windowing. Event time processing can handle out-of-order events.
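To make the out-of-order point concrete, here is a minimal plain-Java sketch of event-time windowing. It is an illustration only and does not use Flink: the `Reading` class, `windowStart`, and `countPerWindow` are hypothetical names, and the 10-second tumbling window is an assumed size. Because each reading is assigned to a window by the timestamp embedded in it, a reading that arrives late still lands in the window it belongs to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these names are Flink API.
class EventTimeWindowSketch {

    // Hypothetical sensor reading carrying the timestamp set by the device.
    static final class Reading {
        final String sensorId;
        final long eventTimeMillis;

        Reading(String sensorId, long eventTimeMillis) {
            this.sensorId = sensorId;
            this.eventTimeMillis = eventTimeMillis;
        }
    }

    // Start of the tumbling window that contains the given event time.
    static long windowStart(long eventTimeMillis, long windowSizeMillis) {
        return eventTimeMillis - Math.floorMod(eventTimeMillis, windowSizeMillis);
    }

    // Counts readings per window, keyed by window start, using event time only.
    static Map<Long, Integer> countPerWindow(List<Reading> readings, long windowSizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Reading r : readings) {
            counts.merge(windowStart(r.eventTimeMillis, windowSizeMillis), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Reading> arrivalOrder = new ArrayList<>();
        arrivalOrder.add(new Reading("s1", 1_000L));
        arrivalOrder.add(new Reading("s1", 12_000L));
        arrivalOrder.add(new Reading("s1", 4_000L)); // arrives after a newer reading
        // Prints {0=2, 10000=1}: the late reading is still counted in window 0.
        System.out.println(countPerWindow(arrivalOrder, 10_000L));
    }
}
```

The arrival order does not affect the result here, which is the property that processing time cannot offer.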

Processing time

Processing time is the time of the machine that executes the stream processing. Processing time windowing considers only the timestamp at which an event is being processed. It is the simplest form of stream processing, because it requires no synchronization between the processing machines and the producing machines. In a distributed, asynchronous environment, processing time does not provide determinism, because results depend on the speed at which records flow through the system.

Ingestion time

Ingestion time is the time at which a particular event enters Flink. All time-based operations refer to this timestamp. Ingestion time is more expensive than processing time, but it gives predictable results. Ingestion time programs cannot handle out-of-order events, because the timestamp is assigned only after the event has entered the Flink system.

Here is how to set up time characteristics and watermarks. For ingestion time and processing time, we only need to set the time characteristic, and watermark generation is taken care of automatically. The following code snippet shows this.
Translator's note: for an article about watermarks, see http://vishnuviswanath.com/flink_eventtime.html

In Java:

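import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
OR
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);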

In Scala:

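import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
OR
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)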

For event time stream programs, we need to specify how watermarks and timestamps are assigned. There are two ways of assigning watermarks and timestamps:

  • Directly from a data source attribute
  • Using a timestamp assigner
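As a rough illustration of what a timestamp assigner and watermark generator do together, the following plain-Java sketch models one common strategy: the watermark trails the largest timestamp seen so far by a fixed delay. This is a simplified model and not Flink's API. The class name, its methods, and the 3-second delay are assumptions made for the example.

```java
// Plain-Java model of a watermark that trails the maximum timestamp by a fixed delay.
// Illustration only; the class and method names are hypothetical, not Flink API.
class BoundedLatenessWatermarkSketch {

    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedLatenessWatermarkSketch(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    // The "timestamp assigner" step: record the event's timestamp and return it.
    long onEvent(long eventTimeMillis) {
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimeMillis);
        return eventTimeMillis;
    }

    // The "watermark" step: events at or before this time are no longer expected.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // no events seen yet
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }

    // An event counts as late when its timestamp is not after the current watermark.
    boolean isLate(long eventTimeMillis) {
        return eventTimeMillis <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedLatenessWatermarkSketch wm = new BoundedLatenessWatermarkSketch(3_000L);
        wm.onEvent(10_000L);
        System.out.println(wm.currentWatermark()); // 7000
        System.out.println(wm.isLate(8_000L));     // false: within the allowed delay
        wm.onEvent(15_000L);
        System.out.println(wm.currentWatermark()); // 12000
        System.out.println(wm.isLate(9_000L));     // true: older than the watermark
    }
}
```

A larger delay tolerates more disorder but makes windows wait longer before they can be considered complete.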

To work with event time streams, we need to set the time characteristic as follows:

In Java:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

In Scala:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

It is always best to store the event time together with the record when it is stored at the source. Flink also supports some pre-defined timestamp extractors and watermark generators. Refer to https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_timestamp_extractors.html
