涂作权's Blog

1.20_A Comprehensive Guide to Flink Windows\Keyed Windows\Window Assigners\Tumbling,Sliding,Session,Global,Window Function

1.20.Seeing Unbounded Data Streams Through Windows----A Comprehensive Guide to Flink Windows
1.20.1.Quick Start
1.20.1.1.What Is It?
1.20.1.2.How to Use It?
1.20.1.2.1.Keyed Windows
1.20.1.2.2.Non-Keyed Windows
1.20.1.2.3.Shorthand Window Operations
1.20.2.Window Assigners
1.20.2.1.Categories
1.20.2.2.Usage
1.20.2.2.1.Tumbling Windows
1.20.2.2.2.Sliding Windows
1.20.2.2.3.Session Windows
1.20.2.2.4.Global Windows
1.20.3.Window Functions
1.20.3.1.Categories
1.20.3.2.Usage
1.20.3.2.1.ReduceFunction
1.20.3.2.2.AggregateFunction
1.20.3.2.3.FoldFunction
1.20.3.2.4.ProcessWindowFunction
1.20.3.2.5.Combining Incremental Aggregation Functions with ProcessWindowFunction
1.20.3.2.6.Combining AggregateFunction with ProcessWindowFunction
1.20.4.Understanding the Window Lifecycle
1.20.4.1.Lifecycle Diagram
1.20.4.2.Window Assigners
1.20.4.3.Triggers
1.20.4.4.Evictors

1.20.Seeing Unbounded Data Streams Through Windows----A Comprehensive Guide to Flink Windows

The following is reposted from: https://www.cnblogs.com/jmx-bigdata/p/13708868.html

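Windows are one of the most commonly used operators in stream processing. A window splits an unbounded stream into bounded pieces, and a computation function is then applied to each window, which allows very flexible processing. Flink provides a rich set of window operations, and users can also define custom windows for their own scenarios. This article covers:

The basic concepts and simple usage of windows
The categories, source code, and usage of the built-in Window Assigners
The categories and usage of Window Functions
The components of a window and a source-code walkthrough of its lifecycle
A complete demo of window usage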

1.20.1.Quick Start

1.20.1.1.What Is It?

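A Window is the core operator for processing unbounded streams. It splits a data stream into "buckets" of finite size (that is, it cuts the stream into separate windows by a fixed time span or element count), and on each window users apply computation functions to the data it contains, producing statistics over a bounded time range. For example, to output every 5 minutes the top N products by clicks over the last hour, you can use a one-hour time window (advancing every 5 minutes) to confine the data to a fixed time range and then aggregate the bounded data inside that range.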

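Depending on the stream it is applied to (DataStream or KeyedStream), a window is one of two kinds: Keyed Windows or Non-Keyed Windows. Keyed Windows are created by calling window(…) on a KeyedStream, which yields a WindowedStream. Non-Keyed Windows are created by calling windowAll(…) on a DataStream, which yields an AllWindowedStream. The conversions are shown in the figure below. Note: AllWindowedStream is generally not recommended, because windowing a non-keyed stream gathers the data from all partitions into a single task (parallelism 1), which hurts performance.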
[Figure 1: conversions between DataStream, KeyedStream, WindowedStream and AllWindowedStream]

1.20.1.2.How to Use It?

Now that we have covered what a window is, how do we use one? See the following code skeletons:

1.20.1.2.1.Keyed Windows

stream
    .keyBy(...)                       // window(...) is called on a KeyedStream
    .window(...)                      // required: the window assigner
    [.trigger(...)]                   // optional: the trigger; defaults to the assigner's default trigger
    [.evictor(...)]                   // optional: the evictor; none if not specified
    [.allowedLateness(...)]           // optional: allowed lateness for late data; defaults to 0
    [.sideOutputLateData(...)]        // optional: side output for late data; none if not specified
    .reduce/aggregate/fold/apply()    // required: the window function
    [.getSideOutput(...)]             // optional: read the data from the side output

1.20.1.2.2.Non-Keyed Windows

stream
  .windowAll(...)                  // required: the window assigner
  [.trigger(...)]                  // optional: the trigger; defaults to the assigner's default trigger
  [.evictor(...)]                  // optional: the evictor; none if not specified
  [.allowedLateness(...)]          // optional: allowed lateness for late data; defaults to 0
  [.sideOutputLateData(...)]       // optional: side output for late data; none if not specified
  .reduce/aggregate/fold/apply()   // required: the window function
  [.getSideOutput(...)]            // optional: read the data from the side output

1.20.1.2.3.简写window操作

In the snippets above, both window(…) on a KeyedStream and windowAll(…) on a DataStream take a window assigner argument; window assigners are explained in detail below. For example:

// ---------------------------------------------------------------
// Keyed Windows
// ---------------------------------------------------------------
stream
     .keyBy(id)
     .window(TumblingEventTimeWindows.of(Time.seconds(5)))        // 5-second tumbling window
     .reduce(new MyReduceFunction())


// ---------------------------------------------------------------
// Non-Keyed Windows
// ---------------------------------------------------------------
stream
     .windowAll(TumblingEventTimeWindows.of(Time.seconds(5)))      // 5-second tumbling window
     .reduce(new MyReduceFunction())

The code above can be shortened to:

// ---------------------------------------------------------------
// Keyed Windows
// --------------------------------------------------------------
stream
     .keyBy(id)
     .timeWindow(Time.seconds(5))       // 5-second tumbling window
     .reduce(new MyReduceFunction())

// ------------------------------------------------------------------
// Non-Keyed Windows
// -----------------------------------------------------------------
stream
     .timeWindowAll(Time.seconds(5))    // 5-second tumbling window
     .reduce(new MyReduceFunction())

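Taking KeyedStream as an example, the source snippet below shows that the shorthand simply calls the non-shorthand code underneath. timeWindowAll() works the same way (see the DataStream source), so it is not repeated here. Note that timeWindow() and timeWindowAll() have been deprecated since Flink 1.12, where calling window(...) with an explicit window assigner is the recommended form.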
// Picks a built-in window assigner based on the configured time characteristic

public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
	if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime)  {
	     return window(TumblingProcessingTimeWindows.of(size));
	} else {
	     return window(TumblingEventTimeWindows.of(size));
	}
}

1.20.2.Window Assigners

1.20.2.1.Categories

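A WindowAssigner is responsible for assigning each incoming element to one or more windows. Flink ships with several built-in WindowAssigners that cover most use cases: tumbling windows, sliding windows, session windows and global windows. If none of them fits your needs, you can implement a custom WindowAssigner by extending the WindowAssigner class.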

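Apart from global windows, the WindowAssigners above are time-based windows. Flink also provides count-based windows, whose size is defined by the number of elements in the window; with count-based windows, out-of-order data makes the window results nondeterministic. This article focuses on time-based windows; for reasons of space, count-based windows are not covered.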
[Figure 2: categories of window assigners]

1.20.2.2.Usage

The four built-in window assigners in Flink are analyzed one by one below.

1.20.2.2.1.Tumbling Windows

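Illustration
Tumbling windows assign each element to exactly one window. The stream is cut by a fixed time span or size, so every window has the same fixed size and windows never overlap (see the figure below). They are simple and suit scenarios that compute a metric once per period.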

Windows can be based on either event time or processing time; the corresponding window assigners are TumblingEventTimeWindows and TumblingProcessingTimeWindows. The window size is set with the assigner's of(size) method, expressed with Time.milliseconds(x), Time.seconds(x), Time.minutes(x), and so on.
[Figure 3: tumbling windows]
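To make the "fixed size, no overlap" property concrete, here is a plain-Java sketch (not Flink API; the class and method names are hypothetical) of how a timestamp can be mapped to its tumbling window. It assumes windows are aligned to the epoch plus an optional offset and that a window covers [start, start + size) with the end excluded; as far as I know this matches how Flink computes window starts, but treat that correspondence as an assumption.

```java
// Plain-Java sketch, not Flink API: map a timestamp to its tumbling window.
class TumblingWindowSketch {

    // Start of the tumbling window [start, start + size) that contains `timestamp`.
    // Windows are aligned to the epoch shifted by `offset`; Math.floorMod keeps
    // the result correct for timestamps before the offset as well.
    static long windowStart(long timestamp, long offset, long size) {
        return timestamp - Math.floorMod(timestamp - offset, size);
    }

    public static void main(String[] args) {
        long size = 10_000L; // 10-second window, in milliseconds
        long[] timestamps = {1588491228000L, 1588491229000L, 1588491238000L};
        for (long ts : timestamps) {
            long start = windowStart(ts, 0L, size);
            System.out.println(ts + " -> [" + start + ", " + (start + size) + ")");
        }
    }
}
```

The first two timestamps share the window [1588491220000, 1588491230000), while the third falls into the next one, so no element is counted twice.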

Usage

// Event time
datastream
    .keyBy(id)
    .window(TumblingEventTimeWindows.of(Time.seconds(10)))
    .process(new MyProcessFunction())
// Processing time
datastream
    .keyBy(id)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
    .process(new MyProcessFunction())

1.20.2.2.2.Sliding Windows

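Illustration
Sliding windows add a slide parameter on top of the window size, so windows of this type can overlap (see the figure below). A tumbling window advances by its own fixed size, whereas a sliding window advances by the configured slide. How much windows overlap depends on the window size and the slide: when the slide is smaller than the size, windows overlap and an element belongs to several windows; when the slide is larger than the size, there are gaps between windows and some elements may not belong to any window; when the two are equal, a sliding window behaves exactly like a tumbling window. A typical use is computing a metric over a fixed window size at a fixed reporting period, for example outputting every 5 minutes the top N products by clicks over the last hour.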

Windows can be based on either event time or processing time; the corresponding window assigners are SlidingEventTimeWindows and SlidingProcessingTimeWindows. The window size and the slide are set with the assigner's of(size, slide) method, expressed with Time.milliseconds(x), Time.seconds(x), Time.minutes(x), and so on.
[Figure 4: sliding windows]
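The overlap and gap cases described above can be checked with a small plain-Java sketch (not Flink API; the names are hypothetical). It assumes epoch-aligned windows with no offset and lists every [start, end) window containing a timestamp by walking back from the latest window start one slide at a time. I believe Flink's sliding assigners use a similar loop, but that is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, not Flink API: which sliding windows contain a timestamp?
class SlidingWindowSketch {

    // Returns every [start, end) window of length `size`, advancing by `slide`,
    // that contains `timestamp`. Windows are aligned to the epoch (offset 0).
    static List<long[]> assign(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start at or before the timestamp.
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Walk backwards while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // size 10 s, slide 5 s: the element lands in two overlapping windows
        for (long[] w : assign(1588491228000L, 10_000L, 5_000L)) {
            System.out.println("[" + w[0] + ", " + w[1] + ")");
        }
        // size 5 s, slide 10 s: gaps between windows, this element is in none
        System.out.println(assign(1588491227000L, 5_000L, 10_000L).size()); // 0
    }
}
```

With slide equal to size, assign() always returns exactly one window, which is the tumbling case.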

Usage

// Event time
datastream
   .keyBy(id)
   .window(SlidingEventTimeWindows.of(Time.seconds(10), Time.seconds(5)))
   .process(new MyProcessFunction())

// Processing time
datastream
   .keyBy(id)
   .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
   .process(new MyProcessFunction())

1.20.2.2.3.Session Windows

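Illustration
Session windows group elements from a period of high activity into one window. A session window is closed by the session gap: if no data arrives within the gap, the window is considered finished and its computation is triggered. Note that if data keeps arriving without any gap, the window never fires. Unlike sliding and tumbling windows, session windows have no fixed window size or slide; you only define a session gap, the maximum allowed period of inactivity (see the figure below). Session windows suit non-continuous data or data produced in bursts, for example computing user behavior statistics based on how active a user is online during a period.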

Windows can be based on either event time or processing time; the corresponding window assigners are EventTimeSessionWindows and ProcessingTimeSessionWindows. The gap is set with the assigner's withGap() method, expressed with Time.milliseconds(x), Time.seconds(x), Time.minutes(x), and so on.
[Figure 5: session windows]

Usage

// Event time
datastream
    .keyBy(id)
    .window(EventTimeSessionWindows.withGap(Time.minutes(15)))
    .process(new MyProcessFunction())

// Processing time
datastream
   .keyBy(id)
   .window(ProcessingTimeSessionWindows.withGap(Time.minutes(15)))
   .process(new MyProcessFunction())

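Note: because the start and end of a session window depend on the data received, the window assigner cannot put elements directly into their final windows. Instead, it creates a window for each incoming element, starting at the element's timestamp with the session gap as its size, and then merges overlapping windows. Therefore, session window operators need a merging Trigger and a merging window function, such as ReduceFunction, AggregateFunction, or ProcessWindowFunction.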
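The create-then-merge behaviour described in the note above can be modelled in plain Java. This is a simplified sketch with hypothetical names, not Flink's implementation: it sorts all timestamps up front (Flink merges incrementally as elements arrive) and treats windows that merely touch as one session.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch, not Flink API: session windows by create-then-merge.
class SessionWindowSketch {

    // Each element first gets its own window [ts, ts + gap); overlapping or
    // touching windows are then merged into a single session.
    static List<long[]> sessions(long[] timestamps, long gap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);
        List<long[]> result = new ArrayList<>();
        for (long ts : sorted) {
            long[] window = {ts, ts + gap};
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && window[0] <= last[1]) {
                last[1] = Math.max(last[1], window[1]); // merge into the current session
            } else {
                result.add(window);                    // gap exceeded: start a new session
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        long[] ts = {0, 5 * minute, 30 * minute, 10 * minute}; // deliberately out of order
        for (long[] s : sessions(ts, 15 * minute)) {
            System.out.println("[" + s[0] / minute + "min, " + s[1] / minute + "min)");
        }
    }
}
```

With a 15-minute gap, the elements at 0, 5 and 10 minutes merge into [0min, 25min), while the element at 30 minutes starts a new session [30min, 45min).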

1.20.2.2.4.Global Windows

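Illustration
Global windows assign all elements with the same key to a single window. The window has no start or end time, so it relies on a Trigger to fire the computation; if no Trigger is specified for a global window, it never fires (its default trigger never fires). Use global windows with great care: be very clear about what result you want to compute over the whole window, specify a matching trigger, and also specify a way to clean up the data, otherwise the data stays in memory indefinitely.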
[Figure 6: global windows]

Usage

datastream
    .keyBy(id)
    .window(GlobalWindows.create())
    .process(new MyProcessFunction())
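Because a global window never ends on its own, something must fire it and something must clear it. The plain-Java model below (hypothetical names, not Flink API) keeps one buffer per key, fires when a count threshold is reached, and then purges the buffer. In Flink itself a comparable setup would attach a trigger to GlobalWindows, for example a CountTrigger wrapped in a PurgingTrigger; the model only illustrates the semantics.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, not Flink API: one never-ending window per key that only
// emits when a count-based trigger fires, then purges its contents.
class GlobalWindowSketch {
    private final int maxCount;
    private final Map<String, List<Integer>> buffers = new HashMap<>();
    private final List<String> emitted = new ArrayList<>();

    GlobalWindowSketch(int maxCount) {
        this.maxCount = maxCount;
    }

    void onElement(String key, int value) {
        List<Integer> buffer = buffers.computeIfAbsent(key, k -> new ArrayList<>());
        buffer.add(value);
        if (buffer.size() >= maxCount) {  // trigger condition reached: FIRE
            int sum = 0;
            for (int v : buffer) sum += v;
            emitted.add(key + "=" + sum);
            buffer.clear();               // PURGE: drop the window contents
        }
    }

    List<String> emitted() { return emitted; }

    int buffered(String key) {
        return buffers.getOrDefault(key, Collections.<Integer>emptyList()).size();
    }

    public static void main(String[] args) {
        GlobalWindowSketch window = new GlobalWindowSketch(3);
        for (int v : new int[] {10, 15, 20, 25, 30}) window.onElement("1", v);
        System.out.println(window.emitted());     // [1=45]
        System.out.println(window.buffered("1")); // 2
    }
}
```

Without the clear() call the buffer would grow forever, which is exactly the cleanup risk described above.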

1.20.3.Window Functions

1.20.3.1.Categories

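Flink provides two broad categories of window functions: incremental aggregation functions and full window functions. Incremental aggregation functions perform better than full window functions, because they compute the final result from an intermediate state: each window keeps only one intermediate result and does not need to buffer its data. A full window function, by contrast, must buffer every element that enters the window and only iterates over all of them to compute the result when the window fires. If a window holds a lot of data or spans a long time, buffering consumes a lot of resources and degrades performance.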

Incremental aggregation functions
Including: ReduceFunction, AggregateFunction and FoldFunction

Full window functions
Including: ProcessWindowFunction
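The state-size difference between the two categories can be illustrated with a plain-Java sketch (hypothetical names; it simulates a single window and is not Flink code). Both styles compute the same sum, but the incremental one keeps a single value while the full-window one keeps every element until the window fires.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java contrast of the two window-function styles for a sum over one window.
class WindowStateSketch {

    // Incremental style: only the running result is kept as state.
    static final class Incremental {
        private long sum;                       // the single piece of state
        void add(int value) { sum += value; }
        long result() { return sum; }
        int stateSize() { return 1; }
    }

    // Full-window style: every element is buffered until the window fires.
    static final class FullWindow {
        private final List<Integer> elements = new ArrayList<>();
        void add(int value) { elements.add(value); }
        long result() {                         // iterates all buffered data at fire time
            long sum = 0;
            for (int v : elements) sum += v;
            return sum;
        }
        int stateSize() { return elements.size(); }
    }

    public static void main(String[] args) {
        Incremental inc = new Incremental();
        FullWindow full = new FullWindow();
        for (int i = 1; i <= 1000; i++) {
            inc.add(i);
            full.add(i);
        }
        System.out.println(inc.result() + " / state entries: " + inc.stateSize());
        System.out.println(full.result() + " / state entries: " + full.stateSize());
    }
}
```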

1.20.3.2.Usage

1.20.3.2.1.ReduceFunction

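A ReduceFunction takes two elements of the same type, combines them according to the given logic, and outputs one element of that same type; the input and output types must be identical. In effect, the previous result is combined with the current element. Example: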

package com.toto.demo.test;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;


public class ReduceFunctionExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Mock input data
        SingleOutputStreamOperator<Tuple3<Long, Integer, Long>> input = env.fromElements(
                Tuple3.of(1L, 10, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491268L),
                Tuple3.of(2L, 20, 1588491278L))
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                        return element.f2 * 1000;
                    }
                });

        input.map(new MapFunction<Tuple3<Long,Integer,Long>, Tuple2<Long,Integer>>() {
            @Override
            public Tuple2<Long, Integer> map(Tuple3<Long, Integer, Long> value) throws Exception {
                // Keep the key (f0) and the value (f1); drop the timestamp
                return Tuple2.of(value.f0, value.f1);
            }
        })
        .keyBy(0)
        .window(TumblingEventTimeWindows.of(Time.seconds(10)))
        .reduce(new ReduceFunction<Tuple2<Long, Integer>>() {
            @Override
            public Tuple2<Long, Integer> reduce(Tuple2<Long, Integer> value1, Tuple2<Long, Integer> value2) throws Exception {
                // Group by the first field and sum the second field
                return Tuple2.of(value1.f0, value1.f1 + value2.f1);
            }
        }).print();

        env.execute("ReduceFunctionExample");
    }

}
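The example above does not show its output. As a cross-check, the plain-Java simulation below (hypothetical names, not Flink API) applies the same per-key, 10-second tumbling-window sum to the same seven records. It assumes every record is on time and lists the results in first-seen order; the order in which Flink prints them may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java simulation of ReduceFunctionExample: group by (key, tumbling window)
// and combine values pairwise with a reduce function as they arrive.
class ReduceSimulation {

    static List<String> run(long[][] input, long windowMillis, BinaryOperator<Integer> reduce) {
        // (key, windowStart) -> running reduced value, in first-seen order
        Map<String, Integer> state = new LinkedHashMap<>();
        for (long[] row : input) {
            long key = row[0];
            int value = (int) row[1];
            long ts = row[2] * 1000;                          // seconds -> milliseconds
            long start = ts - Math.floorMod(ts, windowMillis);
            state.merge(key + "@" + start, value, reduce);    // reduce(previous, current)
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : state.entrySet()) {
            String key = e.getKey().substring(0, e.getKey().indexOf('@'));
            out.add("(" + key + "," + e.getValue() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] input = {
            {1, 10, 1588491228L}, {1, 15, 1588491229L}, {1, 20, 1588491238L},
            {1, 25, 1588491248L}, {2, 10, 1588491258L}, {2, 30, 1588491268L},
            {2, 20, 1588491278L}};
        System.out.println(run(input, 10_000L, (a, b) -> a + b));
    }
}
```

The first two records share a window and are reduced to (1,25); every other record sits alone in its window.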

1.20.3.2.2.AggregateFunction

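Like ReduceFunction, AggregateFunction is an incremental function that computes its result from an intermediate state. Compared with ReduceFunction it is more flexible for window computations, but slightly more complex to implement: you implement the AggregateFunction interface and override four methods. Its biggest advantage is that the types of the intermediate result (the accumulator) and of the final result do not depend on the input type. The source of AggregateFunction is as follows: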

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.api.common.functions;

import org.apache.flink.annotation.PublicEvolving;

import java.io.Serializable;

/**
 * Example: Average and Weighted Average
 *
 * {@code
 * // the accumulator, which holds the state of the in-flight aggregate
 * public class AverageAccumulator {
 *     long count;
 *     long sum;
 * }
 *
 * // implementation of an aggregation function for an 'average'
 * public class Average implements AggregateFunction<Integer, AverageAccumulator, Double> {
 *
 *     public AverageAccumulator createAccumulator() {
 *         return new AverageAccumulator();
 *     }
 *
 *     public AverageAccumulator merge(AverageAccumulator a, AverageAccumulator b) {
 *         a.count += b.count;
 *         a.sum += b.sum;
 *         return a;
 *     }
 *
 *     public AverageAccumulator add(Integer value, AverageAccumulator acc) {
 *         acc.sum += value;
 *         acc.count++;
 *         return acc;
 *     }
 *
 *     public Double getResult(AverageAccumulator acc) {
 *         return acc.sum / (double) acc.count;
 *     }
 * }
 *
 * // implementation of a weighted average
 * // this reuses the same accumulator type as the aggregate function for 'average'
 * public class WeightedAverage implements AggregateFunction<Datum, AverageAccumulator, Double> {
 *
 *     public AverageAccumulator createAccumulator() {
 *         return new AverageAccumulator();
 *     }
 *
 *     public AverageAccumulator merge(AverageAccumulator a, AverageAccumulator b) {
 *         a.count += b.count;
 *         a.sum += b.sum;
 *         return a;
 *     }
 *
 *     public AverageAccumulator add(Datum value, AverageAccumulator acc) {
 *         acc.count += value.getWeight();
 *         acc.sum += value.getValue();
 *         return acc;
 *     }
 *
 *     public Double getResult(AverageAccumulator acc) {
 *         return acc.sum / (double) acc.count;
 *     }
 * }
 * }
 *
 * @param <IN>  The type of the values that are aggregated (input values)
 * @param <ACC> The type of the accumulator (intermediate aggregate state).
 * @param <OUT> The type of the aggregated result
 */
@PublicEvolving
public interface AggregateFunction<IN, ACC, OUT> extends Function, Serializable {

   /**
    * Creates a new accumulator, starting a new aggregate.
    *
    * The new accumulator is typically meaningless unless a value is added
    * via {@link #add(Object, Object)}.
    *
    * <p>The accumulator is the state of a running aggregation. When a program has multiple
    * aggregates in progress (such as per key and window), the state (per key and window)
    * is the size of the accumulator.
    *
    * @return A new accumulator, corresponding to an empty aggregate.
    */
   ACC createAccumulator();

   /**
    * Adds the given input value to the given accumulator, returning the
    * new accumulator value.
    *
    * <p>For efficiency, the input accumulator may be modified and returned.
    *
    * @param value The value to add
    * @param accumulator The accumulator to add the value to
    *
    * @return The accumulator with the updated state
    */
   ACC add(IN value, ACC accumulator);

   /**
    * Gets the result of the aggregation from the accumulator.
    *
    * @param accumulator The accumulator of the aggregation
    * @return The final aggregation result.
    */
   OUT getResult(ACC accumulator);

   /**
    * Merges two accumulators, returning an accumulator with the merged state.
    *
    * This function may reuse any of the given accumulators as the target for the merge
    * and return that. The assumption is that the given accumulators will not be used any
    * more after having been passed to this function.
    *
    * @param a An accumulator to merge
    * @param b Another accumulator to merge
    *
    * @return The accumulator with the merged state
    */
   ACC merge(ACC a, ACC b);
}
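To see the four methods working together outside Flink, here is a plain-Java sketch of the average from the Javadoc above. The Aggregate interface is a hypothetical local stand-in for AggregateFunction, and the accumulator is a long[] holding {sum, count}, which shows that IN, ACC and OUT can all be different types. merge() is exercised by combining two partial accumulators, the situation that arises when windows are merged, for example session windows.

```java
// Plain-Java mirror of the four-method contract. `Aggregate` is a hypothetical
// stand-in, not Flink's class: IN = Integer, ACC = long[]{sum, count}, OUT = Double.
class AverageSketch {

    interface Aggregate<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
        ACC merge(ACC a, ACC b);
    }

    static final Aggregate<Integer, long[], Double> AVERAGE = new Aggregate<Integer, long[], Double>() {
        public long[] createAccumulator() { return new long[] {0L, 0L}; } // {sum, count}

        public long[] add(Integer value, long[] acc) {
            acc[0] += value;
            acc[1]++;
            return acc;                               // modified in place, then returned
        }

        public Double getResult(long[] acc) { return acc[0] / (double) acc[1]; }

        public long[] merge(long[] a, long[] b) {
            a[0] += b[0];
            a[1] += b[1];
            return a;                                 // reuses `a` as the merge target
        }
    };

    public static void main(String[] args) {
        // Two partial aggregates, e.g. from two windows that are about to merge.
        long[] left = AVERAGE.createAccumulator();
        AVERAGE.add(10, left);
        AVERAGE.add(15, left);
        long[] right = AVERAGE.createAccumulator();
        AVERAGE.add(20, right);
        System.out.println(AVERAGE.getResult(AVERAGE.merge(left, right))); // 15.0
    }
}
```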

Example usage:

package com.toto.demo.test;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.api.common.functions.AggregateFunction;

public class AggregateFunctionExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment
                .getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Mock data source
        SingleOutputStreamOperator<Tuple3<Long,Integer,Long>> input = env.fromElements(
                Tuple3.of(1L, 11, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491257L),
                Tuple3.of(2L, 20, 1588491278L)
        ).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
            @Override
            public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                return element.f2 * 1000;
            }
        });

        input.keyBy(0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .aggregate(new MyAggregateFunction()).print();
        env.execute("AggregateFunctionExample");

        /** Output (the debug lines printed by add() are not shown):
         * (1,26)     first and second records (same window)
         * (1,20)     third record
         * (1,25)     fourth record
         * (2,40)     fifth and sixth records (same window)
         * (2,20)     seventh record
         */
    }

    private static class MyAggregateFunction implements AggregateFunction<Tuple3<Long, Integer, Long>,Tuple2<Long,Integer>,Tuple2<Long,Integer>> {

        /**
         * Create an accumulator with its initial value
         * @return
         */
        @Override
        public Tuple2<Long, Integer> createAccumulator() {
            return Tuple2.of(0L, 0);
        }

        /**
         * @param value the incoming element
         * @param accumulator the intermediate result
         * @return
         */
        @Override
        public Tuple2<Long, Integer> add(Tuple3<Long, Integer, Long> value, Tuple2<Long, Integer> accumulator) {
            System.out.println("-----------------");
            System.out.println(value.f2);
            System.out.println("-----------------");
            return Tuple2.of(value.f0,value.f1 + accumulator.f1);
        }

        /**
         * Get the final result from the accumulator
         * @param accumulator
         * @return
         */
        @Override
        public Tuple2<Long, Integer> getResult(Tuple2<Long, Integer> accumulator) {
            return Tuple2.of(accumulator.f0, accumulator.f1);
        }

        /**
         * Merge two intermediate results
         * @param a intermediate result a
         * @param b intermediate result b
         * @return
         */
        @Override
        public Tuple2<Long, Integer> merge(Tuple2<Long, Integer> a, Tuple2<Long, Integer> b) {
            return Tuple2.of(a.f0, a.f1 + b.f1);
        }
    }
}

1.20.3.2.3.FoldFunction

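FoldFunction defines how each input element of the window is combined with an external initial value (the accumulator). The interface is deprecated; use AggregateFunction instead of FoldFunction.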

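Note: the fold() API used in the following example no longer exists as of Flink 1.12, so the example only compiles against earlier versions.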

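package com.toto.demo.test;

import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FoldFunctionExample {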

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Mock data source
        SingleOutputStreamOperator<Tuple3<Long, Integer, Long>> input = env.fromElements(
                Tuple3.of(1L, 10, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491268L),
                Tuple3.of(2L, 20, 1588491278L)).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
            @Override
            public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                return element.f2 * 1000;
            }
        });

        input.keyBy(0)
             .window(TumblingEventTimeWindows.of(Time.seconds(10)))
             .fold("user", new FoldFunction<Tuple3<Long, Integer, Long>, String>() {
                 @Override
                 public String fold(String accumulator, Tuple3<Long, Integer, Long> value) throws Exception {
                     // Starting from "user", append the first field (f0) of each element
                     return accumulator + value.f0;
                 }
             }).print();

        env.execute("FoldFunctionExample");

    }
}
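Replacing a fold with an aggregate is mostly mechanical: the fold's initial value becomes createAccumulator() and the fold step becomes add(). The plain-Java sketch below shows that mapping with hypothetical stand-in interfaces (not Flink's FoldFunction or AggregateFunction) and reproduces the string-building logic of the example above. It assumes the window never needs merging, since an arbitrary fold has no general merge.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch: express fold(initialValue, foldFn) in the four-method
// aggregate shape. Both interfaces are hypothetical stand-ins, not Flink API.
class FoldToAggregateSketch {

    interface Fold<T, R> { R fold(R accumulator, T value); }

    interface Aggregate<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC accumulator);
        OUT getResult(ACC accumulator);
        ACC merge(ACC a, ACC b);
    }

    // Adapter: the initial value becomes the accumulator, the fold step becomes add().
    static <T, R> Aggregate<T, R, R> fromFold(R initial, Fold<T, R> fold) {
        return new Aggregate<T, R, R>() {
            public R createAccumulator() { return initial; }
            public R add(T value, R acc) { return fold.fold(acc, value); }
            public R getResult(R acc) { return acc; }
            public R merge(R a, R b) {
                // An arbitrary fold has no general merge; only merging windows need it.
                throw new UnsupportedOperationException("fold cannot be merged");
            }
        };
    }

    // Runs one window's elements through the aggregate.
    static <T, R> R runWindow(Aggregate<T, R, R> agg, List<T> window) {
        R acc = agg.createAccumulator();
        for (T value : window) acc = agg.add(value, acc);
        return agg.getResult(acc);
    }

    public static void main(String[] args) {
        // Same logic as FoldFunctionExample: start from "user", append f0 per element.
        Aggregate<Long, String, String> agg = fromFold("user", (acc, key) -> acc + key);
        System.out.println(runWindow(agg, Arrays.asList(1L, 1L))); // user11
    }
}
```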

1.20.3.2.4.ProcessWindowFunction

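ReduceFunction and AggregateFunction, discussed above, both compute incrementally from an intermediate state. Sometimes, however, a computation needs all of the window's data, for example a median or a mode. In addition, the Context object of ProcessWindowFunction gives access to window metadata such as the window end time and the current watermark.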

ProcessWindowFunction therefore supports computations over all elements of a window more flexibly.
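A median is a good example of a result that cannot be maintained as a single running value. The plain-Java sketch below (hypothetical names, not Flink API) computes it from the complete set of window elements, the way a ProcessWindowFunction receives them as an Iterable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java sketch: a median needs every element of the window at once.
// `elements` plays the role of the Iterable passed to process().
class MedianSketch {

    static double median(Iterable<Integer> elements) {
        List<Integer> sorted = new ArrayList<>();
        for (int v : elements) sorted.add(v);
        if (sorted.isEmpty()) throw new IllegalArgumentException("empty window");
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)                                  // odd: middle element
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0; // even: mean of the two middles
    }

    public static void main(String[] args) {
        System.out.println(median(Arrays.asList(25, 10, 20)));     // 20.0
        System.out.println(median(Arrays.asList(10, 30, 20, 15))); // 17.5
    }
}
```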

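Internally, a window processed by a ProcessWindowFunction stores all assigned elements in ListState. Because it collects the data and provides access to window metadata and other features, it applies to a wider range of scenarios than ReduceFunction and AggregateFunction. The source of the ProcessWindowFunction abstract class is as follows: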

package org.apache.flink.streaming.api.functions.windowing;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.functions.AbstractRichFunction;
import org.apache.flink.api.common.state.KeyedStateStore;
import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

/**
 * Base abstract class for functions that are evaluated over keyed (grouped) windows using a context
 * for retrieving extra information.
 *
 * @param <IN> The type of the input data.
 * @param <OUT> The type of the output data.
 * @param <KEY> The type of the key.
 * @param <W> The type of the window.
 */
@PublicEvolving
public abstract class ProcessWindowFunction<IN, OUT, KEY, W extends Window> extends AbstractRichFunction {

   private static final long serialVersionUID = 1L;

   /**
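    * Evaluates the window and outputs zero or more elements.
    *
    * @param key The key of the window.
    * @param context The context of the window.
    * @param elements All elements in the window.
    * @param out The collector for emitting elements.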
    *
    * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
    */
   public abstract void process(KEY key, Context context, Iterable<IN> elements, Collector<OUT> out) throws Exception;

   /**
    * Deletes any state in the {@code Context} when the Window expires
    * (the watermark passes its {@code maxTimestamp} + {@code allowedLateness}).
    *
    * @param context The context to which the window is being evaluated
    * @throws Exception The function may throw exceptions to fail the program and trigger recovery.
    */
   public void clear(Context context) throws Exception {}

   /**
    * The context holding window metadata.
    */
   public abstract class Context implements java.io.Serializable {

      /**
       * Returns the window that is being evaluated.
       */
      public abstract W window();

      /** Returns the current processing time. */
      public abstract long currentProcessingTime();

      /** Returns the current event-time watermark. */
      public abstract long currentWatermark();

      /**
       * State accessor for per-key and per-window state.
       *
       * NOTE: If you use per-window state you have to ensure that you clean it up
       * by implementing {@link ProcessWindowFunction#clear(Context)}.
       */
      public abstract KeyedStateStore windowState();
      /**
       * State accessor for per-key global state.
       */
      public abstract KeyedStateStore globalState();

      /**
       * Emits a record to the side output identified by the {@link OutputTag}.
       *
       * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
       * @param value The record to emit.
       */
      public abstract <X> void output(OutputTag<X> outputTag, X value);
   }
}

Example usage:

package com.toto.demo.test;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class ProcessWindowFunctionExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Mock data source
        SingleOutputStreamOperator<Tuple3<Long,Integer,Long>> input = env.fromElements(
                Tuple3.of(1L, 10, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491268L),
                Tuple3.of(2L, 20, 1588491278L)
        ).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
            @Override
            public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                return element.f2 * 1000;
            }
        });

        input.keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .process(new MyProcessWindowFunction())
                .print();

        env.execute("ProcessWindowFunction demo");
    }

    private static class MyProcessWindowFunction extends ProcessWindowFunction<Tuple3<Long, Integer, Long>,
            Tuple3<Long,String,Integer>, Long, TimeWindow> {

        @Override
        public void process(
                Long aLong,
                Context context,
                Iterable<Tuple3<Long, Integer, Long>> elements,
                Collector<Tuple3<Long, String, Integer>> out) throws Exception {
            int count = 0;
            for (Tuple3<Long, Integer, Long> in : elements) {
                count++;
            }
            // Count the elements in the window and emit the count together with the window
            out.collect(Tuple3.of(aLong, "" + context.window(), count));
        }
    }

}

// Output:
(1,TimeWindow{start=1588491220000, end=1588491230000},2)
(1,TimeWindow{start=1588491230000, end=1588491240000},1)
(1,TimeWindow{start=1588491240000, end=1588491250000},1)
(2,TimeWindow{start=1588491250000, end=1588491260000},1)
(2,TimeWindow{start=1588491260000, end=1588491270000},1)
(2,TimeWindow{start=1588491270000, end=1588491280000},1)

1.20.3.2.5.增量聚合函数和ProcessWindowFunction整合

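ProcessWindowFunction is very powerful; its only drawback is that it needs more state to store data. Incremental aggregation, on the other hand, is used very often, so how can we get both incremental aggregation and access to window metadata? The answer is to combine a ReduceFunction or AggregateFunction with a ProcessWindowFunction. With this combination, elements assigned to a window are aggregated immediately, and when the window fires, the aggregated result is passed to the ProcessWindowFunction, so the Iterable parameter of its process method contains exactly one value: the result of the incremental aggregation.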
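The flow of reduce(reduceFunction, processWindowFunction) can be modelled in plain Java. In the sketch below (hypothetical names, not Flink API) each element is folded into a single value as it arrives, and when the window fires the "process" step sees an Iterable with exactly one element plus the window end. For simplicity the sketch drops its state right after firing; in Flink, window state is cleaned up by the window operator.

```java
import java.util.Collections;
import java.util.function.BinaryOperator;

// Plain-Java model of reduce(reduceFn, processWindowFn) for one key and one window.
class ReduceThenProcessSketch {

    private final BinaryOperator<Integer> reduceFn;
    private Integer state; // the single reduced value; nothing else is buffered

    ReduceThenProcessSketch(BinaryOperator<Integer> reduceFn) {
        this.reduceFn = reduceFn;
    }

    // Incremental step: runs as soon as an element is assigned to the window.
    void onElement(int value) {
        state = (state == null) ? value : reduceFn.apply(state, value);
    }

    // Fire step: the "process" side receives a one-element Iterable plus metadata.
    String fire(long key, long windowEnd) {
        Iterable<Integer> elements = Collections.singletonList(state);
        int sum = elements.iterator().next(); // the only element
        state = null;                         // simplification: drop state once fired
        return "(" + key + "," + sum + ",window_end" + windowEnd + ")";
    }

    public static void main(String[] args) {
        ReduceThenProcessSketch window = new ReduceThenProcessSketch((a, b) -> a + b);
        window.onElement(10);
        window.onElement(15);
        System.out.println(window.fire(1L, 1588491230000L));
    }
}
```

Feeding 10 and 15 for key 1 reproduces the first line of the example output below, (1,25,window_end1588491230000).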

Combining ReduceFunction with ProcessWindowFunction

package com.toto.demo.test;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class ReduceProcessWindowFunction {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // Mock data source
        SingleOutputStreamOperator<Tuple3<Long, Integer, Long>> input = env.fromElements(
                Tuple3.of(1L, 10, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491268L),
                Tuple3.of(2L, 20, 1588491278L)
        ).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
            @Override
            public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                return element.f2 * 1000;
            }
        });

        input.map(new MapFunction<Tuple3<Long,Integer,Long>, Tuple2<Long, Integer>>() {
            @Override
            public Tuple2<Long, Integer> map(Tuple3<Long, Integer, Long> value) throws Exception {
                return Tuple2.of(value.f0, value.f1);
            }
        }).keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .reduce(new MyReduceFunction(),new MyProcessWindowFunction())
                .print();

        env.execute("ProcessWindowFunctionExample");
    }

    private static class MyReduceFunction implements ReduceFunction<Tuple2<Long, Integer>> {
        @Override
        public Tuple2<Long, Integer> reduce(Tuple2<Long, Integer> value1, Tuple2<Long, Integer> value2) throws Exception {
            // Incremental sum
            return Tuple2.of(value1.f0,value1.f1 + value2.f1);
        }
    }

    private static class MyProcessWindowFunction extends ProcessWindowFunction<Tuple2<Long,Integer>,Tuple3<Long,Integer,String>,Long,TimeWindow> {
        @Override
        public void process(Long aLong, Context ctx, Iterable<Tuple2<Long, Integer>> elements, Collector<Tuple3<Long, Integer, String>> out) throws Exception {
            // Emit the sum together with the window end time
            out.collect(Tuple3.of(aLong,elements.iterator().next().f1,"window_end" + ctx.window().getEnd()));
        }
    }

}

// Output:
(1,25,window_end1588491230000)
(1,20,window_end1588491240000)
(1,25,window_end1588491250000)
(2,10,window_end1588491260000)
(2,30,window_end1588491270000)
(2,20,window_end1588491280000)

1.20.3.2.6.Combining AggregateFunction with ProcessWindowFunction

package com.toto.demo.test;

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class AggregateProcessWindowFunction {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment().setParallelism(1);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // 模拟数据源
        SingleOutputStreamOperator<Tuple3<Long, Integer, Long>> input = env.fromElements(
                Tuple3.of(1L, 10, 1588491228L),
                Tuple3.of(1L, 15, 1588491229L),
                Tuple3.of(1L, 20, 1588491238L),
                Tuple3.of(1L, 25, 1588491248L),
                Tuple3.of(2L, 10, 1588491258L),
                Tuple3.of(2L, 30, 1588491268L),
                Tuple3.of(2L, 20, 1588491278L)
        ).assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple3<Long, Integer, Long>>() {
            @Override
            public long extractAscendingTimestamp(Tuple3<Long, Integer, Long> element) {
                return element.f2 * 1000;
            }
        });

        input.keyBy(t -> t.f0)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .aggregate(new MyAggregateFunction(),new MyProcessWindowFunction())
                .print();

        env.execute("AggregateFunctionExample");
    }

    private static class MyAggregateFunction implements AggregateFunction<Tuple3<Long, Integer, Long>,
            Tuple2<Long, Integer>, Tuple2<Long, Integer>> {

        /**
         * Creates the accumulator and sets its initial value.
         * @return a new accumulator: (key 0, sum 0)
         */
        @Override
        public Tuple2<Long, Integer> createAccumulator() {
            return Tuple2.of(0L, 0);
        }

        /**
         * @param value the incoming element
         * @param accumulator the intermediate result so far
         * @return the updated accumulator: (key of the element, running sum)
         */
        @Override
        public Tuple2<Long, Integer> add(Tuple3<Long, Integer, Long> value, Tuple2<Long, Integer> accumulator) {
            return Tuple2.of(value.f0, value.f1 + accumulator.f1);
        }

        /**
         * Returns the final result computed from the accumulator.
         * @param accumulator the intermediate result
         * @return the aggregation result
         */
        @Override
        public Tuple2<Long, Integer> getResult(Tuple2<Long, Integer> accumulator) {
            return Tuple2.of(accumulator.f0, accumulator.f1);
        }

        /**
         * Merges two intermediate results.
         * @param a intermediate result a
         * @param b intermediate result b
         * @return the merged intermediate result
         */
        @Override
        public Tuple2<Long, Integer> merge(Tuple2<Long, Integer> a, Tuple2<Long, Integer> b) {
            return Tuple2.of(a.f0, a.f1 + b.f1);
        }
    }

    private static class MyProcessWindowFunction extends
        ProcessWindowFunction<Tuple2<Long,Integer>,Tuple3<Long,Integer,String>,Long,TimeWindow> {

        @Override
        public void process(Long aLong, Context context, Iterable<Tuple2<Long, Integer>> elements,
                            Collector<Tuple3<Long, Integer, String>> out) throws Exception {
            // emit the pre-aggregated sum together with the window end time
            out.collect(Tuple3.of(aLong,elements.iterator().next().f1,"window_end " + context.window().getEnd()));
        }
    }

    /**
     * Output:
     * (1,25,window_end 1588491230000)
     * (1,20,window_end 1588491240000)
     * (1,25,window_end 1588491250000)
     * (2,10,window_end 1588491260000)
     * (2,30,window_end 1588491270000)
     * (2,20,window_end 1588491280000)
     */
}
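
The four methods above form a small contract: createAccumulator starts an empty accumulator, add folds in each element, getResult turns the accumulator into the output, and merge combines two partial accumulators (Flink needs merge when windows themselves are merged, as with session windows). The plain-Java sketch below mirrors that contract with a hypothetical `Agg` interface, not Flink's class, and implements an average, a case where the accumulator type (sum, count) differs from the output type (double), which is exactly what AggregateFunction allows and ReduceFunction does not.

```java
import java.util.List;

public class AverageAggSketch {

    /** Hypothetical stand-in that only mirrors the method names of Flink's AggregateFunction. */
    public interface Agg<IN, ACC, OUT> {
        ACC createAccumulator();
        ACC add(IN value, ACC acc);
        OUT getResult(ACC acc);
        ACC merge(ACC a, ACC b);
    }

    /** Average of integers; the accumulator is long[]{sum, count}. */
    public static final Agg<Integer, long[], Double> AVERAGE = new Agg<Integer, long[], Double>() {
        public long[] createAccumulator() { return new long[]{0L, 0L}; }
        public long[] add(Integer v, long[] acc) { return new long[]{acc[0] + v, acc[1] + 1}; }
        public Double getResult(long[] acc) { return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]; }
        public long[] merge(long[] a, long[] b) { return new long[]{a[0] + b[0], a[1] + b[1]}; }
    };

    /** Folds values into a fresh accumulator, like elements arriving in one window. */
    public static long[] fold(List<Integer> values) {
        long[] acc = AVERAGE.createAccumulator();
        for (int v : values) {
            acc = AVERAGE.add(v, acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] left = fold(List.of(10, 15));   // partial accumulator of one window
        long[] right = fold(List.of(20));      // partial accumulator of another window
        // combining both partial accumulators, as happens when two session windows merge
        System.out.println(AVERAGE.getResult(AVERAGE.merge(left, right))); // 15.0
    }
}
```

Keeping (sum, count) instead of a running average is what makes merge correct: two averages cannot be combined without knowing how many elements each one covers.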

1.20.4.Window lifecycle explained

1.20.4.1.Lifecycle diagrams

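From creation, through evaluation, to being cleared, a window goes through a series of stages; together they form the window's lifecycle.

First, before an element enters the window operator, the WindowAssigner decides which window (or windows) the element belongs to. If such a window does not exist yet, it is created.

Second, once the element is in the window, what happens depends on whether an incremental aggregate function is used. With an incremental aggregate function (ReduceFunction or AggregateFunction), every new element immediately triggers an incremental computation, and the result becomes the window's content. Without one, the elements are stored in ListState and are only iterated over and aggregated when the window fires.

Third, after entering the window, each element is passed to the window's trigger. The trigger decides when the window is evaluated and when the window clears itself and the contents it holds. It can make these decisions based on the elements assigned so far or on timers it has registered, firing or clearing the window at specific moments.

Finally, what happens after the trigger fires depends on the window function. With an incremental aggregate function such as ReduceFunction or AggregateFunction, the aggregated result is emitted directly. With only a full-window function such as ProcessWindowFunction, the function is applied to all elements of the window and its result is emitted. When an incremental aggregate function is combined with a full-window function (for example ReduceFunction together with ProcessWindowFunction), the full-window function is applied to the value produced by the incremental aggregation, and the final result is then emitted.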

Case 1: only an incremental aggregate function
[Figure 7]
Case 2: only a full-window function
[Figure 8]
Case 3: an incremental aggregate function combined with a full-window function
[Figure 9]
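
The cases differ mainly in what the window keeps in state. The plain-Java sketch below contrasts the first two under simplifying assumptions (a single window, no keys, no Flink state backend; `IncrementalWindow` and `BufferingWindow` are hypothetical names). The incremental variant holds one running value, while the buffering variant stores every element, much like the ListState described above, and iterates over them only when the window fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class WindowStateSketch {

    /** Case 1: incremental aggregation keeps a single accumulated value per window. */
    public static class IncrementalWindow<T> {
        private final BinaryOperator<T> reducer;
        private T value;                       // null until the first element arrives

        public IncrementalWindow(BinaryOperator<T> reducer) { this.reducer = reducer; }

        public void add(T element) {
            value = (value == null) ? element : reducer.apply(value, element);
        }

        public int storedElements() { return value == null ? 0 : 1; }

        public T fire() { return value; }
    }

    /** Case 2: a full-window function needs every element, so all of them are buffered. */
    public static class BufferingWindow<T, R> {
        private final Function<Iterable<T>, R> windowFunction;
        private final List<T> buffer = new ArrayList<>();

        public BufferingWindow(Function<Iterable<T>, R> windowFunction) { this.windowFunction = windowFunction; }

        public void add(T element) { buffer.add(element); }

        public int storedElements() { return buffer.size(); }

        public R fire() { return windowFunction.apply(buffer); }
    }

    public static void main(String[] args) {
        IncrementalWindow<Integer> inc = new IncrementalWindow<>(Integer::sum);
        BufferingWindow<Integer, Integer> buf = new BufferingWindow<>(elements -> {
            int sum = 0;
            for (int e : elements) sum += e;    // iterate over all buffered elements on fire
            return sum;
        });
        for (int v : new int[]{10, 15, 20}) {
            inc.add(v);
            buf.add(v);
        }
        System.out.println(inc.fire() + " " + inc.storedElements()); // 45 1
        System.out.println(buf.fire() + " " + buf.storedElements()); // 45 3
    }
}
```

Case 3 amounts to handing the single value from `IncrementalWindow.fire()` to the full-window function as its only element, which is why the ProcessWindowFunction examples earlier can simply read `elements.iterator().next()`.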

1.20.4.2.Window Assigners

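A WindowAssigner assigns each incoming element to one or more windows. A window is created when the WindowAssigner assigns the first element to it, so once a window exists it always contains at least one element. Flink ships many built-in WindowAssigners; this article focuses on the time-based ones, all of which extend the abstract WindowAssigner class. The commonly used assigners were explained in detail earlier. First, the class hierarchy:
[Figure 10: WindowAssigner class hierarchy]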

Next, let's walk through the source of the abstract WindowAssigner class:

package org.apache.flink.streaming.api.windowing.assigners;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.Window;

import java.io.Serializable;
import java.util.Collection;

/**
 * A {@code WindowAssigner} assigns zero or more {@link Window Windows} to an element.
 *
 * In a window operation, elements are grouped by their key (if available) and by the windows to
 * which it was assigned. The set of elements with the same key and window is called a pane.
 * When a {@link Trigger} decides that a certain pane should fire the
 * {@link org.apache.flink.streaming.api.functions.windowing.WindowFunction} is applied
 * to produce output elements for that pane.
 *
 * @param <T> The type of elements that this WindowAssigner can assign windows to.
 * @param <W> The type of {@code Window} that this assigner assigns.
 */
@PublicEvolving
public abstract class WindowAssigner<T, W extends Window> implements Serializable {
   private static final long serialVersionUID = 1L;

   /**
    * Returns a {@code Collection} of windows that should be assigned to the element.
    *
    * @param element The element to which windows should be assigned.
    * @param timestamp The timestamp of the element.
    * @param context The {@link WindowAssignerContext} in which the assigner operates.
    */
   public abstract Collection<W> assignWindows(T element, long timestamp, WindowAssignerContext context);

   /**
    * Returns the default trigger associated with this {@code WindowAssigner}.
    */
   public abstract Trigger<T, W> getDefaultTrigger(StreamExecutionEnvironment env);

   /**
    * Returns a {@link TypeSerializer} for serializing windows that are assigned by
    * this {@code WindowAssigner}.
    */
   public abstract TypeSerializer<W> getWindowSerializer(ExecutionConfig executionConfig);

   /**
    * Returns {@code true} if elements are assigned to windows based on event time,
    * {@code false} otherwise.
    */
   public abstract boolean isEventTime();

   /**
    * A context provided to the {@link WindowAssigner} that allows it to query the
    * current processing time.
    *
    * This is provided to the assigner by its containing
    * {@link org.apache.flink.streaming.runtime.operators.windowing.WindowOperator},
    * which, in turn, gets it from the containing
    * {@link org.apache.flink.streaming.runtime.tasks.StreamTask}.
    */
   public abstract static class WindowAssignerContext {

      /**
       * Returns the current processing time.
       */
      public abstract long getCurrentProcessingTime();

   }
}
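
To make `assignWindows` concrete, the sketch below re-implements in plain Java the window arithmetic behind the time-based tumbling and sliding assigners. The aligned start `timestamp - (timestamp - offset + size) % size` follows Flink's `TimeWindow.getWindowStartWithOffset` (the exact code differs slightly between Flink versions); the sketch assumes non-negative timestamps and 0 <= offset < size. Windows are modelled as `long[]{start, end}` instead of Flink's `TimeWindow`, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AssignWindowsSketch {

    /** Aligned window start for a timestamp (assumes timestamp >= 0 and 0 <= offset < size). */
    public static long windowStart(long timestamp, long offset, long size) {
        return timestamp - (timestamp - offset + size) % size;
    }

    /** Tumbling: exactly one window [start, start + size) per element. */
    public static List<long[]> tumbling(long timestamp, long size, long offset) {
        long start = windowStart(timestamp, offset, size);
        List<long[]> windows = new ArrayList<>();
        windows.add(new long[]{start, start + size});
        return windows;
    }

    /** Sliding: walk back from the latest aligned start in steps of `slide`. */
    public static List<long[]> sliding(long timestamp, long size, long slide, long offset) {
        List<long[]> windows = new ArrayList<>((int) (size / slide));
        long lastStart = windowStart(timestamp, offset, slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[]{start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        long ts = 1588491228000L;
        for (long[] w : tumbling(ts, 10_000, 0)) {
            System.out.println("tumbling [" + w[0] + ", " + w[1] + ")");
        }
        for (long[] w : sliding(ts, 10_000, 5_000, 0)) {
            System.out.println("sliding  [" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

When the window size is a multiple of the slide, each element lands in size / slide windows, which is why a 10 s window sliding by 5 s returns two windows for every timestamp.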

1.20.4.3.Triggers

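After data enters a window, whether the WindowFunction is evaluated depends on whether the window's firing condition is met. A trigger defines when a window fires and emits its result. Triggers can fire based on time or on data conditions, such as the number of elements in the window or the arrival of elements with particular values. Each of the built-in WindowAssigners discussed earlier has its own default trigger: with processing time, the window fires when the processing time passes the window's end time; with event time, it fires when the watermark passes the window's end time.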

Flink provides many built-in triggers; the most common are EventTimeTrigger, ProcessingTimeTrigger and CountTrigger. Each trigger fits a particular kind of WindowAssigner. For example, event-time windows use EventTimeTrigger, which checks whether the current watermark has passed the window's end time: if it has, the computation over the window's data fires; otherwise nothing fires. The default trigger of each built-in WindowAssigner discussed above can be read from its source code, summarized below:
[Figure 11: default triggers of the built-in WindowAssigners]
All of these triggers extend the abstract Trigger class; the class hierarchy is shown below:
[Figure 12: Trigger class hierarchy]

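The built-in triggers work as follows:

Trigger	Description
EventTimeTrigger	Fires the window computation once the current watermark passes the window's end time; otherwise does not fire.
ProcessingTimeTrigger	Fires the window computation once the current processing time passes the window's end time; otherwise does not fire.
ContinuousEventTimeTrigger	Fires periodically at a given event-time interval, and also fires once the window's end time is less than the current event time (watermark).
ContinuousProcessingTimeTrigger	Fires periodically at a given processing-time interval, and also fires once the window's end time is less than the current processing time.
CountTrigger	Fires when the number of elements in the window reaches the configured threshold.
DeltaTrigger	Fires when the delta metric computed from the window's data exceeds the specified threshold.
PurgingTrigger	Wraps any trigger and turns it into a purging trigger: after each firing, the window's data is cleared.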
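
The decision logic of the two most common triggers in the table is small enough to sketch. The plain-Java code below follows the behaviour described above: the event-time logic fires once the watermark has reached the window's last timestamp (end - 1, what Flink calls `maxTimestamp()`), and otherwise records a timer for that moment and continues; the count logic fires when the per-window count reaches the threshold and then resets the count. The `Result` enum, the class names and the way timers are merely recorded are simplified, hypothetical stand-ins, not Flink's `TriggerContext`.

```java
import java.util.ArrayList;
import java.util.List;

public class TriggerLogicSketch {

    /** Simplified stand-in for TriggerResult. */
    public enum Result { CONTINUE, FIRE }

    /** Event-time logic for a window [start, end); timers are only recorded, not scheduled. */
    public static class EventTimeLogic {
        public final List<Long> registeredTimers = new ArrayList<>();
        private final long maxTimestamp;

        public EventTimeLogic(long windowEnd) { this.maxTimestamp = windowEnd - 1; }

        /** Called for each element; the current watermark is passed in explicitly. */
        public Result onElement(long currentWatermark) {
            if (maxTimestamp <= currentWatermark) {
                return Result.FIRE;                 // window already complete, e.g. a late element
            }
            registeredTimers.add(maxTimestamp);     // ask to be called back at the window's last timestamp
            return Result.CONTINUE;
        }

        /** Called when an event-time timer fires at `time`. */
        public Result onEventTime(long time) {
            return time == maxTimestamp ? Result.FIRE : Result.CONTINUE;
        }
    }

    /** Count logic: fire every `maxCount` elements and reset the counter. */
    public static class CountLogic {
        private final long maxCount;
        private long count;

        public CountLogic(long maxCount) { this.maxCount = maxCount; }

        public Result onElement() {
            count++;
            if (count >= maxCount) {
                count = 0;                          // the count state is cleared after firing
                return Result.FIRE;
            }
            return Result.CONTINUE;
        }
    }

    public static void main(String[] args) {
        EventTimeLogic et = new EventTimeLogic(1588491230000L);
        System.out.println(et.onElement(1588491228000L));   // CONTINUE, timer at 1588491229999
        System.out.println(et.onEventTime(1588491229999L)); // FIRE

        CountLogic ct = new CountLogic(3);
        for (int i = 1; i <= 4; i++) {
            System.out.println(i + ": " + ct.onElement());  // FIRE only on the 3rd element
        }
    }
}
```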

The source of the abstract Trigger class, annotated:

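package org.apache.flink.streaming.api.windowing.triggers;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.api.common.state.MergingState;
import org.apache.flink.api.common.state.State;
import org.apache.flink.api.common.state.StateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.streaming.api.windowing.windows.Window;

import java.io.Serializable;

/**
 * @param <T> the type of the elements
 * @param <W> the type of the Window
 */
@PublicEvolving
public abstract class Trigger<T, W extends Window> implements Serializable {

	private static final long serialVersionUID = -4104633972991191369L;

	/**
	 * Called for every element assigned to a window. Returns a TriggerResult enum value,
	 * one of CONTINUE, FIRE_AND_PURGE, FIRE or PURGE.
	 *
	 * @param element   the element that entered the window
	 * @param timestamp the timestamp of that element
	 * @param window    the window
	 * @param ctx       context object that can register timer callbacks
	 * @return the action to take on the window
	 * @throws Exception
	 */
	public abstract TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx) throws Exception;

	/**
	 * Called when a processing-time timer registered through the TriggerContext fires.
	 *
	 * @param time   the timestamp at which the timer fired
	 * @param window the window for which the timer fired
	 * @param ctx    context object that can register timer callbacks
	 * @return the action to take on the window
	 * @throws Exception
	 */
	public abstract TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception;

	/**
	 * Called when an event-time timer registered through the TriggerContext fires.
	 *
	 * @param time   the timestamp at which the timer fired
	 * @param window the window for which the timer fired
	 * @param ctx    context object that can register timer callbacks
	 * @return the action to take on the window
	 * @throws Exception
	 */
	public abstract TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception;

	/**
	 * Returns true if this trigger supports merging of trigger state.
	 *
	 * @return whether the trigger state can be merged
	 */
	public boolean canMerge() {
		return false;
	}

	/**
	 * Called when several windows are merged into one window.
	 *
	 * @param window the window resulting from the merge
	 * @param ctx    context object that can register timer callbacks and access state
	 * @throws Exception
	 */
	public void onMerge(W window, OnMergeContext ctx) throws Exception {
		throw new UnsupportedOperationException("This trigger does not support merging.");
	}

	/**
	 * Clears all window state held by the trigger.
	 * Called when the window is destroyed.
	 *
	 * @param window the window being cleared
	 * @param ctx    the trigger context
	 * @throws Exception
	 */
	public abstract void clear(W window, TriggerContext ctx) throws Exception;

	/**
	 * Context object passed to the Trigger methods, used to register timer callbacks and to work with state.
	 */
	public interface TriggerContext {
		// Returns the current processing time
		long getCurrentProcessingTime();

		// Returns the metric group for this trigger
		MetricGroup getMetricGroup();

		// Returns the current watermark timestamp
		long getCurrentWatermark();

		// Registers a processing-time timer
		void registerProcessingTimeTimer(long time);

		// Registers an event-time timer
		void registerEventTimeTimer(long time);

		// Deletes a processing-time timer
		void deleteProcessingTimeTimer(long time);

		// Deletes an event-time timer
		void deleteEventTimeTimer(long time);

		/**
		 * Retrieves the state scoped to the window and key of the current trigger invocation.
		 */
		<S extends State> S getPartitionedState(StateDescriptor<S, ?> stateDescriptor);

		// Same purpose as getPartitionedState; deprecated
		@Deprecated
		<S extends Serializable> ValueState<S> getKeyValueState(String name, Class<S> stateType, S defaultState);

		// Same purpose as getPartitionedState; deprecated
		@Deprecated
		<S extends Serializable> ValueState<S> getKeyValueState(String name, TypeInformation<S> stateType, S defaultState);
	}

	// Extension of TriggerContext, passed to onMerge
	public interface OnMergeContext extends TriggerContext {
		// Merges the state of each window; the state must support merging
		<S extends MergingState<?, ?>> void mergePartitionedState(StateDescriptor<S, ?> stateDescriptor);
	}
}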

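As the source shows, every trigger method returns a TriggerResult. TriggerResult is an enum, and its value determines what is done to the window. There are four actions in total: CONTINUE, FIRE_AND_PURGE, FIRE and PURGE. To see what each one means, here is the TriggerResult source: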

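package org.apache.flink.streaming.api.windowing.triggers;

/**
 * Result type of the trigger methods. It determines what happens to the window,
 * for example whether the window function is called or whether the window is discarded.
 * Note: if a Trigger returns FIRE or FIRE_AND_PURGE but the window contains no elements,
 * the window function is not invoked.
 */
public enum TriggerResult {

	// Do nothing: do not fire now, keep waiting
	CONTINUE(false, false),

	// Evaluate the window function, emit the result, then clear all window state
	FIRE_AND_PURGE(true, true),

	// Evaluate the window function and emit the result; the window is not cleared and its data is kept
	FIRE(true, false),

	// Clear the window's data without firing
	PURGE(false, true);

	private final boolean fire;
	private final boolean purge;

	TriggerResult(boolean fire, boolean purge) {
		this.purge = purge;
		this.fire = fire;
	}

	public boolean isFire() {
		return fire;
	}

	public boolean isPurge() {
		return purge;
	}
}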
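
Each constant is built from two booleans, fire and purge. The plain-Java sketch below shows, with hypothetical names, how a window operator could act on such a value: evaluate the window function if fire is set (skipping it for an empty window, as the note in the enum's comment says), then clear the window contents if purge is set. It mirrors the enum rather than calling Flink, and ignores timers, allowed lateness and state backends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TriggerResultSketch {

    /** Mirrors TriggerResult: each constant is a (fire, purge) pair. */
    public enum Result {
        CONTINUE(false, false), FIRE_AND_PURGE(true, true), FIRE(true, false), PURGE(false, true);

        final boolean fire;
        final boolean purge;

        Result(boolean fire, boolean purge) { this.fire = fire; this.purge = purge; }
    }

    /**
     * Applies a trigger decision to the window contents. Returns the emitted value,
     * or null when nothing is emitted (no fire, or fire on an empty window).
     */
    public static <T, R> R apply(Result result, List<T> contents, Function<List<T>, R> windowFunction) {
        R output = null;
        if (result.fire && !contents.isEmpty()) {
            output = windowFunction.apply(contents);
        }
        if (result.purge) {
            contents.clear();
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> window = new ArrayList<>(List.of(10, 15));
        Function<List<Integer>, Integer> sum = xs -> xs.stream().mapToInt(Integer::intValue).sum();

        System.out.println(apply(Result.FIRE, window, sum) + " " + window);           // 25 [10, 15]
        System.out.println(apply(Result.FIRE_AND_PURGE, window, sum) + " " + window); // 25 []
        System.out.println("empty window -> " + apply(Result.FIRE, window, sum));    // empty window -> null
    }
}
```

The difference between FIRE and FIRE_AND_PURGE shows up on the next firing: after FIRE the old elements are still there and are included again, after FIRE_AND_PURGE the window starts from scratch.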

1.20.4.4.Evictors

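Evictors are an optional component whose job is to remove elements from a window before and/or after the WindowFunction is applied. Flink ships three built-in evictors: CountEvictor, DeltaEvictor and TimeEvictor. There is no default: if no evictor is specified, no elements are evicted.

CountEvictor: keeps a fixed number of elements in the window; elements beyond the specified count are removed before the window computation.
DeltaEvictor: given a DeltaFunction and a threshold, computes the delta between each element in the window and the latest element, and removes the elements whose delta exceeds the threshold.
TimeEvictor: given an interval, subtracts it from the timestamp of the newest element in the window and removes every element whose timestamp is below that result. In effect it keeps the most recent elements and discards outdated ones.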
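
The three eviction rules can be sketched over a plain list of timestamped values. The Java code below is an illustration under stated assumptions: elements are a hypothetical `Element` class rather than Flink's `TimestampedValue`, the count rule drops elements from the head of the buffer (the oldest by arrival), the delta rule compares each element with the last one using a caller-supplied function, and the time rule uses the cutoff "newest timestamp minus interval" described above. Boundary handling (`<` versus `<=`) is chosen for the sketch and may differ from Flink's own evictor classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class EvictorSketch {

    /** Hypothetical stand-in for a timestamped window element. */
    public static final class Element {
        public final long timestamp;
        public final double value;

        public Element(long timestamp, double value) { this.timestamp = timestamp; this.value = value; }

        @Override
        public String toString() { return timestamp + ":" + value; }
    }

    /** Count rule: keep at most maxCount elements, removing from the head. */
    public static void evictByCount(List<Element> elements, int maxCount) {
        int toRemove = elements.size() - maxCount;
        Iterator<Element> it = elements.iterator();
        while (toRemove > 0 && it.hasNext()) {
            it.next();
            it.remove();
            toRemove--;
        }
    }

    /** Delta rule: drop elements whose delta to the last element reaches the threshold. */
    public static void evictByDelta(List<Element> elements, double threshold,
                                    ToDoubleBiFunction<Element, Element> delta) {
        if (elements.isEmpty()) return;
        Element last = elements.get(elements.size() - 1);
        elements.removeIf(e -> delta.applyAsDouble(e, last) >= threshold);
    }

    /** Time rule: drop elements older than (newest timestamp - interval). */
    public static void evictByTime(List<Element> elements, long interval) {
        if (elements.isEmpty()) return;
        long maxTs = Long.MIN_VALUE;
        for (Element e : elements) maxTs = Math.max(maxTs, e.timestamp);
        long cutoff = maxTs - interval;
        elements.removeIf(e -> e.timestamp < cutoff);
    }

    /** Sample window contents used by the demo. */
    public static List<Element> sample() {
        List<Element> list = new ArrayList<>();
        list.add(new Element(1000, 1.0));
        list.add(new Element(2000, 5.0));
        list.add(new Element(3000, 9.0));
        list.add(new Element(4000, 10.0));
        return list;
    }

    public static void main(String[] args) {
        List<Element> byCount = sample();
        evictByCount(byCount, 2);
        System.out.println(byCount);   // [3000:9.0, 4000:10.0]

        List<Element> byDelta = sample();
        evictByDelta(byDelta, 6.0, (a, b) -> Math.abs(a.value - b.value));
        System.out.println(byDelta);   // [2000:5.0, 3000:9.0, 4000:10.0]

        List<Element> byTime = sample();
        evictByTime(byTime, 1500);
        System.out.println(byTime);    // [3000:9.0, 4000:10.0]
    }
}
```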

The Evictor class hierarchy is shown below:
[Figure 13: Evictor class hierarchy]
The source of the Evictor interface:

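package org.apache.flink.streaming.api.windowing.evictors;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.metrics.MetricGroup;
import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.streaming.runtime.operators.windowing.TimestampedValue;

import java.io.Serializable;

/**
 * Evicts elements from a window before or after the WindowFunction is evaluated.
 * @param <T> the type of the elements
 * @param <W> the type of the window
 */
@PublicEvolving
public interface Evictor<T, W extends Window> extends Serializable {
	/**
	 * Optionally evicts elements. Called before the windowing function.
	 * @param elements the elements in the window
	 * @param size the number of elements in the window
	 * @param window the window
	 * @param evictorContext the evictor context
	 */
	void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);
	/**
	 * Optionally evicts elements. Called after the windowing function.
	 * @param elements the elements in the window
	 * @param size the number of elements in the window
	 * @param window the window
	 * @param evictorContext the evictor context
	 */
	void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);
	// Context passed to the Evictor methods
	interface EvictorContext {
		// Returns the current processing time
		long getCurrentProcessingTime();
		// Returns the metric group for this evictor
		MetricGroup getMetricGroup();
		// Returns the current watermark timestamp
		long getCurrentWatermark();
	}
}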
