Kafka Hopping Time Window

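Hopping time windows are windows based on time intervals. They model fixed-size windows that may overlap. A hopping window is defined by two properties: the window's size and its advance interval (advance). The advance interval specifies how far a window moves forward relative to the previous one. When advance is smaller than size, consecutive windows overlap, so a single record can belong to several windows.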
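To make the relationship between size and advance concrete, here is a minimal sketch in plain Java that lists the hopping windows a record timestamp falls into. It assumes windows are aligned to the epoch and never start before zero, which is my understanding of how Kafka Streams lays out hopping windows. The class and method names (HoppingWindowAssignment, windowStartsFor) are made up for illustration and are not part of the Kafka API.

```java
import java.util.ArrayList;
import java.util.List;

public class HoppingWindowAssignment {

    /**
     * Returns the start timestamps (in ms) of every hopping window
     * [start, start + sizeMs) that contains the given record timestamp.
     * Assumption: windows are aligned to the epoch and clamped at 0.
     */
    public static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned window start whose window still covers the timestamp
        long windowStart = Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        while (windowStart <= timestampMs) {
            starts.add(windowStart);
            windowStart += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        // size = 30 s, advance = 10 s, record at 25 s
        System.out.println(windowStartsFor(25_000L, 30_000L, 10_000L)); // prints [0, 10000, 20000]
    }
}
```

With a 30 second size and a 10 second advance, a record at 25 s lands in the three windows starting at 0 s, 10 s and 20 s.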

Example: every ten seconds (the advance interval), compute the maximum, minimum, and average of value over the preceding thirty seconds.

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.example.demo.vo.WindowResult;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Suppressed;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.stereotype.Component;

import java.time.Duration;
import java.time.Instant;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import static org.apache.kafka.streams.kstream.Suppressed.BufferConfig.unbounded;

@Component
public class HoppingTimeWindowDemo {

    // Local variables cannot be declared static, so the constants live at class level
    private static final int TIME_WINDOW_SECONDS = 30;
    private static final int ADVANCED_BY_SECONDS = 10;

    public static void main(String[] args) {

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "hopping-time-window");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);

        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> valueStream = builder.stream("value");

        valueStream.selectKey((key, value) -> {
            String newKey;
            try {
                JSONObject json = JSON.parseObject(value);
                newKey = json.getString("uid");
            } catch (Exception ex) {
                return key;
            }
            return newKey;
        }).groupByKey()
                .windowedBy(TimeWindows.of(Duration.ofSeconds(TIME_WINDOW_SECONDS))
                        .advanceBy(Duration.ofSeconds(ADVANCED_BY_SECONDS))
                        .grace(Duration.ZERO))
                .aggregate(() -> {
                    WindowResult result = new WindowResult(0, 0, 0, 0, 0, 0, null);
                    return JSONObject.toJSONString(result);
                }, (aggKey, newValue, aggValue) -> {
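                    // Messages in the topic look like {"uid": 1, "value": 19}
                    WindowResult result = JSONObject.parseObject(aggValue, WindowResult.class);
                    Long newValueLong = null;
                    try {
                        JSONObject json = JSON.parseObject(newValue);
                        newValueLong = json.getLong("value");
                    } catch (Exception ex) {
                        // Malformed JSON or a non-numeric "value": leave newValueLong as null
                    }
                    if (newValueLong == null) {
                        // Nothing usable in this record, so keep the aggregate unchanged
                        return aggValue;
                    }
                    // Use the count as the "empty window" marker so that a real value of 0 is handled correctly
                    if (result.getCount() == 0 || result.getMin() > newValueLong) {
                        result.setMin(newValueLong);
                    }
                    if (result.getCount() == 0 || result.getMax() < newValueLong) {
                        result.setMax(newValueLong);
                    }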
                    result.setUid(Integer.valueOf(aggKey));
                    result.setCount(result.getCount() + 1);
                    result.setSum(result.getSum() + newValueLong);
                    result.setAvg(result.getSum() / result.getCount());
                    return JSONObject.toJSONString(result);
                }, Materialized.with(Serdes.String(), Serdes.String()))
                .suppress(Suppressed.untilWindowCloses(unbounded())).toStream().mapValues((key, value) -> {
            // Window end time; Instant prints in UTC, so convert it to UTC+8 for display
            // instead of shifting the instant itself by eight hours
            String end = key.window().endTime().atOffset(java.time.ZoneOffset.ofHours(8)).toString();
            WindowResult result = JSONObject.parseObject(value, WindowResult.class);
            result.setWindowEnd(end);
            return JSONObject.toJSONString(result);
        }).to("window-result");

        final KafkaStreams streams = new KafkaStreams(builder.build(), props);

        streams.start();

    }
}

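.grace(Duration.ZERO): sets a deadline for late-arriving data; the window closes only after this period has passed. Duration.ZERO here means the window closes as soon as its end time is reached.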
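The effect of the grace period can be sketched as a small calculation. This is a simplified model in plain Java, not Kafka Streams' internal code: it assumes a window [start, start + size) keeps accepting records while stream time is earlier than the window end plus the grace period. The class and method names (GraceWindowClose, closeTimeMs, accepts) are hypothetical.

```java
public class GraceWindowClose {

    /** Time at which the window [startMs, startMs + sizeMs) closes: its end plus the grace period. */
    public static long closeTimeMs(long startMs, long sizeMs, long graceMs) {
        return startMs + sizeMs + graceMs;
    }

    /** A record is still accepted into the window while stream time is before the close time. */
    public static boolean accepts(long startMs, long sizeMs, long graceMs, long streamTimeMs) {
        return streamTimeMs < closeTimeMs(startMs, sizeMs, graceMs);
    }

    public static void main(String[] args) {
        long size = 30_000L;
        // Grace of zero: the window [0, 30000) stops accepting records once stream time hits 30000
        System.out.println(accepts(0L, size, 0L, 29_999L));     // true
        System.out.println(accepts(0L, size, 0L, 30_000L));     // false
        // With 5 seconds of grace the same window stays open until stream time reaches 35000
        System.out.println(accepts(0L, size, 5_000L, 30_000L)); // true
    }
}
```

A larger grace period trades latency for completeness: results arrive later, but records that show up slightly out of order are still counted.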

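suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded())): holds back the upstream output until the current time window closes, and only then sends the result downstream. unbounded() places no size limit on the buffer that holds the pending results.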

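Summary:
1. The number of windows that exist at any given moment is the window size divided by advance (here 30 / 10 = 3), assuming size is a multiple of advance.
2. The last message for a given key never shows up in the output. This is most likely not a bug in the code but a consequence of suppress: a window is emitted only once stream time passes its end plus the grace period, and stream time advances only when new records arrive. The windows containing the last message therefore stay buffered until a later record comes in on the same partition. (If you know a cleaner solution, please let me know.)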
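The behaviour in point 2 can be reproduced without a broker. Below is a rough simulation in plain Java of a windowed count followed by a "wait until the window closes" buffer. It is a sketch under assumptions, not the Kafka Streams implementation: it assumes stream time is the highest record timestamp seen so far (never the wall clock) and that a window is released once stream time reaches its end plus grace. The class SuppressSimulation and its onRecord method are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SuppressSimulation {
    private final long sizeMs;
    private final long advanceMs;
    private final long graceMs;
    // Stream time: the highest record timestamp seen so far, never the wall clock
    private long streamTimeMs = -1L;
    // Buffered per-window record counts, keyed by window start
    private final TreeMap<Long, Integer> buffer = new TreeMap<>();

    public SuppressSimulation(long sizeMs, long advanceMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.advanceMs = advanceMs;
        this.graceMs = graceMs;
    }

    /** Feeds one record and returns the windows it releases, formatted as "start=count". */
    public List<String> onRecord(long timestampMs) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        long start = Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        for (; start <= timestampMs; start += advanceMs) {
            // Records for windows that have already closed are dropped
            if (start + sizeMs + graceMs > streamTimeMs) {
                buffer.merge(start, 1, Integer::sum);
            }
        }
        List<String> emitted = new ArrayList<>();
        // Release every buffered window whose end + grace is at or before stream time
        while (!buffer.isEmpty() && buffer.firstKey() + sizeMs + graceMs <= streamTimeMs) {
            Map.Entry<Long, Integer> closed = buffer.pollFirstEntry();
            emitted.add(closed.getKey() + "=" + closed.getValue());
        }
        return emitted;
    }

    public static void main(String[] args) {
        SuppressSimulation sim = new SuppressSimulation(30_000L, 10_000L, 0L);
        System.out.println(sim.onRecord(1_000L));  // prints []
        System.out.println(sim.onRecord(12_000L)); // prints []
        System.out.println(sim.onRecord(25_000L)); // prints [] (nothing is released yet)
        System.out.println(sim.onRecord(31_000L)); // prints [0=3]
    }
}
```

The record at 25 s is counted in the window starting at 0, but that window is released only when the record at 31 s pushes stream time past 30 s. If no later record arrives, the result stays in the buffer, which matches what I observed. One workaround I have seen suggested is to send a later record, such as a heartbeat message, to move stream time forward.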
