Flink DataStream

1. Reading data

1.1 Reading from memory

DataStreamSource<Integer> ds = env.fromElements(1, 2, 3, 4);
DataStreamSource<Integer> source = env.fromCollection(Arrays.asList(1, 2, 3));

1.2 Reading from a file

Reading from a file requires the following POM dependency:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-files</artifactId>
    <version>1.17.0</version>
</dependency>

FileSource<String> fileSource = FileSource
        .forRecordStreamFormat(new TextLineInputFormat(), new Path("input/word.txt"))
        .build();
env.fromSource(fileSource, WatermarkStrategy.noWatermarks(), "filesource").print();
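
A relative path such as input/word.txt is typically resolved against the JVM working directory when the job runs locally (for example from an IDE), so a "file not found" error often just means the program was started from a different directory. The plain-Java sketch below uses no Flink classes; `InputPathCheck` and `resolve` are hypothetical names introduced here for illustration. It prints where the relative path actually points:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flink: shows how a relative path resolves locally.
class InputPathCheck {

    // Turns a relative path into an absolute one, based on the current working directory.
    static Path resolve(String relative) {
        return Paths.get(relative).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        Path path = resolve("input/word.txt");
        // Prints the absolute location and whether a file exists there.
        System.out.println(path + " exists: " + Files.exists(path));
    }
}
```

If the printed location is not the one you expected, either change the working directory of the run configuration or pass an absolute path to the file source.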

1.3 Reading from Kafka

Reading from Kafka requires the following POM dependency:

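<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka</artifactId>
    <version>1.17.0</version>
</dependency>

KafkaSource<String> dataSource = KafkaSource.<String>builder()
        // Broker addresses as host:port; 9092 is Kafka's default port, adjust to your cluster
        .setBootstrapServers("hadoop1:9092,hadoop2:9092")
        // Placeholder consumer group id
        .setGroupId("my-consumer-group")
        // Placeholder topic name
        .setTopics("my-topic")
        // Deserialize only the record value, as a String
        .setValueOnlyDeserializer(new SimpleStringSchema())
        // Start from the latest offsets, i.e. only consume newly arriving records
        .setStartingOffsets(OffsetsInitializer.latest())
        .build();
env.fromSource(dataSource, WatermarkStrategy.noWatermarks(), "kafkasource").print();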
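
Kafka clients expect every entry of the bootstrap server list in host:port form and reject entries without a port. As a rough illustration, the plain-Java sketch below checks a list before it is handed to setBootstrapServers. `BootstrapServersCheck` and `isValid` are hypothetical names introduced here, not part of Flink or Kafka, and the pattern is deliberately simple (it does not handle IPv6 literals):

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink or Kafka.
class BootstrapServersCheck {

    // Returns true if every comma-separated entry looks like host:port with a numeric port.
    static boolean isValid(String servers) {
        return Arrays.stream(servers.split(","))
                .map(String::trim)
                .allMatch(entry -> entry.matches("[^:\\s]+:\\d{1,5}"));
    }

    public static void main(String[] args) {
        System.out.println(isValid("hadoop1:9092,hadoop2:9092")); // entries with ports
        System.out.println(isValid("hadoop1,hadoop2"));           // entries without ports
    }
}
```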

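1.4 Generating data with datagen

Flink provides an official data generator connector, which requires the following POM dependency: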

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-datagen</artifactId>
    <version>1.17.0</version>
</dependency>

DataGeneratorSource<String> dataGeneratorSource = new DataGeneratorSource<>(
        // Generator function: maps each index to a record
        aLong -> "Number:" + aLong,
        // Total number of records to generate
        10,
        // Emission rate: one record per second
        RateLimiterStrategy.perSecond(1),
        // Type information of the generated records
        Types.STRING);
env.fromSource(dataGeneratorSource, WatermarkStrategy.noWatermarks(), "data-generator").print();
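
To get a feel for what this source emits, the plain-Java sketch below applies the same mapping to a range of indices. It is only an emulation under the assumption that the generator function receives the indices 0 through count - 1; it uses no Flink classes and ignores rate limiting, and `DataGenPreview` and `preview` are hypothetical names introduced here:

```java
import java.util.List;
import java.util.function.LongFunction;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Plain-Java emulation only; no Flink classes are involved.
class DataGenPreview {

    // Applies the generator function to the indices 0 .. count - 1 (assumed index range).
    static List<String> preview(LongFunction<String> generator, long count) {
        return LongStream.range(0, count)
                .mapToObj(generator)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same mapping as the lambda passed to DataGeneratorSource.
        preview(index -> "Number:" + index, 10).forEach(System.out::println);
    }
}
```

In a real job with parallelism greater than 1, the records may be printed in a different order, since each subtask handles its own slice of the index range.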
