flink--demo

1.pom.xml


<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.6.0</version>
</dependency>
Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2.11 and flink-streaming-java_2.11 with flink-streaming-scala_2.11.

2. Files as a data source

readTextFile(String path): reads the file with TextInputFormat by default, returning each line as a String;

readTextFileWithValue(String path): returns StringValues, which are mutable strings;

readCsvFile(String path): returns Java POJOs or tuples (a short sketch follows the word-count example below);

readFileOfPrimitives(path, delimiter, class): parses each line into an instance of the given class;

readHadoopFile(FileInputFormat, Key, Value, path): reads a Hadoop file, given the path, the input format, and the key and value classes;

readSequenceFile(Key, Value, path): reads a SequenceFile, again with the key and value classes specified.
val env = ExecutionEnvironment.getExecutionEnvironment

// get input data
val text = env.readTextFile("/path/to/file")

val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
  .map { (_, 1) }
  .groupBy(0)
  .sum(1)

counts.writeAsCsv(outputPath, "\n", " ")
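
The readCsvFile method mentioned above can be sketched as follows in Java; the file path, the header line, and the two-column (word, count) layout are illustrative assumptions, not from the original post:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class CsvSourceDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // read a two-column CSV file (word,count) into Tuple2 records
        DataSet<Tuple2<String, Integer>> csv = env
                .readCsvFile("/path/to/words.csv")    // placeholder path
                .fieldDelimiter(",")
                .ignoreFirstLine()                    // skip a header row, if the file has one
                .types(String.class, Integer.class);

        // print() triggers execution in the DataSet API
        csv.print();
    }
}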

3. Console (socket) input as a data source

$ nc -l 9000

abc,sad,as

asd,a

bv

Then submit the Flink program:

$ ./bin/flink run examples/streaming/SocketWordCount.jar --port 9000

$ bin/flink run examples/streaming/SocketWordCount.jar \
  --hostname slave01 \
  --port 9000
package cn.com.xxx.zzy;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Created with IntelliJ IDEA.
 * To change this template use File | Settings | File Templates.
 */
public class SocketWordCount {

    public static void main(String[] args) throws Exception {
        // the port to connect to
        final int port;
        try {
            final ParameterTool params = ParameterTool.fromArgs(args);
            port = params.getInt("port");
        } catch (Exception e) {
            System.err.println("No port specified. Please run 'SocketWordCount --port <port>'");
            return;
        }

        // get the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // get input data by connecting to the socket
        DataStream<String> text = env.socketTextStream("localhost", port, "\n");

        // parse the data, group it, window it, and aggregate the counts
        DataStream<WordWithCount> windowCounts = text
                .flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) {
                        for (String word : value.split("\\s")) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                })
                .keyBy("word")
                .timeWindow(Time.seconds(5), Time.seconds(1))
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                });

        // print the results with a single thread, rather than in parallel
        windowCounts.print().setParallelism(1);

        env.execute("Socket WordCount");

    }

    // Data type for words with count
    public static class WordWithCount {

        public String word;
        public long count;

        public WordWithCount() {
        }

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return word + " : " + count;
        }
    }
}

4. Java collections as a data source

fromCollection(Collection): creates a data set from a Java collection;

fromCollection(Iterator, Class): reads from an iterator, with the element type given by the class;

fromElements(T...): creates a data set from a sequence of objects;

fromParallelCollection(SplittableIterator, Class): reads from an iterator in parallel;

generateSequence(from, to): generates the numbers in the given range (a Java sketch of these sources follows the Scala example below).
package com.gr.dologic
import java.util

import org.apache.flink.api.java.aggregation.Aggregations
import org.apache.flink.api.scala._
object flink1 {
  def main(args: Array[String]): Unit = {

    val env = ExecutionEnvironment.getExecutionEnvironment
    val list = new util.ArrayList[Int]();
    list.add(1);
    list.add(2);
    list.add(3);
    val stream = env.fromElements(1,2,3,4,3,4,3,3,5).map(arr=>{
      arr
    }).filter(_>=2).map{x=>(x,1)}.groupBy(0).aggregate(Aggregations.SUM,1)//.sum(1)
    stream.print
    // env.execute()
    /* Exception in thread "main" java.lang.RuntimeException: No new data sinks have been defined since the last execution.
       The last execution refers to the latest call to 'execute()', 'count()', 'collect()', or 'print()'.
       As the article referenced below explains, print() already triggers execution, so calling env.execute()
       afterwards raises this error; commenting out env.execute() is enough. */

  }
}
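
For the other collection-based sources listed above, here is a minimal Java sketch; the values and ranges are illustrative:

import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.util.NumberSequenceIterator;

public class CollectionSourcesDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // fromCollection: turn a local java.util collection into a DataSet
        List<Integer> numbers = Arrays.asList(1, 2, 3);
        DataSet<Integer> fromList = env.fromCollection(numbers);

        // generateSequence: the numbers 1..100 as a DataSet<Long>
        DataSet<Long> sequence = env.generateSequence(1, 100);

        // fromParallelCollection: read a SplittableIterator in parallel
        DataSet<Long> parallel =
                env.fromParallelCollection(new NumberSequenceIterator(1L, 10L), Long.class);

        // each print() triggers execution, as in the Scala example above
        fromList.print();
        sequence.print();
        parallel.print();
    }
}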

5. Kafka as a data source

val properties = new Properties();

properties.setProperty("bootstrap.servers","localhost:9092");

properties.setProperty("zookeeper.connect","localhost:2181");

properties.setProperty("group.id","test");

val stream = env.addSource(new FlinkKafkaConsumer09[String]("mytopic",new SimpleStringSchema(),properties))//.print

Computing the average temperature over a time window:
DataStream<Tuple2<String, Double>> keyedStream = env
        .addSource(new FlinkKafkaConsumer09<String>("mytopic", new SimpleStringSchema(), properties))
        .flatMap(new Splitter())    // Splitter (not shown in the post) is assumed to emit (id, temperature) Tuple2 records
        .keyBy(0)
        .timeWindow(Time.seconds(300))
        .apply(new WindowFunction<Tuple2<String, Double>, Tuple2<String, Double>, Tuple, TimeWindow>() {
            public void apply(Tuple key, TimeWindow window, Iterable<Tuple2<String, Double>> input,
                              Collector<Tuple2<String, Double>> out) throws Exception {
                double sum = 0;
                int count = 0;
                for (Tuple2<String, Double> record : input) {
                    sum += record.f1;
                    count++;
                }
                Tuple2<String, Double> result = input.iterator().next();
                result.f1 = sum / count;
                out.collect(result);
            }
        });

For fault tolerance, make sure to enable checkpointing.
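
A minimal Java sketch of enabling checkpointing, placed where the streaming environment is created; the 5-second interval and the other settings are illustrative values, not from the original post:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// take a checkpoint every 5 seconds with exactly-once guarantees
env.enableCheckpointing(5000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

// optional tuning: fail a checkpoint that takes longer than 60 s,
// and keep at least 500 ms between the end of one checkpoint and the start of the next
env.getCheckpointConfig().setCheckpointTimeout(60000);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);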

A Kafka producer can also be used as a sink:

stream.addSink(new FlinkKafkaProducer09[String]("localhost:9092", "mytopic", new SimpleStringSchema()))

6. A relational database as a data source

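A minimal Java sketch of reading from a relational database with the JDBCInputFormat from Flink's flink-jdbc module; the JDBC driver, URL, credentials, query, and column types below are placeholder assumptions:

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class JdbcSourceDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // column types of the query result: (INT id, VARCHAR name)
        RowTypeInfo rowTypeInfo =
                new RowTypeInfo(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        JDBCInputFormat jdbcInput = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("com.mysql.jdbc.Driver")           // placeholder driver
                .setDBUrl("jdbc:mysql://localhost:3306/test")     // placeholder URL
                .setUsername("root")                              // placeholder credentials
                .setPassword("root")
                .setQuery("SELECT id, name FROM users")           // placeholder table
                .setRowTypeInfo(rowTypeInfo)
                .finish();

        DataSet<Row> rows = env.createInput(jdbcInput);
        rows.print();
    }
}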

7.Table API

Flink provides a relational interface for both batch and stream processing, called the Table API. Once a DataSet/DataStream has been registered as a table, relational operations such as aggregations, joins, and selections can be applied to it.

Tables can also be queried with standard SQL. Once the operations have been applied, the table is converted back into a DataSet/DataStream. Internally, Flink uses the open-source framework Apache Calcite to optimize these queries and conversions.

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table_2.11</artifactId>
  <version>1.1.4</version>
</dependency>
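
As a rough illustration of the workflow described above (register a DataSet as a table, query it with SQL, convert the result back), here is a minimal Java sketch. It targets the Flink 1.6 Table API, so it assumes a flink-table version matching the 1.6.0 core dependencies rather than the 1.1.4 shown here, and the table and column names are made up:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class TableApiDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        DataSet<Tuple2<String, Integer>> words = env.fromElements(
                Tuple2.of("flink", 1), Tuple2.of("demo", 1), Tuple2.of("flink", 1));

        // register the DataSet as a table with named columns
        tableEnv.registerDataSet("words", words, "word, cnt");

        // query the registered table with standard SQL
        // (Table API calls such as select/groupBy/join work against the same table)
        Table result = tableEnv.sqlQuery("SELECT word, SUM(cnt) AS total FROM words GROUP BY word");

        // convert the Table back to a DataSet before writing it out
        DataSet<Row> out = tableEnv.toDataSet(result, Row.class);
        out.print();
    }
}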

8. Reference: study notes 1-5 on Mastering Apache Flink

https://blog.csdn.net/lmalds/article/details/60867262
