Flink: Reading Kafka Data, Processing It, and Writing to StarRocks

  • Log in with keytab credentials

  1. Log in to Huawei FusionInsight Manager and download the flinkuser user's credential files: user.keytab and krb5.conf
  2. Prepare the LoginUtil utility class (a rough sketch follows the login snippet below)
  3. The login code is as follows:
// Kerberos login: resolve the keytab and krb5.conf shipped under ./conf
String userPrincipal = "flinkuser";
String userKeytabPath = System.getProperty("user.dir") + File.separator
        + "conf" + File.separator + "user.keytab";
String krb5ConfPath = System.getProperty("user.dir") + File.separator
        + "conf" + File.separator + "krb5.conf";
// Generate the JAAS configuration for this principal
LoginUtil.setJaasFile(userPrincipal, userKeytabPath);

// org.apache.hadoop.conf.Configuration
Configuration configuration = new Configuration();
LoginUtil.login(userPrincipal, userKeytabPath, krb5ConfPath, configuration);
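
LoginUtil itself ships with Huawei's FusionInsight sample code and is not reproduced in the original post. As a minimal sketch (an assumption, not Huawei's actual implementation), its core amounts to pointing the JVM at krb5.conf and logging in through Hadoop's UserGroupInformation:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class LoginUtil {
    // Hypothetical, simplified version; the real Huawei class also writes
    // the JAAS file (setJaasFile) and performs extra validation.
    public static void login(String principal, String keytabPath,
                             String krb5ConfPath, Configuration conf) throws IOException {
        // Tell the JVM's Kerberos layer which realm/KDC to use
        System.setProperty("java.security.krb5.conf", krb5ConfPath);
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Obtain a TGT from the keytab for the given principal
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
    }
}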
  • Flink execution environment

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
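
The example runs with parallelism 1 and no checkpointing. For anything beyond a demo you would normally enable checkpointing so that Kafka offsets are committed consistently with the job state; a minimal sketch (the interval is an arbitrary choice here):

// Checkpoint every 60 s (illustrative value)
env.enableCheckpointing(60000);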

  • Connect to Kafka and read data

Properties properties = new Properties();
properties.setProperty("topic", "petertest1");
properties.setProperty("bootstrap.servers", "ip:port,ip:port,ip:port,ip:port,ip:port");
// Consumer group id, required by the Flink Kafka consumer (name is arbitrary here)
properties.setProperty("group.id", "flink-starrocks-demo");

// Source: consume the topic as raw strings
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer010<>(
        properties.getProperty("topic"), new SimpleStringSchema(), properties));
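
Because the cluster is Kerberos-secured, the Kafka client usually needs SASL settings as well; a sketch with typical values (the service name kafka is an assumption, verify against your cluster's configuration):

// SASL/Kerberos client settings (illustrative; check your cluster)
properties.setProperty("security.protocol", "SASL_PLAINTEXT");
properties.setProperty("sasl.kerberos.service.name", "kafka");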
  • Process the data

// ETL: wrap each Kafka record into a TableData bean
// (siteId and cityCode are randomly generated for demo purposes)
Random random = new Random();
SingleOutputStreamOperator<TableData> dataStreamSource = messageStream.map(new MapFunction<String, TableData>() {
  @Override
  public TableData map(String value) throws Exception {
    System.out.println("receive kafka data:" + value);
    int siteId = random.nextInt(10000);
    int cityCode = random.nextInt(1000);
    return new TableData(siteId, cityCode, value, 1);
  }
});
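
TableData is not shown in the original post. From the constructor call above and the getters used in the sink mapping below, it is presumably a plain serializable POJO along these lines (field types inferred from the sink's TableSchema):

// Assumed bean definition; matches new TableData(siteId, cityCode, value, 1)
public class TableData implements java.io.Serializable {
    private final int siteid;
    private final int citycode;
    private final String username;
    private final long pv;

    public TableData(int siteid, int citycode, String username, long pv) {
        this.siteid = siteid;
        this.citycode = citycode;
        this.username = username;
        this.pv = pv;
    }

    public int getSiteid() { return siteid; }
    public int getCitycode() { return citycode; }
    public String getUsername() { return username; }
    public long getPv() { return pv; }
}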
  • Connect to StarRocks and write data

// Sink the TableData beans to StarRocks
dataStreamSource.addSink(StarRocksSink.sink(
        // the table structure
        TableSchema.builder()
                .field("siteid", DataTypes.INT())
                .field("citycode", DataTypes.SMALLINT())
                .field("username", DataTypes.VARCHAR(32))
                .field("pv", DataTypes.BIGINT())
                .build(),
        // the sink options
        StarRocksSinkOptions.builder()
                .withProperty("connector", "starrocks")
                .withProperty("jdbc-url", "jdbc:mysql://ip:port?characterEncoding=utf-8&useSSL=false")
                .withProperty("load-url", "ip:port;ip:port;ip:port")
                .withProperty("username", "root")
                .withProperty("password", "")
                .withProperty("table-name", "table1")
                .withProperty("database-name", "example_db")
                // column separator for Stream Load
                .withProperty("sink.properties.column_separator", "\\x01")
                // row delimiter for Stream Load
                .withProperty("sink.properties.row_delimiter", "\\x02")
                // flush interval: buffered rows are sent to StarRocks every 10 s
                .withProperty("sink.buffer-flush.interval-ms", "10000")
                .build(),
        // set the slots with streamRowData
        (slots, streamRowData) -> {
          slots[0] = streamRowData.getSiteid();
          slots[1] = streamRowData.getCitycode();
          slots[2] = streamRowData.getUsername();
          slots[3] = streamRowData.getPv();
        }
));
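
The non-printable separators \x01 and \x02 are chosen so they will not collide with characters that occur in the data itself. Besides the time-based flush, the connector also exposes size-based flush triggers and retries; lines like these could be added to the StarRocksSinkOptions.builder() chain above (option names per the StarRocks Flink connector docs; the values are illustrative):

// Optional additional sink options (illustrative values)
.withProperty("sink.buffer-flush.max-rows", "64000")      // flush after this many buffered rows
.withProperty("sink.buffer-flush.max-bytes", "67108864")  // or after 64 MB of buffered data
.withProperty("sink.max-retries", "3")                    // retry failed Stream Load requests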
  • Execute

// The pipeline is built lazily; the job is submitted and starts running here
env.execute();
