Spark-streaming-kafka: Getting RDD (Offset) Information with the Java API
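
The snippet in this post uses the direct (receiver-less) Kafka stream from the spark-streaming-kafka module (Kafka 0.8 API) to read offset and partition information for each micro-batch. It assumes that conf (a SparkConf), slice (the batch interval in seconds), kafkaParams, and actTopics are defined elsewhere. A minimal sketch of one possible setup; the master, app name, broker address, and topic name below are placeholders, not part of the original code:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf().setAppName("kafka-offset-demo").setMaster("local[2]");
    long slice = 5L; // batch interval in seconds

    // The Kafka 0.8 direct API takes plain consumer properties;
    // "metadata.broker.list" is the broker bootstrap list.
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("metadata.broker.list", "localhost:9092");
    kafkaParams.put("auto.offset.reset", "smallest");

    // Topics to subscribe to ("act" is a placeholder name).
    Set<String> actTopics = new HashSet<String>();
    actTopics.add("act");

With those in place, create the streaming context and the direct stream, then read the offset ranges from each batch: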

    import java.util.Iterator;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.VoidFunction;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.HasOffsetRanges;
    import org.apache.spark.streaming.kafka.KafkaUtils;
    import org.apache.spark.streaming.kafka.OffsetRange;
    import scala.Tuple2;

    JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(slice));

    JavaPairInputDStream<String, String> kafkaAction = KafkaUtils
        .createDirectStream(jsc, String.class, String.class,
            StringDecoder.class, StringDecoder.class, kafkaParams, actTopics);

    // Process each micro-batch.
    kafkaAction.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>() {
      @Override
      public void call(JavaPairRDD<String, String> rdd) throws Exception {
        // rdd.rdd() returns the underlying KafkaRDD.
        // JavaPairRDD is only a thin wrapper around the original Scala RDD;
        // to reach the wrapped RDD, call rdd(), which returns the JavaPairRDD's
        // internal rdd field. For a direct stream that is a KafkaRDD, and
        // KafkaRDD implements the HasOffsetRanges interface, so casting lets us
        // read the Kafka offset and partition information for this batch.
        HasOffsetRanges hasOffsetRanges = (HasOffsetRanges) rdd.rdd();
        for (OffsetRange range : hasOffsetRanges.offsetRanges()) {
          System.out.println("## count: " + range.count());
          System.out.println("## fromOffset: " + range.fromOffset());
          System.out.println("## untilOffset: " + range.untilOffset());
          System.out.println("## topic: " + range.topic());
          System.out.println("## partition: " + range.partition());
        }

        rdd.foreachPartitionAsync(new VoidFunction<Iterator<Tuple2<String, String>>>() {
          @Override
          public void call(Iterator<Tuple2<String, String>> tuple2Iterator) throws Exception {
            // Do something with the records in this partition.
          }
        });
      }
    });

    jsc.start();
    jsc.awaitTermination();
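
Note that foreachPartitionAsync only submits the per-partition work and returns a JavaFutureAction<Void>; use foreachPartition instead if the batch should block until the work completes. Also, because the direct stream does not commit consumed offsets anywhere, a common follow-up is to persist each partition's untilOffset once the batch has been processed, so a restarted job can resume where it left off (for example via the createDirectStream overload that accepts a Map<TopicAndPartition, Long> of fromOffsets). A minimal sketch of collecting those offsets, assuming it runs inside the foreachRDD callback above; how and where the map is persisted is up to you:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.streaming.kafka.HasOffsetRanges;
    import org.apache.spark.streaming.kafka.OffsetRange;

    // Inside foreachRDD, after the batch has been processed successfully:
    OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    Map<String, Long> nextOffsets = new HashMap<String, Long>();
    for (OffsetRange range : ranges) {
      // Key by "topic-partition"; untilOffset is exclusive, i.e. the first
      // offset not yet consumed, so it is the right value to resume from.
      nextOffsets.put(range.topic() + "-" + range.partition(), range.untilOffset());
    }
    // Persist nextOffsets to an external store (database, ZooKeeper, ...) here.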
