
Spark Streaming: textFileStream does not read files from HDFS

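The reason is simple: textFileStream() only reads files that are newly added to the monitored directory. In other words, you have to start the program first and then put the files into the directory.
Here is the description from the official API documentation: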
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them as text files (using key as LongWritable, value as Text and input format as TextInputFormat). Files must be written to the monitored directory by “moving” them from another location within the same file system. File names starting with . are ignored.
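The "moving" requirement quoted above can be illustrated with a minimal sketch. It assumes a local directory watched through a `file://` URI instead of HDFS and uses only `java.nio.file`. The class name `StagedPublisher`, the method `publish`, and the directory names are hypothetical. On HDFS, the same idea is to `hdfs dfs -put` the file into a staging directory on the same cluster, then `hdfs dfs -mv` it into the monitored directory after the streaming job has started.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: publishes a file into a monitored directory the way the
// textFileStream docs ask for. It writes the file elsewhere on the same
// filesystem first, then moves it in with a single rename.
class StagedPublisher {

    static Path publish(Path stagingDir, Path monitoredDir, String fileName, String contents)
            throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(monitoredDir);
        // 1. Write the complete file outside the monitored directory.
        Path staged = stagingDir.resolve(fileName);
        Files.write(staged, contents.getBytes(StandardCharsets.UTF_8));
        // 2. Move it in atomically. ATOMIC_MOVE throws
        //    AtomicMoveNotSupportedException if the two directories are on
        //    different filesystems.
        return Files.move(staged, monitoredDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("stream-demo");
        Path published = publish(root.resolve("staging"), root.resolve("wordcount"),
                "batch-001.txt", "hello spark\nhello hdfs\n");
        System.out.println("Published " + published);
    }
}
```

The rename makes the file appear in the monitored directory all at once. Writing directly into that directory risks the stream seeing a partially written file.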

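import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class HDFSWordCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("WordCount");
        // 1-second batch interval
        JavaStreamingContext javaStreamingContext = new JavaStreamingContext(conf, Durations.seconds(1));
        // Only files that appear in this directory after start() are read
        JavaDStream<String> lines = javaStreamingContext.textFileStream("hdfs://bigdata02.nebuinfo.com:8020/sparktest/data/wordcount");
        lines.flatMap(x -> Arrays.asList(x.split(" ")).iterator())
                .mapToPair(x -> new Tuple2<>(x, 1))
                .reduceByKey((x, y) -> x + y)
                .print();
        // Nothing is processed until start() is called
        javaStreamingContext.start();
        javaStreamingContext.awaitTermination();
        javaStreamingContext.close();
    }
}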
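The following sketch makes concrete what each batch of the job above prints. It applies the same transformation chain to one batch of lines using plain Java streams, without Spark. It is an illustration only: `BatchWordCount` is a hypothetical name, and Spark distributes this work across partitions rather than running a single stream. Because the job uses `reduceByKey` without any state, each `print()` shows counts only for lines from files that arrived in that batch, not a running total.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration (no Spark) of what one batch of the word count computes.
class BatchWordCount {

    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // flatMap: split each line on a single space, like x.split(" ")
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // mapToPair + reduceByKey: group identical words and count occurrences,
                // kept in a TreeMap so the output is sorted by word
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(countWords(List.of("hello spark", "hello hdfs")));
        // prints {hdfs=1, hello=2, spark=1}
    }
}
```

As with `x.split(" ")` in the Spark job, an empty line produces one empty-string token, which is then counted.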

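Some posts online suggest using fileStream instead, but the results I get are not correct. If anyone knows why, I would appreciate an explanation.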

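import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// fileStream needs the new-API (mapreduce) input format, not mapred.TextInputFormat
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;

JavaPairInputDStream<LongWritable, Text> longWritableTextJavaPairInputDStream = javaStreamingContext.fileStream(
        "hdfs://bigdata02.nebuinfo.com:8020/sparktest/data/wordcount",
        LongWritable.class, Text.class, TextInputFormat.class,
        new Function<Path, Boolean>() {
            @Override
            public Boolean call(Path v1) throws Exception {
                return true; // accept every file
            }
        },
        false); // newFilesOnly = false

longWritableTextJavaPairInputDStream.print();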
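A likely explanation comes from reading Spark's `FileInputDStream`; it has not been verified against the setup above. Setting `newFilesOnly = false` does not make the stream read everything already in the directory. A file is still picked up only if its modification time falls within a remember window, which is at least `spark.streaming.fileStream.minRememberDuration` long (60 s by default), so older files are skipped. Separately, calling `print()` on a stream of `LongWritable`/`Text` pairs may fail with a serialization error, because Hadoop `Writable`s are not Java-serializable. Mapping each value to a `String` first (for example `pair._2().toString()`) is the usual workaround.

The sketch below models only the file-selection checks, in plain Java. `FileSelectionModel` and its parameters are hypothetical names, and the logic is a simplification, not Spark's code.

```java
import java.util.Set;

// Hypothetical, simplified model of the checks Spark's FileInputDStream applies to
// each file it finds in the monitored directory. Not Spark code; times are in ms.
class FileSelectionModel {

    static boolean isSelected(String fileName, long modTime, long batchTime,
                              boolean newFilesOnly, long streamCreatedAt,
                              long rememberMs, boolean useDefaultFilter,
                              Set<String> alreadySelected) {
        // Default filter: skip names starting with ".". A filter that always
        // returns true, like the one passed to fileStream above, removes this check.
        if (useDefaultFilter && fileName.startsWith(".")) {
            return false;
        }
        // newFilesOnly=true ignores files not newer than the stream's creation time;
        // newFilesOnly=false lowers that floor to 0 ...
        long initialThreshold = newFilesOnly ? streamCreatedAt : 0L;
        // ... but files older than the remember window are skipped either way.
        long threshold = Math.max(initialThreshold, batchTime - rememberMs);
        if (modTime <= threshold) {
            return false;
        }
        // Files modified after this batch's time are left for a later batch.
        if (modTime > batchTime) {
            return false;
        }
        // A file is selected at most once.
        return !alreadySelected.contains(fileName);
    }

    public static void main(String[] args) {
        long batchTime = 1_000_000L;
        long rememberMs = 60_000L; // assumed: 1 s batches with the 60 s default minimum
        // A file last modified 10 minutes before this batch, read with newFilesOnly = false:
        boolean picked = isSelected("old.txt", batchTime - 600_000L, batchTime,
                false, batchTime - 5_000L, rememberMs, true, Set.of());
        System.out.println(picked); // false: outside the remember window
    }
}
```

In this model, a file already sitting in the directory is only read with `newFilesOnly = false` if it was modified within the last remember window before the batch.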
