Importing JSON into Elasticsearch with Hadoop MapReduce

1. Upload the JSON file to HDFS
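
The job configured later sets es.input.json to yes, so each line of the uploaded file should be one complete JSON document. A minimal sketch of preparing and uploading such a file; the file name blog.json and the field names are made up for illustration, while the HDFS path matches the one used in the code below:

# one JSON document per line
cat > blog.json <<'EOF'
{"title": "first post", "author": "test", "content": "hello elasticsearch"}
{"title": "second post", "author": "test", "content": "hello hadoop"}
EOF

hdfs dfs -mkdir -p /data/json
hdfs dfs -put blog.json /data/json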





2. Maven dependencies


<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>5.5.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop-mr</artifactId>
    <version>5.5.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop-hive</artifactId>
    <version>5.5.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop-pig</artifactId>
    <version>5.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency>
    <groupId>cn.bestwu</groupId>
    <artifactId>ik-analyzers</artifactId>
    <version>5.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.5.1</version>
</dependency>
<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>


3. Write the code (my Elasticsearch version is 5.5.2)


import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsHadoop {
    // Pass each input line (a complete JSON document) through unchanged;
    // EsOutputFormat sends the map output values to Elasticsearch.
    public static class MyMapper extends Mapper<Object, Text, NullWritable, Text> {
        private Text line = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            if (value.getLength() > 0) {
                line.set(value);
                context.write(NullWritable.get(), line);
            }
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        Configuration conf = new Configuration();
        // String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // Disable speculative execution so duplicate task attempts do not index the same documents twice
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "192.168.1.158:9200");
        conf.set("es.resource", "blog/csdn"); // index/type
        conf.set("es.input.json", "yes");     // map output values are already JSON

        Job job = Job.getInstance(conf, "hadoop es write test");
        job.setJarByClass(EsHadoop.class);
        job.setMapperClass(EsHadoop.MyMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        // Input path of the JSON file(s) on HDFS
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.1.120:9000/data/json"));
        job.waitForCompletion(true);
    }
}


4. Build the jar


Because the job uses third-party jars, create a lib folder inside the jar artifact when packaging and put the third-party jars in it; otherwise the job fails with
java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

(Screenshot 1)

Select your main class

(Screenshot 2)

Create a lib folder under the jar and add the third-party jars to it; that is all that is needed.

(Screenshot 3)



Run Build, find your jar under the out folder, and copy it into the Hadoop directory.
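
If you build with Maven instead of the IDE, a rough sketch of the same idea is to copy the third-party jars into target/classes/lib before packaging, so they end up inside the job jar under lib/ (a Hadoop job jar picks up jars from its lib/ directory). The exact invocation below uses the standard maven-dependency-plugin goal and should be treated as an assumption to adapt to your build; in practice you would mark the Hadoop dependencies as provided so only elasticsearch-hadoop and its dependencies are copied:

mvn dependency:copy-dependencies -DoutputDirectory=target/classes/lib -DincludeScope=runtime
mvn package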


5. Run the command:  hadoop jar /usr/local/hadoop-2.8.2/comspark.jar
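
Once the job finishes, you can check that the documents arrived. A quick sketch, using the node address and the blog/csdn index/type from the configuration above (the counts depend on your input file):

curl 'http://192.168.1.158:9200/blog/csdn/_count?pretty'
curl 'http://192.168.1.158:9200/blog/csdn/_search?size=2&pretty'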

