Importing Data into Elasticsearch with a Hadoop MapReduce Job

My Hadoop version is hadoop-2.6.5 and my Elasticsearch version is elasticsearch-6.4.3. First, make sure your Hadoop cluster can run the WordCount example.

Below is the pom dependency. Note that the scope must be set to provided here, otherwise you will get the following error:

org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/usr/local/hadoop/hadoop-2.6.5/share/hadoop/yarn/lib/elasticsearch-hadoop-6.4.3.jar
jar:file:/home/bigdata/tmp/nm-local-dir/usercache/root/appcache/application_1551077535613_0047/filecache/10/job.jar/job.jar


<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>6.4.3</version>
    <scope>provided</scope>
</dependency>
Next, copy elasticsearch-hadoop-6.4.3.jar into the yarn lib directory of your Hadoop installation.

On my cluster that directory is /usr/local/hadoop/hadoop-2.6.5/share/hadoop/yarn/lib.

This step is mandatory, do not skip it! Otherwise you will get the following error:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.elasticsearch.hadoop.mr.EsOutputFormat not found


Now for the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

import com.alibaba.fastjson.JSONObject;

public class HdfsToES {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Disable speculative execution so the same document is not written to ES twice
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        conf.set("es.nodes", "localhost:9200");        // Elasticsearch node(s)
        conf.set("es.resource", "my_index/my_type");   // target index/type
        conf.set("es.mapping.id", "id");               // use the "id" field as the document _id
        conf.set("es.input.json", "yes");              // map output values are already JSON strings

        Job job = Job.getInstance(conf, "hadoop es write test");
        job.setJarByClass(HdfsToES.class);
        job.setMapperClass(HdfsToES.MyMapper.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(EsOutputFormat.class);

        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);

        // Input path on HDFS
        FileInputFormat.setInputPaths(job, new Path
                ("hdfs://node01:8020/xxxx/xxxx.json"));
        job.waitForCompletion(true);
    }


    // The mapper, nested inside the same HdfsToES class
    public static class MyMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Build one document as a JSON string with fastjson and emit it as Text
            JSONObject jsonObject2 = new JSONObject();
            jsonObject2.put("id", "9527");
            jsonObject2.put("name", "盖伦");
            jsonObject2.put("age", "18");

            Text valueout = new Text();
            valueout.set(jsonObject2.toJSONString().trim());
            context.write(NullWritable.get(), valueout);
        }
    }
}
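The MyMapper above ignores its input and always writes the same hard-coded demo document. Since es.input.json is set to yes and the input file on HDFS already contains one JSON document per line, a real job can use a minimal pass-through mapper instead. The sketch below is my own illustration (the class name PassThroughMapper is not from the original code); it drops into the same HdfsToES class and uses the same imports:

    public static class PassThroughMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Each input line is expected to be one complete JSON document,
            // e.g. {"id":"9527","name":"盖伦","age":"18"}
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines
            }
            // With es.input.json=yes, EsOutputFormat indexes the value as-is
            context.write(NullWritable.get(), new Text(line));
        }
    }

If you use it, remember to point job.setMapperClass at this class instead.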

Here are some errors you may run into:

org.apache.hadoop.mapred.YarnChild: Exception running child : org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('c' (code 99)): was expecting double-quote to start field name

If you hit this error, use job.setMapOutputValueClass(Text.class); do not use job.setMapOutputValueClass(BytesWritable.class) or job.setMapOutputValueClass(LinkedMapWritable.class). With es.input.json set to yes, the map output value handed to EsOutputFormat must already be a valid JSON string, which is why the value type needs to be Text here.


org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.io.ArrayWritable.<init>()

or

org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('b' (code 98)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@69c79f09; line: 1, column: 3]

If you hit either of these errors, the fix is the same: use job.setMapOutputValueClass(Text.class) and do not use LinkedMapWritable.

Store each document as a JSON string (I used Alibaba's fastjson), then write it out as Text:

Text valueout = new Text();
valueout.set(jsonObject2.toJSONString().trim());
context.write(NullWritable.get(), valueout);
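If the source data is not JSON to begin with, the same pattern still applies: build the JSON string with fastjson inside the mapper and emit it as Text. The sketch below is only an example of mine, assuming a tab-separated input line of id, name and age; the class name TsvToJsonMapper and the field layout are not part of the original job, and it reuses the imports from the job code above:

    public static class TsvToJsonMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            // Build one document as a JSON string; es.mapping.id=id will pick up the "id" field
            JSONObject doc = new JSONObject();
            doc.put("id", fields[0]);
            doc.put("name", fields[1]);
            doc.put("age", fields[2]);
            context.write(NullWritable.get(), new Text(doc.toJSONString()));
        }
    }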


When you run into an error, do not panic, and do not just retry the same fix over and over. The solutions posted by CSDN bloggers often depend on a particular version and environment, so what worked for someone else may not work for you. Look things up, compare your setup, and think the problem through.
