Hadoop MR: getting the input file path when using MultipleInputs, and the TaggedInputSplit cast error

Reading the input file name in a Hadoop MR job fails with an error


Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
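Cause: my MR job reads several HDFS inputs, so the paths are registered with MultipleInputs.addInputPath. With MultipleInputs, the split handed to the mapper is a TaggedInputSplit that wraps the real split, not a FileSplit. I was still using the way of getting the file name that only works for a single-input job set up with FileInputFormat: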

            InputSplit inputSplit = context.getInputSplit();
            String filePath = ((FileSplit) inputSplit).getPath().toString();

With MultipleInputs, get the file name like this instead:

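            // Needs java.io.IOException and java.lang.reflect.Method imported.
            InputSplit split = context.getInputSplit();
            Class<? extends InputSplit> splitClass = split.getClass();
            FileSplit fileSplit = null;
            if (splitClass.equals(FileSplit.class)) {
                // Single-input job: the split is already a FileSplit.
                fileSplit = (FileSplit) split;
            } else if (splitClass.getName().equals(
                    "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
                // MultipleInputs: TaggedInputSplit is not visible outside its
                // package, so unwrap the underlying split via reflection.
                try {
                    Method getInputSplitMethod = splitClass
                            .getDeclaredMethod("getInputSplit");
                    getInputSplitMethod.setAccessible(true);
                    fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
                } catch (Exception e) {
                    // wrap and re-throw error
                    throw new IOException(e);
                }
            }
            if (fileSplit == null) {
                // Neither branch matched; fail clearly instead of with a NullPointerException.
                throw new IOException("Unsupported split type: " + splitClass.getName());
            }
            String filePath = fileSplit.getPath().toString();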
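
To see the failure and the fix without a cluster, here is a minimal sketch that runs without Hadoop on the classpath. DemoSplit, DemoFileSplit, DemoTaggedSplit and SplitUnwrapDemo are hypothetical stand-ins, not Hadoop classes. The accessor is made private here only to imitate the fact that the real getInputSplit cannot be called directly from user code.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for the Hadoop split types (assumption: simplified shapes).
abstract class DemoSplit { }

class DemoFileSplit extends DemoSplit {
    private final String path;
    DemoFileSplit(String path) { this.path = path; }
    String getPath() { return path; }
}

// Like TaggedInputSplit: a sibling of the file split, not a subclass, that wraps the real split.
class DemoTaggedSplit extends DemoSplit {
    private final DemoSplit wrapped;
    DemoTaggedSplit(DemoSplit wrapped) { this.wrapped = wrapped; }
    // Private to imitate an accessor that user code cannot call directly.
    private DemoSplit getInputSplit() { return wrapped; }
}

class SplitUnwrapDemo {
    // Returns the path of a file split, unwrapping a tagged split by reflection if needed.
    static String pathOf(DemoSplit split) {
        if (split instanceof DemoFileSplit) {
            return ((DemoFileSplit) split).getPath();
        }
        try {
            Method m = split.getClass().getDeclaredMethod("getInputSplit");
            m.setAccessible(true);
            return pathOf((DemoSplit) m.invoke(split));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot unwrap " + split.getClass(), e);
        }
    }

    public static void main(String[] args) {
        DemoSplit tagged = new DemoTaggedSplit(new DemoFileSplit("/data/input/a.txt"));
        try {
            DemoFileSplit direct = (DemoFileSplit) tagged; // same mistake as the failing mapper code
            System.out.println(direct.getPath());
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println(pathOf(tagged));
    }
}
```

The direct cast fails for the same reason as in the mapper: the wrapper and the file split share a parent type, but neither is a subtype of the other.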

See the explanation (in English) on Stack Overflow.

Alternatively, use a utility class that an expert on that site has already written.

Call it from the mapper:

    Path path = MapperUtils.getPath(context.getInputSplit());

The class itself:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Optional;

public class MapperUtils {

    public static Path getPath(InputSplit split) {
        return getFileSplit(split).map(FileSplit::getPath).orElseThrow(() -> 
            new AssertionError("cannot find path from split " + split.getClass()));
    }

    public static Optional<FileSplit> getFileSplit(InputSplit split) {
        if (split instanceof FileSplit) {
            return Optional.of((FileSplit)split);
        } else if (TaggedInputSplit.clazz.isInstance(split)) {
            return getFileSplit(TaggedInputSplit.getInputSplit(split));
        } else {
            return Optional.empty();
        }
    }

    private static final class TaggedInputSplit {
        private static final Class<?> clazz;
        private static final MethodHandle method;

        static {
            try {
                clazz = Class.forName("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit");
                Method m = clazz.getDeclaredMethod("getInputSplit");
                m.setAccessible(true);
                method = MethodHandles.lookup().unreflect(m).asType(
                    MethodType.methodType(InputSplit.class, InputSplit.class));
            } catch (ReflectiveOperationException e) {
                throw new AssertionError(e);
            }
        }

        static InputSplit getInputSplit(InputSplit o) {
            try {
                return (InputSplit) method.invokeExact(o);
            } catch (Throwable e) {
                throw new AssertionError(e);
            }
        }
    }

    private MapperUtils() { }

}
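
The MethodHandle part of MapperUtils is the least obvious piece, so here is a minimal sketch of that pattern alone, runnable without Hadoop. DemoWrapper and HandleDemo are hypothetical names used only for illustration. The point it tries to show: invokeExact requires the call site's types to match the handle's type exactly, which is why the handle is adapted with asType before it is cached.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical stand-in for a wrapper whose accessor cannot be called directly.
class DemoWrapper {
    private final Object inner;
    DemoWrapper(Object inner) { this.inner = inner; }
    private Object unwrap() { return inner; }
}

class HandleDemo {
    // Looked up once at class initialization and reused for every call.
    private static final MethodHandle UNWRAP;

    static {
        try {
            Method m = DemoWrapper.class.getDeclaredMethod("unwrap");
            m.setAccessible(true);
            // unreflect gives a handle of type (DemoWrapper)Object;
            // asType adapts it to (Object)Object so the call site below matches exactly.
            UNWRAP = MethodHandles.lookup().unreflect(m)
                    .asType(MethodType.methodType(Object.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    static Object unwrap(Object o) {
        try {
            // Static argument type Object and the (Object) cast form the exact signature (Object)Object.
            return (Object) UNWRAP.invokeExact(o);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(unwrap(new DemoWrapper("payload")));
    }
}
```

Compared with calling Method.invoke on every record, this does the reflective lookup once and keeps only a cached handle on the per-record path.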
