Compressing and decompressing files with Hadoop's compression codecs

Compression algorithms and their codecs

Compression format    Codec class
DEFLATE               org.apache.hadoop.io.compress.DefaultCodec
gzip                  org.apache.hadoop.io.compress.GzipCodec
bzip2                 org.apache.hadoop.io.compress.BZip2Codec
Snappy                org.apache.hadoop.io.compress.SnappyCodec
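All of these codecs follow the same pattern: wrap a raw byte stream in a compressing or decompressing stream. The pattern can be seen without any Hadoop dependency using the JDK's own gzip streams; the class and method names below are made up for this sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a string to gzip bytes, then decompress it back to a string.
    static String roundTrip(String text) throws IOException {
        // Compress: wrap the raw output stream in a gzip stream,
        // just as Hadoop wraps a FileOutputStream in a CompressionOutputStream.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gout = new GZIPOutputStream(compressed)) {
            gout.write(text.getBytes(StandardCharsets.UTF_8));
        }

        // Decompress: wrap the raw input stream in a gzip stream.
        ByteArrayOutputStream restored = new ByteArrayOutputStream();
        try (GZIPInputStream gin = new GZIPInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gin.read(buf)) != -1) {
                restored.write(buf, 0, n);
            }
        }
        return new String(restored.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("39, State-gov, 77516, Bachelors"));
    }
}
```

Hadoop's codecs differ mainly in that the wrapping stream is obtained from a `CompressionCodec` object rather than constructed directly, which is what makes the algorithm pluggable.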

Implementing compression:
The method takes a string naming the codec class, instantiates the corresponding codec via reflection, and uses it to compress the file.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public static void compress(String method) throws ClassNotFoundException, IOException {
    File fileIn = new File("adult.data");
    // input stream for the uncompressed file
    FileInputStream in = new FileInputStream(fileIn);
    // look up the codec class by its fully qualified name
    Class<?> codecClass = Class.forName(method);
    Configuration conf = new Configuration();
    CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
    // the output file carries the codec's default extension, e.g. ".gz" for gzip
    File fileOut = new File("adult.data" + codec.getDefaultExtension());
    fileOut.delete();
    FileOutputStream out = new FileOutputStream(fileOut);
    // wrap the raw output stream in the codec's compressing stream
    CompressionOutputStream cout = codec.createOutputStream(out);
    // copy with a 4096-byte buffer; false = do not close the streams automatically
    IOUtils.copyBytes(in, cout, 4096, false);
    in.close();
    cout.close();
}
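The reflective lookup at the heart of `compress` can be exercised on its own. The sketch below assumes `hadoop-common` is on the classpath; the class name `CodecExtension` and the helper method are made up for illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CodecExtension {
    // Same reflective lookup compress() performs: class name -> codec instance.
    static String extensionOf(String codecClassName) throws ClassNotFoundException {
        Configuration conf = new Configuration();
        Class<?> codecClass = Class.forName(codecClassName);
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
        return codec.getDefaultExtension();
    }

    public static void main(String[] args) throws Exception {
        // GzipCodec reports ".gz" as its default extension,
        // so compress() would write adult.data.gz.
        System.out.println(extensionOf("org.apache.hadoop.io.compress.GzipCodec"));
    }
}
```

Because the codec is chosen by name at runtime, switching from gzip to bzip2 or Snappy is just a different string argument, with no code change.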

Implementing decompression:
When decompressing a file, the codec is usually inferred from the file's extension.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;

public static void decompress(File file) throws IOException {
    Configuration conf = new Configuration();
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    // infer the codec from the file's extension
    CompressionCodec codec = factory.getCodec(new Path(file.getName()));
    if (codec == null) {
        System.out.println("Cannot find codec for file " + file);
        return; // bail out here instead of hitting a NullPointerException below
    }
    // wrap the raw input stream in the codec's decompressing stream
    CompressionInputStream in = codec.createInputStream(new FileInputStream(file));
    FileOutputStream out = new FileOutputStream(new File("adult.data.decompress"));
    IOUtils.copyBytes(in, out, 4096, false);
    in.close();
    out.close();
}
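Putting both halves together, a self-contained round trip might look like the sketch below. It assumes `hadoop-common` is on the classpath; the file names (`sample.txt` and friends) and the class name `RoundTripDemo` are made up for the demo:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class RoundTripDemo {
    public static void main(String[] args) throws Exception {
        // Create a small sample file so the demo does not depend on adult.data.
        try (FileWriter w = new FileWriter("sample.txt")) {
            w.write("39, State-gov, 77516, Bachelors\n");
        }

        Configuration conf = new Configuration();
        CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(
                Class.forName("org.apache.hadoop.io.compress.GzipCodec"), conf);

        // Compress: sample.txt -> sample.txt.gz
        try (InputStream in = new FileInputStream("sample.txt");
             CompressionOutputStream cout = codec.createOutputStream(
                     new FileOutputStream("sample.txt" + codec.getDefaultExtension()))) {
            IOUtils.copyBytes(in, cout, 4096, false);
        }

        // Infer the codec back from the ".gz" extension and decompress.
        CompressionCodec found = new CompressionCodecFactory(conf)
                .getCodec(new Path("sample.txt.gz"));
        try (CompressionInputStream cin = found.createInputStream(
                     new FileInputStream("sample.txt.gz"));
             OutputStream out = new FileOutputStream("sample.txt.restored")) {
            IOUtils.copyBytes(cin, out, 4096, false);
        }
    }
}
```

The demo compresses by naming the codec class explicitly and decompresses by letting `CompressionCodecFactory` infer it, mirroring the asymmetry between the two methods above.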
