Hadoop File Compression and Decompression

A simple test program for compressing and decompressing a file with Hadoop:

package org.myorg;

import java.io.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {

	public static void main(String[] args) throws Exception {
		String codecClassname = args[0];
		Class<?> codecClass = Class.forName(codecClassname);
		Configuration conf = new Configuration();
		CompressionCodec codec =
				(CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

		// Compress the string and write the result to the file "text"
		CompressionOutputStream out =
				codec.createOutputStream(new FileOutputStream(new File("text")));
		String str = "try compress and decompress";
		out.write(str.getBytes());
		out.finish();
		out.close();

		// Decompress the file "text" and copy the plain text to the console.
		// IOUtils.copyBytes reads in a loop, so it handles the short reads
		// that a single read() call into a fixed buffer would miss.
		InputStream in = codec.createInputStream(new FileInputStream(new File("text")));
		IOUtils.copyBytes(in, System.out, 4096, false);
		in.close();
	}
}
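For reference, the class can be compiled and run roughly as follows. This assumes a local Hadoop installation whose CLI is on the `PATH` (the `hadoop classpath` command prints the jars to compile and run against); exact paths and versions will vary:

```shell
# Compile against the Hadoop jars reported by the hadoop CLI
javac -cp "$(hadoop classpath)" -d . StreamCompressor.java

# Run, passing the codec class name as the first argument
java -cp ".:$(hadoop classpath)" org.myorg.StreamCompressor \
    org.apache.hadoop.io.compress.GzipCodec
```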

1. The value of args[0] is org.apache.hadoop.io.compress.GzipCodec.

2. First create a compression output stream out, write the data ("try compress and decompress") to it, and then finish and close the stream. At this point the compressed data has been written to the file text.

3. Obtain an input stream from the file text, decompress it, and print the result to the console. Some warning messages also appear in the output; the cause is not yet clear to me.
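The same compress-then-decompress round trip can be reproduced with the JDK's built-in gzip streams, with no Hadoop dependency at all (GzipCodec produces the same gzip format). A minimal sketch, using in-memory buffers instead of a file:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
	public static void main(String[] args) throws Exception {
		String str = "try compress and decompress";

		// Compress the string into an in-memory buffer (gzip format)
		ByteArrayOutputStream buf = new ByteArrayOutputStream();
		GZIPOutputStream gzOut = new GZIPOutputStream(buf);
		gzOut.write(str.getBytes("UTF-8"));
		gzOut.close(); // finishes the gzip trailer and closes the buffer

		// Decompress it again, reading in a loop because read() may
		// return fewer bytes than requested
		GZIPInputStream gzIn =
				new GZIPInputStream(new ByteArrayInputStream(buf.toByteArray()));
		ByteArrayOutputStream result = new ByteArrayOutputStream();
		byte[] chunk = new byte[4096];
		int n;
		while ((n = gzIn.read(chunk)) != -1) {
			result.write(chunk, 0, n);
		}
		gzIn.close();

		System.out.println(result.toString("UTF-8"));
	}
}
```

Note the read loop: this is the same short-read pitfall the Hadoop program above has to handle, which is why IOUtils.copyBytes (which loops internally) is preferable to a single read() into a fixed buffer.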

