把纯文本字符串用Gzip压缩再转换为Base64能有多少压缩率呢?

其实具体多大压缩率要看源文件的内容,一般来说重复的单词越多,压缩率越高。

下面是把/usr/share/dict/words压缩的测试程序



import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

import org.apache.commons.codec.binary.Base64;

public class GzipBase64Tests {

public static void main(String[] args) throws Exception {
File input = new File("/usr/share/dict/words");

if (!input.exists()) {
System.out.println("input file not exists!");
return;
}

if (output.exists()) {
output.delete();
}

ByteArrayOutputStream buffer = new ByteArrayOutputStream();
GZIPOutputStream gout = new GZIPOutputStream(buffer);

FileInputStream in = new FileInputStream(input);

long t1 = System.currentTimeMillis();
byte[] buf = new byte[1024];
int total=0;
int rd;
while ((rd = in.read(buf)) != -1) {
total += rd;
gout.write(buf,0, rd);
}

gout.close();
in.close();

byte[] result = buffer.toByteArray();

long t2 = System.currentTimeMillis();
String base64 = Base64.encodeBase64String(result);
long t3 = System.currentTimeMillis();

System.out.printf("raw %d -> gzip %d -> base64 %d, time1 %dms, time2 %dms", total, result.length, base64.length(), t2-t1, t3-t2);
}
}



输出为: raw 2493109 -> gzip 753932 -> base64 1005244, time1 225ms, time2 43ms

压缩了50%。 感觉没啥实用价值。

你可能感兴趣的:(Java)