How I chose a compression algorithm (by quqi99)
Author: 张华 (Zhang Hua). Published: 2007-08-03 ( http://blog.csdn.net/quqi99 )
At first I mainly compared how well zlib, bzip2, gzip, rar and zip compress.
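The rar format is handled mainly by calling an external command to compress and decompress (code in Appendix 1). That command runs outside the Java virtual machine, and an error in it could take the whole virtual machine down with it, so I ruled this approach out.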
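One practical detail of shelling out is how the command line is assembled: paths such as "C:/Program Files/..." contain spaces, so each argument is safer as its own list element. The following is only a minimal sketch that builds, but does not start, such a process. The class name, method names and executable path are hypothetical; the switches x, -o+ and -p- are the ones used in Appendix 1.

```java
import java.util.ArrayList;
import java.util.List;

class ExternalUnrar {
    // Builds the argument list for "unrar x -o+ -p- <archive> <destDir>".
    // Every argument is a separate element, so spaces inside paths survive.
    static List<String> buildCommand(String unrarExe, String archive, String destDir) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(unrarExe);
        cmd.add("x");   // command taken from Appendix 1
        cmd.add("-o+"); // switch taken from Appendix 1
        cmd.add("-p-"); // switch taken from Appendix 1
        cmd.add(archive);
        cmd.add(destDir);
        return cmd;
    }

    // Configures the child process without starting it.
    static ProcessBuilder prepare(String unrarExe, String archive, String destDir) {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(unrarExe, archive, destDir));
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader can drain both
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical install path and file names, for illustration only.
        ProcessBuilder pb = prepare("C:/Program Files/WinRAR/UnRAR.exe", "f:/my archive.rar", "f:/test/");
        System.out.println(pb.command());
    }
}
```

Calling pb.start() and reading the process output until end of stream would then run the tool; that step is left out here because it depends on WinRAR being installed.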
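According to what I read online, bzip2 is the most effective at compressing text, and the org.apache.tools.bzip2 package in ant.jar provides an API for it. But I kept running into problems when using it, so I stopped spending time on it. The code is in Appendix 2 (never tested successfully).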
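So I started learning zlib. Its Java port is called jzlib; the code that compresses and decompresses with jzlib is in Appendix 3. It seemed good enough and I was ready to adopt it. Compressing a single file works fine, but compressing a directory the way the java.util.zip package does is quite inconvenient. Only then did it dawn on me: these compression libraries generally care only about the algorithm itself. Conveniences such as entry-by-entry compression are the job of the application wrapped around them.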
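So I began looking at how to borrow the ideas of java.util.zip and wrap zlib so that it could compress directories. In the end I discovered that the compression algorithm underneath java.util.zip is zlib itself. Sun merely added checksums (CRC32, Adler32), per-entry compression for directories (ZipEntry) and convenient input and output streams (ZipInputStream, ZipOutputStream) on top of the core algorithm.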
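To make the "core algorithm plus checksum" layering concrete, here is a minimal sketch that uses only the JDK's java.util.zip.Deflater, Inflater and CRC32 on an in-memory buffer, with no entries or archive format involved. The class and method names are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateRoundTrip {
    // Compresses a whole buffer with the zlib-based Deflater.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Restores the original bytes; fails if the input is truncated.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("incomplete or unsupported input");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    // CRC32 is the checksum the ZIP format stores for each entry.
    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "aaaaaaaaaabbbbbbbbbbaaaaaaaaaabbbbbbbbbb".getBytes("UTF-8");
        byte[] packed = deflate(original);
        byte[] restored = inflate(packed);
        System.out.println(original.length + " -> " + packed.length + " bytes");
        System.out.println("checksums match: " + (crc(original) == crc(restored)));
    }
}
```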
Since java.util.zip already uses zlib, there was no need to work out directory compression ourselves. Still, things did not go entirely smoothly.
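As a rough illustration of what "compressing by entry" means, the sketch below packs a few in-memory files into a ZIP with the JDK's ZipOutputStream and lists them again with ZipInputStream. A directory is expressed purely through "/" in the entry name. The class name, method names and sample paths are hypothetical, and the names are plain ASCII on purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipEntriesDemo {
    // Writes each (path -> content) pair as one ZIP entry.
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bytes);
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
            zos.putNextEntry(new ZipEntry(f.getKey()));
            zos.write(f.getValue());
            zos.closeEntry();
        }
        zos.close();
        return bytes.toByteArray();
    }

    // Walks the archive sequentially and collects the entry names.
    static List<String> list(byte[] zip) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip));
        ZipEntry e;
        while ((e = zis.getNextEntry()) != null) {
            names.add(e.getName());
        }
        zis.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<String, byte[]>();
        files.put("readme.txt", "hello".getBytes("UTF-8"));
        files.put("docs/notes.txt", "world".getBytes("UTF-8"));
        System.out.println(list(pack(files)));
    }
}
```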
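First, decompression code written directly against the java.util.zip API could not handle Chinese file names. Java strings are Unicode-based, and ZipInputStream and ZipOutputStream always treat entry names as UTF-8, so an archive whose Chinese file names or paths were stored in another encoding is not processed correctly. Looking closely at ZipInputStream, I found that the problem lies in this statement of the java.util.zip.ZipInputStream class: ZipEntry e = createZipEntry(getUTF8String(b, 0, len)); It should be changed to: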
ZipEntry e = null;
try {
    if (this.encoding.toUpperCase().equals("UTF-8"))
        e = createZipEntry(getUTF8String(b, 0, len));
    else
        e = createZipEntry(new String(b, 0, len, this.encoding));
} catch (Exception byteE) {
    // Fall back to UTF-8 if the configured encoding cannot be used
    e = createZipEntry(getUTF8String(b, 0, len));
}
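The idea behind that change can be tried in isolation: decode the raw name bytes with a caller-supplied charset and fall back to UTF-8 when that charset is unusable. This is only a sketch of the idea, not the code in Appendix 5. The class and method names are hypothetical, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EntryNameDecoder {
    // Decodes name bytes with the requested charset; an unknown or illegal
    // charset name falls back to UTF-8 instead of failing.
    static String decode(byte[] b, int off, int len, String encoding) {
        Charset cs;
        try {
            cs = Charset.forName(encoding);
        } catch (IllegalArgumentException ex) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException
            cs = StandardCharsets.UTF_8;
        }
        return new String(b, off, len, cs);
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587.txt"; // a Chinese file name
        byte[] gbk = name.getBytes(Charset.forName("GBK"));
        System.out.println("decoded as GBK matches:   " + decode(gbk, 0, gbk.length, "GBK").equals(name));
        System.out.println("decoded as UTF-8 matches: " + decode(gbk, 0, gbk.length, "UTF-8").equals(name));
    }
}
```

Decoding the same bytes with the wrong charset is what produces the garbled or rejected names described above.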
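Fortunately, a quick search showed that we do not have to make this change ourselves, because Ant's org.apache.tools.zip package has already done it. The compression and decompression code written with that package is in Appendix 4.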
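Then another problem turned up. There are two ways to decompress a file: ZipFile or ZipInputStream. ZipFile, as far as I could tell, reads the whole zip file into memory at once, which does not work for large zips, so the ZipInputStream approach is needed there. However, org.apache.tools.zip happens not to provide a modified ZipInputStream; on the reading side it only offers the rewritten ZipFile. And java.util.zip.ZipInputStream, again, cannot handle files with Chinese names.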
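The difference between the two access styles can be sketched with the JDK's own classes (java.util.zip, not Ant's): ZipFile looks an entry up by name, while ZipInputStream has to walk the entries in order with a fixed-size buffer. The class name, method names and sample entry names below are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class RandomVsSequential {
    // Creates a small temporary archive whose entries contain their own names.
    static File writeSample() throws IOException {
        File f = File.createTempFile("sample", ".zip");
        f.deleteOnExit();
        ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f));
        String[] names = { "a.txt", "dir/b.txt", "dir/c.txt" };
        for (String n : names) {
            zos.putNextEntry(new ZipEntry(n));
            zos.write(n.getBytes("UTF-8"));
            zos.closeEntry();
        }
        zos.close();
        return f;
    }

    // ZipFile: look the entry up by name and read only that entry.
    static String readWithZipFile(File zip, String name) throws IOException {
        ZipFile zf = new ZipFile(zip);
        try {
            ZipEntry e = zf.getEntry(name);
            return e == null ? null : new String(readAll(zf.getInputStream(e)), "UTF-8");
        } finally {
            zf.close();
        }
    }

    // ZipInputStream: walk the entries in order until the wanted one appears.
    static String readWithStream(File zip, String name) throws IOException {
        ZipInputStream zis = new ZipInputStream(new BufferedInputStream(new FileInputStream(zip)));
        try {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                if (e.getName().equals(name)) {
                    return new String(readAll(zis), "UTF-8");
                }
            }
            return null;
        } finally {
            zis.close();
        }
    }

    // Drains a stream through a small fixed buffer.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[2048];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        File zip = writeSample();
        System.out.println(readWithZipFile(zip, "dir/b.txt"));
        System.out.println(readWithStream(zip, "dir/b.txt"));
    }
}
```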
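At that point I found an article online, "让 ZipOutputStream 和 ZipInputStream 支持中文" (Making ZipOutputStream and ZipInputStream support Chinese; searchable on Google). Its approach is to modify the JDK source code directly. But I felt that patching the JDK's JAR would make deploying the software troublesome, so I started looking for another solution.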
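To avoid touching java.util.zip.ZipInputStream, I simply rewrote the class myself: first copy and paste it into an identical class, jcss.search.base.zip.CZipInputStream, then replace ZipEntry e = createZipEntry(getUTF8String(b, 0, len)) in that class with the code above. The class is in Appendix 5.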
In addition, copy java.util.zip.ZipConstants into an identical class, jcss.search.base.zip.ZipConstants.
Finally, implement a jcss.search.base.zip.ZipEntry class; the code is in Appendix 6.
With that, everything works.
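For readers on a newer JDK: from Java 7 onward, java.util.zip.ZipInputStream and ZipOutputStream have constructors that take a Charset, which may make a copied class unnecessary. The sketch below assumes such a JDK and the availability of the GBK charset; the class name, method names and sample entry name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class CharsetZipDemo {
    // Writes a single entry whose name is encoded with the given charset.
    static byte[] pack(String entryName, byte[] data, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bytes, cs);
        zos.putNextEntry(new ZipEntry(entryName));
        zos.write(data);
        zos.closeEntry();
        zos.close();
        return bytes.toByteArray();
    }

    // Reads back the name of the first entry, decoding it with the given charset.
    static String firstName(byte[] zip, Charset cs) throws IOException {
        ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), cs);
        try {
            ZipEntry e = zis.getNextEntry();
            return e == null ? null : e.getName();
        } finally {
            zis.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        String name = "\u4e2d\u6587/\u8bf4\u660e.txt"; // Chinese directory and file name
        byte[] zip = pack(name, "demo".getBytes("UTF-8"), gbk);
        System.out.println("name survives: " + name.equals(firstName(zip, gbk)));
    }
}
```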
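For an even better compression ratio, 7zip is an option: there is a dedicated 7zip SDK that implements the LZMA compression algorithm. Some helpful people have also added two stream classes on top of it for convenient access (net.contrapunctus.lzma.LzmaInputStream and net.contrapunctus.lzma.LzmaOutputStream), but there are no entry classes that wrap directory compression.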
package jcss.search.base;
* @author 张华
* @time 2007 - 8 - 1
* @description
public class RarUtil {
* Decompress
* @param compress
* the rar archive
* @param decompression
* the destination path
public void unZip(String compress, String decompression) throws Exception {
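java.lang.Runtime rt = java.lang.Runtime.getRuntime();
// Pass the command as an array so the space in "Program Files" (or in the paths) does not split an argument
Process p = rt.exec(new String[] { "C:/Program Files/WinRAR/UNRAR.EXE", "x", "-o+", "-p-", compress, decompression });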
StringBuffer sb = new StringBuffer();
java.io.InputStream fis = p.getInputStream();
int value = 0;
while ((value = fis.read()) != -1)
sb.append(( char ) value);
String result = new String(sb.toString().getBytes( "ISO-8859-1" ), "GBK" );
System. out .println(result);
* @param outputRar the rar archive to create
* @param compression the file or directory to compress
* @throws Exception
public void zip(String outputRar, String compression) throws Exception {
java.lang.Runtime rt = java.lang.Runtime.getRuntime ();
// rar.exe a -o+ -p- E:/2.rar E:/ ("a" adds files to an archive; "x" would extract)
Process p = rt.exec(new String[] { "C:/Program Files/WinRAR/rar.exe", "a", "-o+", "-p-", outputRar, compression });
StringBuffer sb = new StringBuffer();
java.io.InputStream fis = p.getInputStream();
int value = 0;
while ((value = fis.read()) != -1)
sb.append(( char ) value);
String result = new String(sb.toString().getBytes( "ISO-8859-1" ), "GBK" );
System. out .println(result);
* @param args
public static void main(String[] args) {
RarUtil test = new RarUtil();
String compress = "f:/ 增加转码过滤器 .rar" ; // the rar archive
String decompression = "f:/test/" ; // the destination path
try {
test.zip( "f:/test.rar" , " 说明 .txt" );
//test.unZip(compress, decompression);
} catch (Exception e) {
package jcss.search.base;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.tools.bzip2.CBZip2InputStream;
import org.apache.tools.bzip2.CBZip2OutputStream;
* @author 张华
* @time 2007-7-26
* @description BZip2 compression and decompression
public class BZip2Util {
public static void Bzip2Compress(String in, String to) {
try {
File source = new File(in);
File destination = new File(to);
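FileOutputStream fos = new FileOutputStream(destination);
// Ant's CBZip2OutputStream leaves the two "BZ" magic bytes to the caller
fos.write('B');
fos.write('Z');
CBZip2OutputStream output = new CBZip2OutputStream(fos);
final FileInputStream input = new FileInputStream(source);
copy(input, output);
input.close();
output.close(); // closing flushes the final bzip2 block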
} catch (Exception e) {
public static void Bzip2Uncompress(String in, String to) {
try {
File source = new File(in);
File destination = new File(to);
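FileOutputStream output = new FileOutputStream(destination);
FileInputStream fis = new FileInputStream(source);
// Ant's CBZip2InputStream expects the caller to consume the leading "BZ" magic bytes
fis.read();
fis.read();
CBZip2InputStream input = new CBZip2InputStream(fis);
copy(input, output);
input.close();
output.close();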
} catch (Exception e) {
static void copy(final InputStream input, final OutputStream output)
throws IOException {
final byte[] buffer = new byte[8024];
int n = 0;
while (-1 != (n = input.read(buffer))) {
output.write(buffer, 0, n);
* @param args
public static void main(String[] args) {
BZip2Util test = new BZip2Util();
String in = "f://~HlIndex.htm";
String to = "f://a.bz2";
String out2 = "b.htm";
//test.Bzip2Compress(in, to);
//test.Bzip2Uncompress(to, out2);
package example;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.jcraft.jzlib.*;
/** Drawback: cannot compress a directory.
* @author 张华
* @time 2007-7-30
* @description reference http://tianxiagod.spaces.live.com/
* http://blog.csdn.net/kong555/archive/2006/03/28/641855.aspx
public class TestJZlib {
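// Length of the file being compressed; needed for both compression and decompression, so it is critical.
// Make sure compressfile() and uncompressfile() are given consistent values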
static int resLen = 0;
* Compress
* @param data
* @param type
* compression level as an integer: -1 is the default level, 9 the best compression, 0 no compression, 1 the fastest
* @return
public static byte[] compressfile(byte[] data, int type,int len) {
int err;
int comprLen = len;
byte[] compr = new byte[comprLen];
ZStream c_stream = new ZStream();
err = c_stream.deflateInit(type);
CHECK_ERR(c_stream, err, "deflateInit");
c_stream.next_in = data;
c_stream.next_in_index = 0;
c_stream.next_out = compr;
c_stream.next_out_index = 0;
// Feed the input one byte at a time until it is consumed or the output buffer is full
while (c_stream.total_in != data.length
        && c_stream.total_out < comprLen) {
    c_stream.avail_in = c_stream.avail_out = 1; // force small buffers
    err = c_stream.deflate(JZlib.Z_NO_FLUSH);
    CHECK_ERR(c_stream, err, "deflate");
}
System.out.println("Before compression: " + c_stream.total_in + " bytes");
// Flush the remaining output until the stream reports its end
while (true) {
    c_stream.avail_out = 1;
    err = c_stream.deflate(JZlib.Z_FINISH);
    if (err == JZlib.Z_STREAM_END) {
        break;
    }
    CHECK_ERR(c_stream, err, "deflate");
}
System.out.println("After compression: " + c_stream.total_out + " bytes");
err = c_stream.deflateEnd();
CHECK_ERR(c_stream, err, "deflateEnd");
byte[] zipfile = new byte[(int) c_stream.total_out];
System.arraycopy(compr, 0, zipfile, 0, zipfile.length);
return zipfile;
public static byte[] uncompressfile(byte[] data,int len) {
int err;
int uncomprLen = len;
byte[] uncompr = new byte[uncomprLen];
ZStream d_stream = new ZStream();
err = d_stream.inflateInit();
CHECK_ERR(d_stream, err, "inflateInit");
d_stream.next_in = data;
d_stream.next_in_index = 0;
d_stream.next_out = uncompr;
d_stream.next_out_index = 0;
while (d_stream.total_out < uncomprLen
        && d_stream.total_in < uncomprLen) {
    d_stream.avail_in = d_stream.avail_out = 1; // force small buffers
    err = d_stream.inflate(JZlib.Z_NO_FLUSH);
    if (err == JZlib.Z_STREAM_END) {
        break;
    }
    CHECK_ERR(d_stream, err, "inflate");
}
System.out.println("Compressed input read: " + d_stream.total_in + " bytes");
System.out.println("After decompression: " + d_stream.total_out + " bytes");
err = d_stream.inflateEnd();
CHECK_ERR(d_stream, err, "inflateEnd");
byte[] unzipfile = new byte[(int) d_stream.total_out];
System.arraycopy(uncompr, 0, unzipfile, 0, unzipfile.length);
return unzipfile;
static void CHECK_ERR(ZStream z, int err, String msg) {
if (err != JZlib.Z_OK) {
if (z.msg != null) {
System.out.print(z.msg + " ");
System.out.println(msg + " error: " + err);
static void zip(File input, File output, int compressFactor) {
if (!input.exists())
if (!output.getParentFile().exists())
try {
FileInputStream in = new FileInputStream(input);
FileOutputStream out = new FileOutputStream(output);
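resLen = in.available();
byte[] buff = new byte[resLen];
in.read(buff); // load the file contents; otherwise an all-zero buffer would be compressed
byte[] suBuf = compressfile(buff, compressFactor, resLen);
out.write(suBuf, 0, suBuf.length); // write the compressed file
in.close();
out.close();
System.out.println("Compression finished: " + input.getAbsolutePath());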
} catch (Exception e) {
static void unZip(File input, File output) {
if (!input.exists())
if (!output.getParentFile().exists())
try {
FileInputStream in = new FileInputStream(input);
FileOutputStream out = new FileOutputStream(output);
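byte[] buff = new byte[(int) input.length()];
in.read(buff); // load the compressed bytes
byte[] suBuff = uncompressfile(buff, resLen);
out.write(suBuff, 0, suBuff.length); // write the decompressed file
in.close();
out.close();
System.out.println("Decompression finished: " + input.getAbsolutePath());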
} catch (Exception e) {
* @param args
public static void main(String[] args) {
TestJZlib test = new TestJZlib();
// Compress
File input = new File("f:// 搜索引擎原理系统与设计 .pdf");
File output = new File("f://test.zlib"); // zlib data, so not a .bz2 file
test.zip(input, output, 9);
// Decompress
File output2 = new File("f://test.pdf"); // the source was a PDF
test.unZip(output, output2);
package jcss.search.base;
// Compression implemented with org.apache.tools.zip.
// java.util.zip can be used as well, but it has trouble with Chinese names
// because the character set differs:
// java.util.zip.ZipInputStream always uses UTF-8
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Date;
import java.util.zip.ZipInputStream;
import jcss.search.base.zip.CZipInputStream;
import org.apache.tools.zip.ZipOutputStream;
* Author: 张华. Date: 2006-5-14. Description:
public class ZipUtil {
int count = 0;
static final int BUFFER = 2048;
public void zip(String zipFileName, String inputFile) throws Exception {
zip(zipFileName, new File(inputFile));
public void zip(String zipFileName, File inputFile) throws Exception {
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
new String(zipFileName.getBytes("gb2312"))));
System.out.println("zip start");
zip(out, inputFile, "");
System.out.println("zip done");
public void zip(ZipOutputStream out, File f, String base) throws Exception {
System.out.println("Zipping " + f.getName());
Date beginDate = new Date();
if (f.isDirectory()) {
File[] fl = f.listFiles();
// out.putNextEntry(new ZipEntry(base + "/"));
out.putNextEntry(new org.apache.tools.zip.ZipEntry(base + "/"));
base = base.length() == 0 ? "" : base + "/";
for (int i = 0; i < fl.length; i++) {
zip(out, fl[i], base + fl[i].getName());
// System.out.println(new
// String(fl[i].getName().getBytes("gb2312")));
} else {
// out.putNextEntry(new ZipEntry(base));
out.putNextEntry(new org.apache.tools.zip.ZipEntry(base));
FileInputStream in = new FileInputStream(f);
int b;
// Copy the file contents into the current entry
while ((b = in.read()) != -1)
    out.write(b);
in.close();
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime(); // elapsed milliseconds
System.out.println("Elapsed time (ms): " + temp);
private void createDirectory(String directory, String subDirectory) {
String dir[];
File fl = new File(directory);
try {
// Compare string contents with equals(); == only compares references
if (subDirectory.equals("") && !fl.exists())
    fl.mkdir();
else if (!subDirectory.equals("")) {
    dir = subDirectory.replace('\\', '/').split("/");
    for (int i = 0; i < dir.length; i++) {
        File subFile = new File(directory + File.separator + dir[i]);
        if (!subFile.exists())
            subFile.mkdir();
        directory += File.separator + dir[i];
    }
}
} catch (Exception ex) {
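* Decompresses a small ZIP with ZipFile.
* ZipInputStream reads the ZIP file as a sequence (put simply, it walks through the files stored in the ZIP one after another),
* whereas ZipFile reads the contents through built-in random file access, so the entries need not be read in order.
* Another basic difference between ZipInputStream and ZipFile is caching.
* When a file is read through ZipInputStream and FileInputStream, the ZIP entries are not cached.
* If the file is opened with ZipFile(fileName), however, it is cached internally, so if ZipFile(fileName)
* is called again the file is opened only once and the cached value is used for the second open. If you work on UNIX,
* it is worth noting that all ZIP files opened with ZipFile are memory mapped,
* so ZipFile performs better than ZipInputStream.
* However, if the contents of the same ZIP file change or are reloaded frequently while the program runs, ZipInputStream is the better choice.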
* @param zipFileName
* @param outputDirectory
* @throws Exception
public void unSmallZip(String zipFileName, String outputDirectory)
throws Exception {
try {
Date beginDate = new Date();
org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
java.util.Enumeration e = zipFile.getEntries();
org.apache.tools.zip.ZipEntry zipEntry = null;
createDirectory(outputDirectory, "");
while (e.hasMoreElements()) {
zipEntry = (org.apache.tools.zip.ZipEntry) e.nextElement();
String name = null;
if (zipEntry.isDirectory()) {
name = zipEntry.getName();
name = name.substring(0, name.length() - 1);
File f = new File(outputDirectory + File.separator + name);
System.out.println(" 创建目录: " + outputDirectory
+ File.separator + name);
} else {
String fileName = zipEntry.getName();
fileName = fileName.replace('\\', '/');
System.out.println("Extracting file " + count + ": "
        + zipEntry.getName());
if (fileName.indexOf("/") != -1) {
    createDirectory(outputDirectory, fileName.substring(0,
            fileName.lastIndexOf("/")));
    fileName = fileName.substring(
            fileName.lastIndexOf("/") + 1, fileName
                    .length());
}
File f = new File(outputDirectory + File.separator
+ zipEntry.getName());
InputStream in = zipFile.getInputStream(zipEntry);
FileOutputStream out = new FileOutputStream(f);
byte[] by = new byte[1024];
int c;
while ((c = in.read(by)) != -1) {
out.write(by, 0, c);
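// The archive cannot be deleted here because it is still in use; delete it where the upload is handled
// After decompression, delete the archive
// File zipFileToDel = new File(zipFileName);
// zipFileToDel.delete();
// System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
// // Remove the extra directory level left by decompression
// delALayerDir(zipFileName, outputDirectory);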
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime(); // elapsed milliseconds
System.out.println("Decompression took (ms): " + temp);
} catch (Exception ex) {
* Decompresses a large ZIP with ZipInputStream (using a modified ZipInputStream class that supports Chinese file names).
* ZipInputStream reads the entries sequentially, whereas ZipFile uses random access and internal caching;
* the full comparison is in the comment on unSmallZip.
* @param zipFileName
* @param outputDirectory
* @throws Exception
public void unBigZip(String zipFileName, String outputDirectory)
throws Exception {
try {
Date beginDate = new Date();
//org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
FileInputStream fis = new FileInputStream(zipFileName);
BufferedOutputStream dest = null;
//CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis));
CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis),"gb2312");
//org.apache.tools.zip.ZipEntry entry;
//java.util.zip.ZipEntry entry;
jcss.search.base.zip.ZipEntry entry;
while((entry =zin.getNextEntry()) != null) {
String name = null;
if (entry.isDirectory()) {
name = entry.getName();
name = name.substring(0, name.length() - 1);
File f = new File(outputDirectory + File.separator + name);
System.out.println("Creating directory: " + outputDirectory + File.separator + name);
} else {
String fileName = entry.getName();
fileName = fileName.replace('\\', '/');
System.out.println("Extracting file " + count + ": " + entry.getName());
if (fileName.indexOf("/") != -1) {
createDirectory(outputDirectory, fileName.substring(0,fileName.lastIndexOf("/")));
fileName = fileName.substring(fileName.lastIndexOf("/") + 1, fileName.length());
File f = new File(outputDirectory + File.separator + entry.getName());
// InputStream in = zipFile.getInputStream(zipEntry);
// FileOutputStream out = new FileOutputStream(f);
// byte[] by = new byte[1024];
// int c;
// while ((c = in.read(by)) != -1) {
// out.write(by, 0, c);
// }
// out.close();
// in.close();
int cnt;
byte data[] = new byte[BUFFER];
FileOutputStream fos = new FileOutputStream(f);
dest = new BufferedOutputStream(fos, BUFFER);
while ((cnt = zin.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, cnt);
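// The archive cannot be deleted here because it is still in use; delete it where the upload is handled
// After decompression, delete the archive
// File zipFileToDel = new File(zipFileName);
// zipFileToDel.delete();
// System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
// // Remove the extra directory level left by decompression
// delALayerDir(zipFileName, outputDirectory);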
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime();
System.out.println("Decompression took (ms): " + temp);
} catch (Exception ex) {
* Removes one directory level
* @param zipFileName
* @param outputDirectory
public void delALayerDir(String zipFileName, String outputDirectory) {
String[] dir = zipFileName.replace('\\', '/').split("/");
String fileFullName = dir[dir.length - 1]; // e.g. aa.zip
int pos = -1;
pos = fileFullName.indexOf(".");
String fileName = fileFullName.substring(0, pos); // e.g. aa
String sourceDir = outputDirectory + File.separator + fileName;
try {
copyFile(new File(outputDirectory), new File(sourceDir), new File(
        sourceDir));
deleteSourceBaseDir(new File(sourceDir));
} catch (Exception e) {
* Copies every file under sourceDir into destDir
public void copyFile(File destDir, File sourceBaseDir, File sourceDir)
throws Exception {
File[] lists = sourceDir.listFiles();
String line = null;
String url = null;
if (lists == null)
for (int i = 0; i < lists.length; i++) {
File f = lists[i];
if (f.isFile()) {
FileInputStream fis = new FileInputStream(f);
String content = "";
String sourceBasePath = sourceBaseDir.getCanonicalPath();
String destPath = destDir.getCanonicalPath();
String fPath = f.getCanonicalPath();
String drPath = destDir
+ fPath.substring(fPath.indexOf(sourceBasePath)
+ sourceBasePath.length());
FileOutputStream fos = new FileOutputStream(drPath);
byte[] b = new byte[2048];
int n;
// Copy raw bytes; going through a String would corrupt binary files and pad the last block
while ((n = fis.read(b)) != -1) {
    fos.write(b, 0, n);
}
fos.close();
fis.close();
} else {
// Create the directory first
new File(destDir + File.separator + f.getName()).mkdir();
copyFile(destDir, sourceBaseDir, f); // recurse into the subdirectory
* Recursively deletes the files under curFile, removing each directory once it is empty
public void deleteSourceBaseDir(File curFile) throws Exception {
File[] lists = curFile.listFiles();
String line = null;
String url = null;
File parentFile = null;
for (int i = 0; i < lists.length; i++) {
File f = lists[i];
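if (f.isFile()) {
    f.delete();
    // Once the parent directory is empty, everything in it has been removed, so delete it too
    parentFile = f.getParentFile();
    if (parentFile.list().length == 0)
        parentFile.delete();
} else {
    deleteSourceBaseDir(f); // recurse into the subdirectory
}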
public static void main(String[] args) {
try {
ZipUtil t = new ZipUtil();
// t.zip("e://test.zip", "E://news.sina.com.cn//news.sina.com.cn");
Date beginDate = new Date();
//t.unZip("e://test.zip", "E://news.sina.com.cn");
t.unBigZip("e://test.zip", "E://news.sina.com.cn");
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime();
System.out.println("Elapsed time (ms): " + temp);
} catch (Exception e) {
* @(#)ZipInputStream.java 1.37 04/06/11
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
package jcss.search.base.zip;
import java.io.InputStream;
import java.io.IOException;
import java.io.EOFException;
import java.io.PushbackInputStream;
import java.util.zip.CRC32;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;
* @author David Connelly
* @version 1.37, 06/11/04
public class CZipInputStream extends InflaterInputStream implements ZipConstants {
private String encoding = "UTF-8" ;
private ZipEntry entry ;
private CRC32 crc = new CRC32();
private long remaining ;
private byte [] tmpbuf = new byte [512];
private static final int STORED = ZipEntry. STORED ;
private static final int DEFLATED = ZipEntry. DEFLATED ;
private boolean closed = false ;
// this flag is set to true after EOF has reached for
// one entry
private boolean entryEOF = false ;
* Check to make sure that this stream has not been closed
private void ensureOpen() throws IOException {
if ( closed ) {
throw new IOException( "Stream closed" );
boolean usesDefaultInflater = false ;
* Creates a new ZIP input stream.
* @param in the actual input stream
public CZipInputStream(InputStream in) {
super ( new PushbackInputStream(in, 512), new Inflater( true ), 512);
usesDefaultInflater = true ;
if (in == null ) {
throw new NullPointerException( "in is null" );
public CZipInputStream(InputStream in,String encoding) {
super ( new PushbackInputStream(in,512), new Inflater( true ),512);
usesDefaultInflater = true ;
if (in == null ) {
throw new NullPointerException( "in is null" );
this . encoding =encoding;
* Reads the next ZIP file entry and positions the stream at the
* beginning of the entry data.
* @return the next ZIP file entry, or null if there are no more entries
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
public ZipEntry getNextEntry() throws IOException {
if (entry != null) {
    closeEntry(); // finish the previous entry before moving on
}
crc.reset();
inf.reset();
if (( entry = readLOC()) == null ) {
return null ;
if ( entry . method == STORED ) {
remaining = entry . size ;
entryEOF = false ;
return entry ;
* Closes the current ZIP entry and positions the stream for reading the
* next entry.
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
public void closeEntry() throws IOException {
while (read( tmpbuf , 0, tmpbuf . length ) != -1) ;
entryEOF = true ;
* Returns 0 after EOF has reached for the current entry data,
* otherwise always return 1.
* <p>
* Programs should not count on this method to return the actual number
* of bytes that could be read without blocking.
* @return 1 before EOF and 0 after EOF has reached for current entry.
* @exception IOException if an I/O error occurs.
public int available() throws IOException {
if ( entryEOF ) {
return 0;
} else {
return 1;
* Reads from the current ZIP entry into an array of bytes. Blocks until
* some input is available.
* @param b the buffer into which the data is read
* @param off the start offset of the data
* @param len the maximum number of bytes read
* @return the actual number of bytes read, or - 1 if the end of the
* entry is reached
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
public int read( byte [] b, int off, int len) throws IOException {
if (off < 0 || len < 0 || off > b. length - len) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
if ( entry == null ) {
return -1;
switch (entry.method) {
case DEFLATED:
    len = super.read(b, off, len);
if (len == -1) {
readEnd( entry );
entryEOF = true ;
entry = null ;
} else {
crc .update(b, off, len);
return len;
case STORED :
if ( remaining <= 0) {
entryEOF = true ;
entry = null ;
return -1;
if (len > remaining ) {
len = ( int ) remaining ;
len = in .read(b, off, len);
if (len == -1) {
throw new ZipException( "unexpected EOF" );
crc .update(b, off, len);
remaining -= len;
return len;
default :
throw new InternalError( "invalid compression method" );
* Skips specified number of bytes in the current ZIP entry.
* @param n the number of bytes to skip
* @return the actual number of bytes skipped
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
* @exception IllegalArgumentException if n < 0
public long skip( long n) throws IOException {
if (n < 0) {
throw new IllegalArgumentException( "negative skip length" );
int max = ( int )Math.min (n, Integer. MAX_VALUE );
int total = 0;
while (total < max) {
int len = max - total;
if (len > tmpbuf . length ) {
len = tmpbuf . length ;
len = read( tmpbuf , 0, len);
if (len == -1) {
entryEOF = true ;
break ;
total += len;
return total;
* Closes this input stream and releases any system resources associated
* with the stream.
* @exception IOException if an I/O error has occurred
public void close() throws IOException {
if (! closed ) {
super .close();
closed = true ;
private byte [] b = new byte [256];
* Reads local file (LOC) header for next entry.
private ZipEntry readLOC() throws IOException {
try {
readFully( tmpbuf , 0, LOCHDR );
} catch (EOFException e) {
return null ;
if (get32 ( tmpbuf , 0) != LOCSIG ) {
return null ;
// get the entry name and create the ZipEntry first
int len = get16 ( tmpbuf , LOCNAM );
if (len == 0) {
throw new ZipException( "missing entry name" );
int blen = b . length ;
if (len > blen) {
    // Grow the name buffer by doubling until the entry name fits
    do
        blen = blen * 2;
    while (len > blen);
    b = new byte[blen];
}
readFully( b , 0, len);
//ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
ZipEntry e = null;
try {
    if (this.encoding.toUpperCase().equals("UTF-8"))
        e = createZipEntry(getUTF8String(b, 0, len));
    else
        e = createZipEntry(new String(b, 0, len, this.encoding));
} catch (Exception byteE) {
    // Fall back to UTF-8 if the configured encoding cannot be used
    e = createZipEntry(getUTF8String(b, 0, len));
}
// now get the remaining fields for the entry
e. version = get16 ( tmpbuf , LOCVER );
e. flag = get16 ( tmpbuf , LOCFLG );
if ((e. flag & 1) == 1) {
throw new ZipException( "encrypted ZIP entry not supported" );
e. method = get16 ( tmpbuf , LOCHOW );
e. time = get32 ( tmpbuf , LOCTIM );
if ((e. flag & 8) == 8) {
/* EXT descriptor present */
if (e. method != DEFLATED ) {
throw new ZipException(
"only DEFLATED entries can have EXT descriptor" );
} else {
e. crc = get32 ( tmpbuf , LOCCRC );
e. csize = get32 ( tmpbuf , LOCSIZ );
e. size = get32 ( tmpbuf , LOCLEN );
len = get16 ( tmpbuf , LOCEXT );
if (len > 0) {
byte [] bb = new byte [len];
readFully(bb, 0, len);
e. extra = bb;
return e;
* Fetches a UTF8-encoded String from the specified byte array.
private static String getUTF8String( byte [] b, int off, int len) {
// First, count the number of characters in the sequence
int count = 0;
int max = off + len;
int i = off;
while (i < max) {
int c = b[i++] & 0xff;
switch (c >> 4) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
// 0xxxxxxx
break ;
case 12: case 13:
// 110xxxxx 10xxxxxx
if (( int )(b[i++] & 0xc0) != 0x80) {
throw new IllegalArgumentException();
break ;
case 14:
// 1110xxxx 10xxxxxx 10xxxxxx
if ((( int )(b[i++] & 0xc0) != 0x80) ||
(( int )(b[i++] & 0xc0) != 0x80)) {
throw new IllegalArgumentException();
break ;
default :
// 10xxxxxx, 1111xxxx
throw new IllegalArgumentException();
if (i != max) {
throw new IllegalArgumentException();
// Now decode the characters...
char [] cs = new char [count];
i = 0;
while (off < max) {
int c = b[off++] & 0xff;
switch (c >> 4) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
// 0xxxxxxx
cs[i++] = ( char )c;
break ;
case 12: case 13:
// 110xxxxx 10xxxxxx
cs[i++] = ( char )(((c & 0x1f) << 6) | (b[off++] & 0x3f));
break ;
case 14:
// 1110xxxx 10xxxxxx 10xxxxxx
int t = (b[off++] & 0x3f) << 6;
cs[i++] = ( char )(((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
break ;
default :
// 10xxxxxx, 1111xxxx
throw new IllegalArgumentException();
return new String(cs, 0, count);
* Creates a new <code> ZipEntry </code> object for the specified
* entry name.
* @param name the ZIP file entry name
* @return the ZipEntry just created
protected ZipEntry createZipEntry(String name) {
return new ZipEntry(name);
* Reads end of deflated entry as well as EXT descriptor if present.
private void readEnd(ZipEntry e) throws IOException {
int n = inf .getRemaining();
if (n > 0) {
((PushbackInputStream) in ).unread( buf , len - n, n);
if ((e. flag & 8) == 8) {
/* EXT descriptor present */
readFully( tmpbuf , 0, EXTHDR );
long sig = get32 ( tmpbuf , 0);
if (sig != EXTSIG ) { // no EXTSIG present
e. crc = sig;
e. csize = get32 ( tmpbuf , EXTSIZ - EXTCRC );
e. size = get32 ( tmpbuf , EXTLEN - EXTCRC );
((PushbackInputStream) in ).unread(
tmpbuf , EXTHDR - EXTCRC - 1, EXTCRC );
} else {
e. crc = get32 ( tmpbuf , EXTCRC );
e. csize = get32 ( tmpbuf , EXTSIZ );
e. size = get32 ( tmpbuf , EXTLEN );
if (e. size != inf .getBytesWritten()) {
throw new ZipException(
"invalid entry size (expected " + e. size +
" but got " + inf .getBytesWritten() + " bytes)" );
if (e. csize != inf .getBytesRead()) {
throw new ZipException(
"invalid entry compressed size (expected " + e. csize +
" but got " + inf .getBytesRead() + " bytes)" );
if (e. crc != crc .getValue()) {
throw new ZipException(
"invalid entry CRC (expected 0x" + Long.toHexString (e. crc ) +
" but got 0x" + Long.toHexString ( crc .getValue()) + ")" );
* Reads bytes, blocking until all bytes are read.
private void readFully( byte [] b, int off, int len) throws IOException {
while (len > 0) {
int n = in .read(b, off, len);
if (n == -1) {
throw new EOFException();
off += n;
len -= n;
* Fetches unsigned 16-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
private static final int get16( byte b[], int off) {
return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8);
* Fetches unsigned 32-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
private static final long get32( byte b[], int off) {
return get16 (b, off) | (( long )get16 (b, off+2) << 16);
package jcss.search.base.zip;
* @author 张华
* @time 2007 - 8 - 3
* @description
public class ZipEntry extends org.apache.tools.zip.ZipEntry {
String name ; // entry name
long time = -1; // modification time (in DOS time)
long crc = -1; // crc-32 of entry data
long size = -1; // uncompressed size of entry data
long csize = -1; // compressed size of entry data
int method = -1; // compression method
byte [] extra ; // optional extra field data for entry
String comment ; // optional comment string for entry
// The following flags are used only by Zip{Input,Output}Stream
int flag ; // bit flags
int version ; // version needed to extract
long offset ; // offset of loc header
* Compression method for uncompressed entries.
public static final int STORED = 0;
* Compression method for compressed (deflated) entries.
public static final int DEFLATED = 8;
// The following block must stay commented out
// static {
// /* Zip library is loaded from System.initializeSystemClass */
// initIDs();
// }
// private static native void initIDs();
public ZipEntry(String name){
super (name);