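My experience choosing a compression algorithm (by quqi99)
Author: Zhang Hua. Published: 2007-08-03 ( http://blog.csdn.net/quqi99 )
Copyright notice: This article may be freely reproduced, provided that the original source, the author information, and this copyright notice are indicated with a hyperlink.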
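Recently, because the files downloaded by our spider need to be compressed, I took the opportunity to study a range of compression algorithms. There were three criteria: first, compression ratio; second, speed; third, how convenient it is to append, delete, and query entries (that is, to find out what an archive contains without decompressing it, and to extract metadata easily).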
At first I mainly compared the compression performance of zlib, bzip2, gzip, rar, and zip.
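The first two criteria (ratio and speed) can be measured with a few lines of code. The following is a minimal sketch, not the benchmark used for this post: it assumes the JDK's java.util.zip.Deflater (zlib) as the codec, and the class name CompressionBench and the generated sample text are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class CompressionBench {
    /** Compresses data with zlib (java.util.zip.Deflater) at the given level (0-9). */
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release the native zlib resources
        return out.toByteArray();
    }

    /** Compressed size divided by original size; smaller is better. */
    static double ratio(byte[] original, byte[] compressed) {
        return (double) compressed.length / original.length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: repetitive HTML-like text, similar in spirit to crawled pages.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("<td>row ").append(i % 50).append("</td>\n");
        }
        byte[] page = sb.toString().getBytes(StandardCharsets.UTF_8);
        for (int level : new int[] {1, 6, 9}) {
            long start = System.nanoTime();
            byte[] packed = deflate(page, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("level %d: ratio %.3f, %d us%n", level, ratio(page, packed), micros);
        }
    }
}
```

A single timed run like this is only a rough indication; for a real comparison the input should be representative crawled files and each measurement should be repeated several times.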
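For the rar format, compression and decompression are done mainly by invoking an external command (see Appendix 1 for the code). The command runs outside the Java virtual machine, and if it fails the whole virtual machine may be brought down with it, so I ruled this approach out.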
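If an external tool has to be called anyway, some of the risk can be reduced by passing the arguments as a list and by bounding the wait. The sketch below is an illustration under assumptions, not the code from Appendix 1: the class name ExternalUnrar and the timeout handling are hypothetical, the flags x, -o+ and -p- are simply copied from the command line used in this post, and it relies on Java 8 for waitFor with a timeout.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalUnrar {
    /** Builds the argument list; each element is passed as-is, so paths with spaces stay intact. */
    static List<String> buildCommand(String unrarExe, String archive, String destDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(unrarExe);
        cmd.add("x");    // flags copied from the post's command line
        cmd.add("-o+");
        cmd.add("-p-");
        cmd.add(archive);
        cmd.add(destDir);
        return cmd;
    }

    /** Runs the command; returns true only if it exits with code 0 within the timeout. */
    static boolean run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT) // avoid blocking on a full pipe
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly(); // the tool hung: kill it instead of waiting forever
            return false;
        }
        return p.exitValue() == 0;
    }
}
```

A call might look like run(buildCommand("C:\\Program Files\\WinRAR\\UNRAR.EXE", "f:/test.rar", "f:/test/"), 60). This keeps a hung or crashed tool from blocking the calling thread indefinitely, but it does not remove the dependency on a program installed outside the JVM.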
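According to what I read online, bzip2 is about the most efficient at compressing text, and the org.apache.tools.bzip2 package in ant.jar provides an API for it. I never got it to work smoothly, though, so I did not spend more time on it. The related code is in Appendix 2 (not tested successfully).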
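So I began studying zlib. Its Java version is called jzlib; the code for compressing and decompressing with jzlib is in Appendix 3. It seemed quite good, so I planned to use it. Compressing a single file worked fine, but compressing a directory the way the java.util.zip package does was really inconvenient. Only then did it dawn on me: these compression algorithms generally care only about the algorithm itself, and conveniences such as per-entry compression are the business of the surrounding application.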
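The point that zlib only knows about a single byte stream can be seen in a short round trip. This sketch uses the JDK's DeflaterOutputStream and InflaterInputStream rather than jzlib (a substitution made here so the example needs no extra library); the class name ZlibRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class ZlibRoundTrip {
    /** Compresses one byte array into one zlib stream: no names, no entries, no directories. */
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink)) {
            out.write(data);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex); // not expected for in-memory streams
        }
        return sink.toByteArray();
    }

    /** Restores the original bytes from a zlib stream produced by compress(). */
    static byte[] decompress(byte[] packed) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(packed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }
}
```

Nothing in this API records a file name or lets several files share one output, which is exactly the part an archive format such as zip has to add on top.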
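So I began to think about how to borrow the ideas of the java.util.zip package and wrap the zlib algorithm so that it could compress by directory. In the end I discovered that the compression algorithm underneath java.util.zip is zlib itself. On top of the core algorithm, Sun merely added checksums (CRC32, Adler32), per-entry archiving (ZipEntry), and convenient input and output streams (ZipInputStream, ZipOutputStream).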
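How these pieces fit together can be sketched in memory: ZipOutputStream writes one ZipEntry per file, ZipInputStream walks the entries back, and CRC32 checks the content. The class name ZipEntriesDemo and its method names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipEntriesDemo {
    /** Packs the given name-to-content map into a zip archive held in memory. */
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            for (Map.Entry<String, byte[]> f : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(f.getKey())); // "dir/file" names give the directory layout
                zos.write(f.getValue());
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    /** Reads every entry back and returns its name with the CRC-32 of its decompressed content. */
    static Map<String, Long> listCrcs(byte[] zip) {
        Map<String, Long> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            byte[] buf = new byte[4096];
            while ((entry = zis.getNextEntry()) != null) {
                CRC32 crc = new CRC32();
                int n;
                while ((n = zis.read(buf)) != -1) {
                    crc.update(buf, 0, n);
                }
                result.put(entry.getName(), crc.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return result;
    }
}
```

The entry names here are plain ASCII, so the encoding problem described later in this post does not show up in this sketch.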
Since java.util.zip already uses zlib, we no longer need to work out how to compress by directory. Things still did not go entirely smoothly, however.
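First, compression and decompression code written directly against the java.util.zip API does not support Chinese file names. Java strings are Unicode-based, and ZipInputStream and ZipOutputStream always treat entry names as UTF-8, so an entry whose name or path is stored in another encoding (GBK, for example) is not handled. Looking closely at ZipInputStream, I found that the problem lies in this statement in the java.util.zip.ZipInputStream class: ZipEntry e = createZipEntry(getUTF8String(b, 0, len)); It should be changed to: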
ZipEntry e=null;
try
{
if (this.encoding.toUpperCase().equals("UTF-8"))
e=createZipEntry(getUTF8String(b, 0, len));
else
e=createZipEntry(new String(b,0,len,this.encoding));
}
catch(Exception byteE)
{
e=createZipEntry(getUTF8String(b, 0, len));
}
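Fortunately, a quick search online showed that we do not need to make this change ourselves, because the org.apache.tools.zip package in ant has already done it for us. The compression and decompression code written with this package is in Appendix 4.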
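The root cause can be reproduced without any zip file: the same bytes yield different strings depending on the charset used to decode them. The sketch below mirrors the idea of the patch (decode with the configured encoding, fall back to UTF-8 if that encoding is unknown). The class and method names are hypothetical, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EntryNameDecoding {
    /**
     * Decodes the first len bytes of a raw entry name with the given encoding.
     * Falls back to UTF-8 when the encoding name is illegal or unsupported.
     */
    static String decodeName(byte[] raw, int len, String encoding) {
        Charset cs;
        try {
            cs = Charset.forName(encoding);
        } catch (IllegalArgumentException ex) {
            // IllegalCharsetNameException and UnsupportedCharsetException both extend this type.
            cs = StandardCharsets.UTF_8;
        }
        return new String(raw, 0, len, cs);
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587.txt"; // a Chinese file name, written with Unicode escapes
        byte[] gbkBytes = name.getBytes(Charset.forName("GBK"));
        System.out.println(decodeName(gbkBytes, gbkBytes.length, "GBK").equals(name));   // right charset
        System.out.println(decodeName(gbkBytes, gbkBytes.length, "UTF-8").equals(name)); // wrong charset
    }
}
```

Decoding the GBK bytes as UTF-8 cannot reproduce the original name, which is why an archive created by a tool that stores names in GBK looks broken to a reader that assumes UTF-8.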
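Then another problem appeared. There are two ways to decompress a file: with ZipFile, or with ZipInputStream. As far as I could tell, ZipFile reads the whole zip file into memory at once, which does not work for large zip files; for those you have to use ZipInputStream. But the org.apache.tools.zip package happens not to provide a modified ZipInputStream; it provides only a rewritten ZipFile (alongside its own ZipOutputStream). If you fall back to java.util.zip.ZipInputStream, files with Chinese names still cannot be decompressed.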
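The two access styles can be compared side by side with the JDK classes. This is a minimal sketch with ASCII entry names; the class name ZipListing, the sample entry names, and the temporary file are all made up for the illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipListing {
    /** Writes a small two-entry archive to a temporary file (demo data only). */
    static File writeSample() {
        try {
            File zip = File.createTempFile("listing-demo", ".zip");
            zip.deleteOnExit();
            try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zip))) {
                zos.putNextEntry(new ZipEntry("pages/index.html"));
                zos.write("hello".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
                zos.putNextEntry(new ZipEntry("pages/about.html"));
                zos.write("about us".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
            return zip;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Random access: names and sizes come from the archive's directory, no entry data is inflated. */
    static List<String> listWithZipFile(File zip) {
        List<String> out = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                out.add(entry.getName() + ":" + entry.getSize());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out;
    }

    /** Sequential access: entries are visited in file order, one after another. */
    static List<String> listWithStream(File zip) {
        List<String> out = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                out.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out;
    }
}
```

ZipFile needs a seekable file and suits the "what is in this archive" query; ZipInputStream works on any InputStream and keeps only one entry in flight, which is the property that matters for large archives.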
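At this point I found an article online, 《让 ZipOutputStream 和 ZipInputStream 支持中文》 ("Making ZipOutputStream and ZipInputStream support Chinese"; it can be found through Google). Its approach is to modify the JDK source code directly. I felt that modifying the JDK JAR would make deploying the software more troublesome later, so I started looking for another solution.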
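To avoid changing java.util.zip.ZipInputStream, I simply rewrote the class myself: first I created, by copy and paste, a class jcss.search.base.zip.CZipInputStream with exactly the same content, and then in that class replaced ZipEntry e = createZipEntry(getUTF8String(b, 0, len)) with the code above. The class is in Appendix 5.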
In addition, I copied java.util.zip.ZipConstants into a class jcss.search.base.zip.ZipConstants with identical content.
I also implemented a jcss.search.base.zip.ZipEntry class; the code is in Appendix 6.
With that, everything works.
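For readers on a newer JDK: from Java 7 onward, ZipInputStream and ZipOutputStream have constructors that take a Charset for entry names, so copying the classes may no longer be necessary. This goes beyond the JDK this post was written against; the sketch below assumes Java 7 or later and an installed GBK charset, and the class name ZipCharsetRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipCharsetRoundTrip {
    /** Writes a single-entry archive whose entry name is encoded with nameCharset. */
    static byte[] pack(String entryName, byte[] content, Charset nameCharset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(content);
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sink.toByteArray();
    }

    /** Returns the name of the first entry, decoded with nameCharset, or null for an empty archive. */
    static String firstEntryName(byte[] zip, Charset nameCharset) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Reading with the same charset that was used for writing, for instance Charset.forName("GBK") on both sides, returns the Chinese entry name unchanged.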
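To push the compression ratio further, you can use 7zip; there is now a dedicated 7zip SDK (which implements the LZMA compression algorithm). In addition, some helpful people have added two input and output stream classes on top of it for convenient access (net.contrapunctus.lzma.LzmaInputStream and net.contrapunctus.lzma.LzmaOutputStream), but there are no entry classes that wrap per-directory compression.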
Appendix 1:
package jcss.search.base;
/**
* @author Zhang Hua
* @time 2007-8-1
* @description Compresses and decompresses rar archives by calling the WinRAR command-line tools.
*/
public class RarUtil {
/**
* Decompress.
*
* @param compress
* the rar archive to extract
* @param decompression
* the directory to extract into
*/
public void unZip(String compress, String decompression) throws Exception {
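java.lang.Runtime rt = java.lang.Runtime.getRuntime();
// Pass the arguments as an array so that paths containing spaces (such as "Program Files") are not split.
Process p = rt.exec(new String[] { "C:\\Program Files\\WinRAR\\UNRAR.EXE", "x", "-o+", "-p-", compress, decompression });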
StringBuffer sb = new StringBuffer();
java.io.InputStream fis = p.getInputStream();
int value = 0;
while ((value = fis.read()) != -1)
{
sb.append(( char ) value);
}
fis.close();
String result = new String(sb.toString().getBytes( "ISO-8859-1" ), "GBK" );
System. out .println(result);
}
/**
*
* @param outputRar the rar archive to create
* @param compression the file or directory to compress
* @throws Exception
*/
public void zip(String outputRar, String compression) throws Exception {
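java.lang.Runtime rt = java.lang.Runtime.getRuntime();
// "a" adds files to an archive; the original command used "x", which extracts instead.
// Pass the arguments as an array so that paths containing spaces are not split.
Process p = rt.exec(new String[] { "C:\\Program Files\\WinRAR\\rar.exe", "a", outputRar, compression });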
StringBuffer sb = new StringBuffer();
java.io.InputStream fis = p.getInputStream();
int value = 0;
while ((value = fis.read()) != -1)
{
sb.append(( char ) value);
}
fis.close();
String result = new String(sb.toString().getBytes( "ISO-8859-1" ), "GBK" );
System. out .println(result);
}
/**
* @param args
*/
public static void main(String[] args) {
RarUtil test = new RarUtil();
String compress = "f:/增加转码过滤器.rar"; // rar archive
String decompression = "f:/test/"; // extraction directory
try {
test.zip("f:/test.rar", "说明.txt");
//test.unZip(compress, decompression);
} catch (Exception e) {
e.printStackTrace();
}
}
}
Appendix 2:
package jcss.search.base;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.tools.bzip2.CBZip2InputStream;
import org.apache.tools.bzip2.CBZip2OutputStream;
/**
* @author Zhang Hua
* @time 2007-7-26
* @description BZip2 compression and decompression
*/
public class BZip2Util {
public static void Bzip2Compress(String in, String to) {
try {
File source = new File(in);
File destination = new File(to);
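FileOutputStream fos = new FileOutputStream(destination);
// Ant's CBZip2OutputStream does not write the "BZ" magic bytes itself;
// per its Javadoc the caller must write them before constructing the stream.
fos.write('B');
fos.write('Z');
CBZip2OutputStream output = new CBZip2OutputStream(fos);
final FileInputStream input = new FileInputStream(source);
copy(input, output);
input.close();
output.close();
} catch (Exception e) {
e.printStackTrace();
}
}
public static void Bzip2Uncompress(String in, String to) {
try {
File source = new File(in);
File destination = new File(to);
FileOutputStream output =new FileOutputStream(destination);
FileInputStream fis = new FileInputStream(source);
// Likewise, CBZip2InputStream expects the caller to have consumed the "BZ" magic bytes.
if (fis.read() != 'B' || fis.read() != 'Z') {
throw new IOException("not a bzip2 file: " + in);
}
CBZip2InputStream input = new CBZip2InputStream(fis);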
copy( input, output );
input.close();
output.close();
} catch (Exception e) {
e.printStackTrace();
}
}
static void copy(final InputStream input, final OutputStream output)
throws IOException {
final byte[] buffer = new byte[8024];
int n = 0;
while (-1 != (n = input.read(buffer))) {
output.write(buffer, 0, n);
}
}
/**
* @param args
*/
public static void main(String[] args) {
BZip2Util test = new BZip2Util();
String in = "f://~HlIndex.htm";
String to = "f://a.bz2";
String out2 = "b.htm";
//test.Bzip2Compress(in, to);
//test.Bzip2Uncompress(to, out2);
}
}
Appendix 3:
package example;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import com.jcraft.jzlib.*;
/** Limitation: cannot compress by directory.
* @author Zhang Hua
* @time 2007-7-30
* @description reference http://tianxiagod.spaces.live.com/
* http://blog.csdn.net/kong555/archive/2006/03/28/641855.aspx
*/
public class TestJZlib {
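// Length of the file being compressed; needed for both compression and decompression, so it is critical.
// Make sure compressfile() and uncompressfile() are called with the same value.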
static int resLen = 0;
/**
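* Compress.
*
* @param data the bytes to compress
* @param type
* compression level as an integer: -1 is the default level, 9 the best compression, 0 no compression, 1 the fastest
* @param len size of the output buffer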
* @return
*/
public static byte[] compressfile(byte[] data, int type,int len) {
int err;
int comprLen = len;
byte[] compr = new byte[comprLen];
ZStream c_stream = new ZStream();
err = c_stream.deflateInit(type);
CHECK_ERR(c_stream, err, "deflateInit");
c_stream.next_in = data;
c_stream.next_in_index = 0;
c_stream.next_out = compr;
c_stream.next_out_index = 0;
while (c_stream.total_in != data.length
&& c_stream.total_out < comprLen) {
c_stream.avail_in = c_stream.avail_out = 1; // 置初值
err = c_stream.deflate(JZlib.Z_NO_FLUSH);
CHECK_ERR(c_stream, err, "deflate");
}
System.out.println("Before compression: " + c_stream.total_in + " bytes");
while (true) {
c_stream.avail_out = 1;
err = c_stream.deflate(JZlib.Z_FINISH);
if (err == JZlib.Z_STREAM_END) {
break;
}
CHECK_ERR(c_stream, err, "deflate");
}
System.out.println("After compression: " + c_stream.total_out + " bytes");
err = c_stream.deflateEnd();
CHECK_ERR(c_stream, err, "deflateEnd");
byte[] zipfile = new byte[(int) c_stream.total_out];
System.arraycopy(compr, 0, zipfile, 0, zipfile.length);
return zipfile;
}
public static byte[] uncompressfile(byte[] data,int len) {
int err;
int uncomprLen = len;
byte[] uncompr = new byte[uncomprLen];
ZStream d_stream = new ZStream();
err = d_stream.inflateInit();
CHECK_ERR(d_stream, err, "inflateInit");
d_stream.next_in = data;
d_stream.next_in_index = 0;
d_stream.next_out = uncompr;
d_stream.next_out_index = 0;
while (d_stream.total_out < uncomprLen
&& d_stream.total_in < uncomprLen) {
d_stream.avail_in = d_stream.avail_out = 1;
err = d_stream.inflate(JZlib.Z_NO_FLUSH);
if (err == JZlib.Z_STREAM_END) {
break;
}
CHECK_ERR(d_stream, err, "inflate");
}
System.out.println("Before decompression: " + d_stream.total_in + " bytes");
System.out.println("After decompression: " + d_stream.total_out + " bytes");
err = d_stream.inflateEnd();
CHECK_ERR(d_stream, err, "inflateEnd");
byte[] unzipfile = new byte[(int) d_stream.total_out];
System.arraycopy(uncompr, 0, unzipfile, 0, unzipfile.length);
return unzipfile;
}
static void CHECK_ERR(ZStream z, int err, String msg) {
if (err != JZlib.Z_OK) {
if (z.msg != null) {
System.out.print(z.msg + " ");
}
System.out.println(msg + " error: " + err);
System.exit(1);
}
}
static void zip(File input, File output, int compressFactor) {
if (!input.exists())
return;
if (!output.getParentFile().exists())
output.getParentFile().mkdir();
try {
FileInputStream in = new FileInputStream(input);
FileOutputStream out = new FileOutputStream(output);
resLen = in.available();
byte[] buff = new byte[resLen];
in.read(buff);
byte[] suBuf = compressfile(buff, compressFactor,resLen);
out.write(suBuf, 0, suBuf.length); // write the compressed file
in.close();
out.close();
System.out.println("Compression finished: " + input.getAbsolutePath());
} catch (Exception e) {
e.printStackTrace();
}
}
static void unZip(File input, File output) {
if (!input.exists())
return;
if (!output.getParentFile().exists())
output.getParentFile().mkdir();
try {
FileInputStream in = new FileInputStream(input);
FileOutputStream out = new FileOutputStream(output);
byte[] buff = new byte[resLen];
in.read(buff);
byte[] suBuff = uncompressfile(buff,resLen);
out.write(suBuff, 0, suBuff.length); // write the decompressed file
in.close();
out.close();
System.out.println("Decompression finished: " + input.getAbsolutePath());
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* @param args
*/
public static void main(String[] args) {
TestJZlib test = new TestJZlib();
// Compress
File input = new File("f://搜索引擎原理系统与设计.pdf");
File output = new File("f://test.bz2");
test.zip(input, output, 9);
// Decompress
File output2 = new File("f://test.jpg");
test.unZip(output, output2);
}
}
Appendix 4:
package jcss.search.base;
/*
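Uses org.apache.tools.zip for compression.
java.util.zip can be used as well, but with Chinese names
the file names come out garbled when decompressing. The reason is that the encoding used by the compression tool differs from
the character set used by java.util.zip.ZipInputStream:
java.util.zip.ZipInputStream always uses UTF-8.
The commented-out part is the decompression code.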
*/
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Date;
import java.util.zip.ZipInputStream;
import jcss.search.base.zip.CZipInputStream;
import org.apache.tools.zip.ZipOutputStream;
/*
* Author: Zhang Hua. Date: 2006-5-14.
*/
public class ZipUtil {
int count = 0;
static final int BUFFER = 2048;
public void zip(String zipFileName, String inputFile) throws Exception {
zip(zipFileName, new File(inputFile));
}
public void zip(String zipFileName, File inputFile) throws Exception {
ZipOutputStream out = new ZipOutputStream(new FileOutputStream(
new String(zipFileName.getBytes("gb2312"))));
System.out.println("zip start");
zip(out, inputFile, "");
System.out.println("zip done");
out.close();
}
public void zip(ZipOutputStream out, File f, String base) throws Exception {
System.out.println("Zipping " + f.getName());
Date beginDate = new Date();
if (f.isDirectory()) {
File[] fl = f.listFiles();
// out.putNextEntry(new ZipEntry(base + "/"));
out.putNextEntry(new org.apache.tools.zip.ZipEntry(base + "/"));
base = base.length() == 0 ? "" : base + "/";
for (int i = 0; i < fl.length; i++) {
zip(out, fl[i], base + fl[i].getName());
System.out.println(fl[i].getName());
// System.out.println(new
// String(fl[i].getName().getBytes("gb2312")));
}
} else {
// out.putNextEntry(new ZipEntry(base));
out.putNextEntry(new org.apache.tools.zip.ZipEntry(base));
System.out.println(base);
FileInputStream in = new FileInputStream(f);
int b;
while ((b = in.read()) != -1)
out.write(b);
in.close();
}
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime(); // elapsed milliseconds (end minus begin)
System.out.println("Total time: " + temp);
}
private void createDirectory(String directory, String subDirectory) {
String dir[];
File fl = new File(directory);
try {
// Compare string contents instead of references.
if (subDirectory.length() == 0 && !fl.exists())
fl.mkdir();
else if (subDirectory.length() != 0) {
dir = subDirectory.replace('\\', '/').split("/");
for (int i = 0; i < dir.length; i++) {
File subFile = new File(directory + File.separator + dir[i]);
if (subFile.exists() == false)
subFile.mkdir();
directory += File.separator + dir[i];
}
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
/**
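* Decompresses a small ZIP with ZipFile.
* The ZipInputStream class reads the ZIP file sequentially (put simply, it walks through the files packed into the ZIP),
* whereas the ZipFile class reads file contents through a built-in random-access mechanism, so the entries need not be read in order.
* Another basic difference between ZipInputStream and ZipFile lies in their use of caching.
* When a file is read through ZipInputStream and FileInputStream, the ZIP entries are not cached.
* If, however, the file is opened with ZipFile(fileName), a built-in cache is used, so if ZipFile(fileName)
* is called repeatedly the file is opened only once, and the cached value is used on the second open. If you work on a UNIX system,
* this makes no difference, because all ZIP files opened with ZipFile are memory mapped,
* so ZipFile performs better than ZipInputStream.
* If, however, the contents of the same ZIP file change or are reloaded frequently while the program runs, ZipInputStream becomes the preferred choice.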
* @param zipFileName
* @param outputDirectory
* @throws Exception
*/
public void unSmallZip(String zipFileName, String outputDirectory)
throws Exception {
try {
Date beginDate = new Date();
org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
java.util.Enumeration e = zipFile.getEntries();
org.apache.tools.zip.ZipEntry zipEntry = null;
createDirectory(outputDirectory, "");
while (e.hasMoreElements()) {
zipEntry = (org.apache.tools.zip.ZipEntry) e.nextElement();
String name = null;
if (zipEntry.isDirectory()) {
name = zipEntry.getName();
name = name.substring(0, name.length() - 1);
File f = new File(outputDirectory + File.separator + name);
f.mkdir();
System.out.println("Creating directory: " + outputDirectory
+ File.separator + name);
} else {
String fileName = zipEntry.getName();
fileName = fileName.replace('\\', '/');
count++;
System.out.println("Extracting file " + count + ": "
+ zipEntry.getName());
if (fileName.indexOf("/") != -1) {
createDirectory(outputDirectory, fileName.substring(0,
fileName.lastIndexOf("/")));
fileName = fileName.substring(
fileName.lastIndexOf("/") + 1, fileName
.length());
}
File f = new File(outputDirectory + File.separator
+ zipEntry.getName());
f.createNewFile();
InputStream in = zipFile.getInputStream(zipEntry);
FileOutputStream out = new FileOutputStream(f);
byte[] by = new byte[1024];
int c;
while ((c = in.read(by)) != -1) {
out.write(by, 0, c);
}
out.close();
in.close();
}
}
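// The archive cannot be deleted here because it is still in use; delete it where the upload is handled.
// After decompression, delete the archive
// File zipFileToDel = new File(zipFileName);
// zipFileToDel.delete();
// System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
// // Delete the extra directory level created by decompression
// delALayerDir(zipFileName, outputDirectory);
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime(); // elapsed milliseconds (end minus begin)
System.out.println("Total decompression time: " + temp);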
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
/**
* Decompresses a large ZIP with ZipInputStream (using CZipInputStream, a modified ZipInputStream that supports Chinese file names).
*
* See the Javadoc of unSmallZip for how ZipInputStream and ZipFile differ.
* @param zipFileName
* @param outputDirectory
* @throws Exception
*/
public void unBigZip(String zipFileName, String outputDirectory)
throws Exception {
try {
Date beginDate = new Date();
//org.apache.tools.zip.ZipFile zipFile = new org.apache.tools.zip.ZipFile(zipFileName);
FileInputStream fis = new FileInputStream(zipFileName);
BufferedOutputStream dest = null;
//CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis));
CZipInputStream zin = new CZipInputStream(new BufferedInputStream(fis),"gb2312");
//org.apache.tools.zip.ZipEntry entry;
//java.util.zip.ZipEntry entry;
jcss.search.base.zip.ZipEntry entry;
while((entry =zin.getNextEntry()) != null) {
String name = null;
if (entry.isDirectory()) {
name = entry.getName();
name = name.substring(0, name.length() - 1);
File f = new File(outputDirectory + File.separator + name);
f.mkdir();
System.out.println("Creating directory: " + outputDirectory + File.separator + name);
}else{
String fileName = entry.getName();
fileName = fileName.replace('\\', '/');
count++;
System.out.println("Extracting file " + count + ": " + entry.getName());
if (fileName.indexOf("/") != -1) {
createDirectory(outputDirectory, fileName.substring(0,fileName.lastIndexOf("/")));
fileName = fileName.substring(fileName.lastIndexOf("/") + 1, fileName.length());
}
File f = new File(outputDirectory + File.separator + entry.getName());
f.createNewFile();
// InputStream in = zipFile.getInputStream(zipEntry);
// FileOutputStream out = new FileOutputStream(f);
// byte[] by = new byte[1024];
// int c;
// while ((c = in.read(by)) != -1) {
// out.write(by, 0, c);
// }
// out.close();
// in.close();
int cnt;
byte data[] = new byte[BUFFER];
FileOutputStream fos = new FileOutputStream(f);
dest = new BufferedOutputStream(fos, BUFFER);
while ((cnt = zin.read(data, 0, BUFFER)) != -1) {
dest.write(data, 0, cnt);
}
dest.flush();
dest.close();
}
}
zin.close();
// The archive cannot be deleted here because it is still in use; delete it where the upload is handled.
// After decompression, delete the archive
// File zipFileToDel = new File(zipFileName);
// zipFileToDel.delete();
// System.out.println("Deleting file: " + zipFileToDel.getCanonicalPath());
// // Delete the extra directory level created by decompression
// delALayerDir(zipFileName, outputDirectory);
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime();
System.out.println("Total decompression time: " + temp);
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
/**
* Removes one directory level.
*
* @param zipFileName
* @param outputDirectory
*/
public void delALayerDir(String zipFileName, String outputDirectory) {
String[] dir = zipFileName.replace('\\', '/').split("/");
String fileFullName = dir[dir.length - 1]; // yields aa.zip
int pos = -1;
pos = fileFullName.indexOf(".");
String fileName = fileFullName.substring(0, pos); // yields aa
String sourceDir = outputDirectory + File.separator + fileName;
try {
copyFile(new File(outputDirectory), new File(sourceDir), new File(
sourceDir));
deleteSourceBaseDir(new File(sourceDir));
} catch (Exception e) {
e.printStackTrace();
}
}
/**
* Copies every file under sourceDir into destDir.
*/
public void copyFile(File destDir, File sourceBaseDir, File sourceDir)
throws Exception {
File[] lists = sourceDir.listFiles();
String line = null;
String url = null;
if (lists == null)
return;
for (int i = 0; i < lists.length; i++) {
File f = lists[i];
if (f.isFile()) {
FileInputStream fis = new FileInputStream(f);
String content = "";
String sourceBasePath = sourceBaseDir.getCanonicalPath();
String destPath = destDir.getCanonicalPath();
String fPath = f.getCanonicalPath();
String drPath = destDir
+ fPath.substring(fPath.indexOf(sourceBasePath)
+ sourceBasePath.length());
FileOutputStream fos = new FileOutputStream(drPath);
// Copy raw bytes. Going through a String would corrupt binary files
// and append padding from the partially filled last buffer.
byte[] b = new byte[2048];
int n;
while ((n = fis.read(b)) != -1) {
fos.write(b, 0, n);
}
fis.close();
fos.flush();
fos.close();
} else {
// Create the directory first
new File(destDir + File.separator + f.getName()).mkdir();
copyFile(destDir, sourceBaseDir, f); // recursive call
}
}
}
/**
* Recursively deletes every file under curFile, removing a directory once its last file has been deleted.
*/
public void deleteSourceBaseDir(File curFile) throws Exception {
File[] lists = curFile.listFiles();
String line = null;
String url = null;
File parentFile = null;
for (int i = 0; i < lists.length; i++) {
File f = lists[i];
if (f.isFile()) {
f.delete();
// If the parent directory has no files left, everything in it has been deleted, so delete the parent too
parentFile = f.getParentFile();
if (parentFile.list().length == 0)
parentFile.delete();
} else {
deleteSourceBaseDir(f); // recursive call
}
}
}
public static void main(String[] args) {
try {
ZipUtil t = new ZipUtil();
// t.zip("e://test.zip", "E://news.sina.com.cn//news.sina.com.cn");
Date beginDate = new Date();
//t.unZip("e://test.zip", "E://news.sina.com.cn");
t.unBigZip("e://test.zip", "E://news.sina.com.cn");
Date endDate = new Date();
long temp = endDate.getTime() - beginDate.getTime();
System.out.println("Total time: " + temp);
} catch (Exception e) {
e.printStackTrace(System.out);
}
}
}
Appendix 5:
/*
* @(#)ZipInputStream.java 1.37 04/06/11
*
* Copyright 2004 Sun Microsystems, Inc. All rights reserved.
* SUN PROPRIETARY/CONFIDENTIAL. Use is subject to license terms.
*/
package jcss.search.base.zip;
import java.io.InputStream;
import java.io.IOException;
import java.io.EOFException;
import java.io.PushbackInputStream;
import java.util.zip.CRC32;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;
/**
*
*
* @author David Connelly
* @version 1.37, 06/11/04
*/
public class CZipInputStream extends InflaterInputStream implements ZipConstants {
private String encoding = "UTF-8" ;
private ZipEntry entry ;
private CRC32 crc = new CRC32();
private long remaining ;
private byte [] tmpbuf = new byte [512];
private static final int STORED = ZipEntry. STORED ;
private static final int DEFLATED = ZipEntry. DEFLATED ;
private boolean closed = false ;
// this flag is set to true after EOF has reached for
// one entry
private boolean entryEOF = false ;
/**
* Check to make sure that this stream has not been closed
*/
private void ensureOpen() throws IOException {
if ( closed ) {
throw new IOException( "Stream closed" );
}
}
boolean usesDefaultInflater = false ;
/**
* Creates a new ZIP input stream.
* @param in the actual input stream
*/
public CZipInputStream(InputStream in) {
super ( new PushbackInputStream(in, 512), new Inflater( true ), 512);
usesDefaultInflater = true ;
if (in == null ) {
throw new NullPointerException( "in is null" );
}
}
public CZipInputStream(InputStream in,String encoding) {
super ( new PushbackInputStream(in,512), new Inflater( true ),512);
usesDefaultInflater = true ;
if (in == null ) {
throw new NullPointerException( "in is null" );
}
this . encoding =encoding;
}
/**
* Reads the next ZIP file entry and positions the stream at the
* beginning of the entry data.
* @return the next ZIP file entry, or null if there are no more entries
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
*/
public ZipEntry getNextEntry() throws IOException {
ensureOpen();
if ( entry != null ) {
closeEntry();
}
crc .reset();
inf .reset();
if (( entry = readLOC()) == null ) {
return null ;
}
if ( entry . method == STORED ) {
remaining = entry . size ;
}
entryEOF = false ;
return entry ;
}
/**
* Closes the current ZIP entry and positions the stream for reading the
* next entry.
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
*/
public void closeEntry() throws IOException {
ensureOpen();
while (read( tmpbuf , 0, tmpbuf . length ) != -1) ;
entryEOF = true ;
}
/**
* Returns 0 after EOF has reached for the current entry data,
* otherwise always return 1.
* <p>
* Programs should not count on this method to return the actual number
* of bytes that could be read without blocking.
*
* @return 1 before EOF and 0 after EOF has reached for current entry.
* @exception IOException if an I/O error occurs.
*
*/
public int available() throws IOException {
ensureOpen();
if ( entryEOF ) {
return 0;
} else {
return 1;
}
}
/**
* Reads from the current ZIP entry into an array of bytes. Blocks until
* some input is available.
* @param b the buffer into which the data is read
* @param off the start offset of the data
* @param len the maximum number of bytes read
* @return the actual number of bytes read, or - 1 if the end of the
* entry is reached
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
*/
public int read( byte [] b, int off, int len) throws IOException {
ensureOpen();
if (off < 0 || len < 0 || off > b. length - len) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
if ( entry == null ) {
return -1;
}
switch ( entry . method ) {
case DEFLATED :
len = super .read(b, off, len);
if (len == -1) {
readEnd( entry );
entryEOF = true ;
entry = null ;
} else {
crc .update(b, off, len);
}
return len;
case STORED :
if ( remaining <= 0) {
entryEOF = true ;
entry = null ;
return -1;
}
if (len > remaining ) {
len = ( int ) remaining ;
}
len = in .read(b, off, len);
if (len == -1) {
throw new ZipException( "unexpected EOF" );
}
crc .update(b, off, len);
remaining -= len;
return len;
default :
throw new InternalError( "invalid compression method" );
}
}
/**
* Skips specified number of bytes in the current ZIP entry.
* @param n the number of bytes to skip
* @return the actual number of bytes skipped
* @exception ZipException if a ZIP file error has occurred
* @exception IOException if an I/O error has occurred
* @exception IllegalArgumentException if n < 0
*/
public long skip( long n) throws IOException {
if (n < 0) {
throw new IllegalArgumentException( "negative skip length" );
}
ensureOpen();
int max = ( int )Math.min (n, Integer. MAX_VALUE );
int total = 0;
while (total < max) {
int len = max - total;
if (len > tmpbuf . length ) {
len = tmpbuf . length ;
}
len = read( tmpbuf , 0, len);
if (len == -1) {
entryEOF = true ;
break ;
}
total += len;
}
return total;
}
/**
* Closes this input stream and releases any system resources associated
* with the stream.
* @exception IOException if an I/O error has occurred
*/
public void close() throws IOException {
if (! closed ) {
super .close();
closed = true ;
}
}
private byte [] b = new byte [256];
/*
* Reads local file (LOC) header for next entry.
*/
private ZipEntry readLOC() throws IOException {
try {
readFully( tmpbuf , 0, LOCHDR );
} catch (EOFException e) {
return null ;
}
if (get32 ( tmpbuf , 0) != LOCSIG ) {
return null ;
}
// get the entry name and create the ZipEntry first
int len = get16 ( tmpbuf , LOCNAM );
if (len == 0) {
throw new ZipException( "missing entry name" );
}
int blen = b . length ;
if (len > blen) {
do
blen = blen * 2;
while (len > blen);
b = new byte [blen];
}
readFully( b , 0, len);
//ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
ZipEntry e= null ;
try
{
if ( this . encoding .toUpperCase().equals( "UTF-8" ))
e=createZipEntry(getUTF8String ( b , 0, len));
else
e=createZipEntry( new String( b ,0,len, this . encoding ));
}
catch (Exception byteE)
{
e=createZipEntry(getUTF8String ( b , 0, len));
}
// now get the remaining fields for the entry
e. version = get16 ( tmpbuf , LOCVER );
e. flag = get16 ( tmpbuf , LOCFLG );
if ((e. flag & 1) == 1) {
throw new ZipException( "encrypted ZIP entry not supported" );
}
e. method = get16 ( tmpbuf , LOCHOW );
e. time = get32 ( tmpbuf , LOCTIM );
if ((e. flag & 8) == 8) {
/* EXT descriptor present */
if (e. method != DEFLATED ) {
throw new ZipException(
"only DEFLATED entries can have EXT descriptor" );
}
} else {
e. crc = get32 ( tmpbuf , LOCCRC );
e. csize = get32 ( tmpbuf , LOCSIZ );
e. size = get32 ( tmpbuf , LOCLEN );
}
len = get16 ( tmpbuf , LOCEXT );
if (len > 0) {
byte [] bb = new byte [len];
readFully(bb, 0, len);
e. extra = bb;
}
return e;
}
/*
* Fetches a UTF8-encoded String from the specified byte array.
*/
private static String getUTF8String( byte [] b, int off, int len) {
// First, count the number of characters in the sequence
int count = 0;
int max = off + len;
int i = off;
while (i < max) {
int c = b[i++] & 0xff;
switch (c >> 4) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
// 0xxxxxxx
count++;
break ;
case 12: case 13:
// 110xxxxx 10xxxxxx
if (( int )(b[i++] & 0xc0) != 0x80) {
throw new IllegalArgumentException();
}
count++;
break ;
case 14:
// 1110xxxx 10xxxxxx 10xxxxxx
if ((( int )(b[i++] & 0xc0) != 0x80) ||
(( int )(b[i++] & 0xc0) != 0x80)) {
throw new IllegalArgumentException();
}
count++;
break ;
default :
// 10xxxxxx, 1111xxxx
throw new IllegalArgumentException();
}
}
if (i != max) {
throw new IllegalArgumentException();
}
// Now decode the characters...
char [] cs = new char [count];
i = 0;
while (off < max) {
int c = b[off++] & 0xff;
switch (c >> 4) {
case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
// 0xxxxxxx
cs[i++] = ( char )c;
break ;
case 12: case 13:
// 110xxxxx 10xxxxxx
cs[i++] = ( char )(((c & 0x1f) << 6) | (b[off++] & 0x3f));
break ;
case 14:
// 1110xxxx 10xxxxxx 10xxxxxx
int t = (b[off++] & 0x3f) << 6;
cs[i++] = ( char )(((c & 0x0f) << 12) | t | (b[off++] & 0x3f));
break ;
default :
// 10xxxxxx, 1111xxxx
throw new IllegalArgumentException();
}
}
return new String(cs, 0, count);
}
/**
* Creates a new <code> ZipEntry </code> object for the specified
* entry name.
*
* @param name the ZIP file entry name
* @return the ZipEntry just created
*/
protected ZipEntry createZipEntry(String name) {
return new ZipEntry(name);
}
/*
* Reads end of deflated entry as well as EXT descriptor if present.
*/
private void readEnd(ZipEntry e) throws IOException {
int n = inf .getRemaining();
if (n > 0) {
((PushbackInputStream) in ).unread( buf , len - n, n);
}
if ((e. flag & 8) == 8) {
/* EXT descriptor present */
readFully( tmpbuf , 0, EXTHDR );
long sig = get32 ( tmpbuf , 0);
if (sig != EXTSIG ) { // no EXTSIG present
e. crc = sig;
e. csize = get32 ( tmpbuf , EXTSIZ - EXTCRC );
e. size = get32 ( tmpbuf , EXTLEN - EXTCRC );
((PushbackInputStream) in ).unread(
tmpbuf , EXTHDR - EXTCRC - 1, EXTCRC );
} else {
e. crc = get32 ( tmpbuf , EXTCRC );
e. csize = get32 ( tmpbuf , EXTSIZ );
e. size = get32 ( tmpbuf , EXTLEN );
}
}
if (e. size != inf .getBytesWritten()) {
throw new ZipException(
"invalid entry size (expected " + e. size +
" but got " + inf .getBytesWritten() + " bytes)" );
}
if (e. csize != inf .getBytesRead()) {
throw new ZipException(
"invalid entry compressed size (expected " + e. csize +
" but got " + inf .getBytesRead() + " bytes)" );
}
if (e. crc != crc .getValue()) {
throw new ZipException(
"invalid entry CRC (expected 0x" + Long.toHexString (e. crc ) +
" but got 0x" + Long.toHexString ( crc .getValue()) + ")" );
}
}
/*
* Reads bytes, blocking until all bytes are read.
*/
private void readFully( byte [] b, int off, int len) throws IOException {
while (len > 0) {
int n = in .read(b, off, len);
if (n == -1) {
throw new EOFException();
}
off += n;
len -= n;
}
}
/*
* Fetches unsigned 16-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
*/
private static final int get16( byte b[], int off) {
return (b[off] & 0xff) | ((b[off+1] & 0xff) << 8);
}
/*
* Fetches unsigned 32-bit value from byte array at specified offset.
* The bytes are assumed to be in Intel (little-endian) byte order.
*/
private static final long get32( byte b[], int off) {
return get16 (b, off) | (( long )get16 (b, off+2) << 16);
}
}
Appendix 6:
package jcss.search.base.zip;
/**
* @author Zhang Hua
* @time 2007-8-3
* @description
**/
public class ZipEntry extends org.apache.tools.zip.ZipEntry {
String name ; // entry name
long time = -1; // modification time (in DOS time)
long crc = -1; // crc-32 of entry data
long size = -1; // uncompressed size of entry data
long csize = -1; // compressed size of entry data
int method = -1; // compression method
byte [] extra ; // optional extra field data for entry
String comment ; // optional comment string for entry
// The following flags are used only by Zip{Input,Output}Stream
int flag ; // bit flags
int version ; // version needed to extract
long offset ; // offset of loc header
/**
* Compression method for uncompressed entries.
*/
public static final int STORED = 0;
/**
* Compression method for compressed (deflated) entries.
*/
public static final int DEFLATED = 8;
// The following block must be commented out
// static {
// /* Zip library is loaded from System.initializeSystemClass */
// initIDs();
// }
// private static native void initIDs();
public ZipEntry(String name){
super (name);
}
}