A Java Implementation of File Compression and Decompression with Huffman Coding

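Last time I covered how a Huffman tree can be used to compress a file and why it works. Today we will actually put it to use and implement file compression and decompression.

Warm-up

Let's get our thoughts in order first.

First we are handed a file. It may look like text, audio, an image or a video, but underneath it is just a sequence of 0s and 1s.

We count how often each character represented by that bit sequence occurs, and then re-encode every character with a new bit string. Originally each character might take 8 bits. After re-encoding, frequent characters need only one or two bits, while rare characters may end up with much longer codes (dozens of bits in extreme cases). This is Huffman coding. It gives us two things: the mapping from characters to bit strings, which we store as key-value pairs in a HashMap, and the Huffman-encoded file (one long string of 0s and 1s). Both were already implemented in the previous post.

Now we want to write those two things into a .zip file, and then turn that .zip file back into a normal file to decompress it. This post uses txt files as the example.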

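This post is implemented in Java. Java does have bitwise operators, but it cannot read, write or store a single bit on its own: the smallest unit for arrays and file I/O is the byte (the same is true of practically every mainstream high-level language). If the length of the bit string that represents the compressed data is not a multiple of 8, it cannot be converted directly into a byte array (a byte is 8 bits, that is, eight 0s and 1s). So we pad the end of the bit string with 0s until its length is a multiple of 8, and record the number of padding zeros as number_of_zero. Later, when we read the .zip file, we have to tell the part that holds the key-value pairs apart from the part that holds the file data, so we also introduce an int variable head_length that stores the length in bytes of the key-value part. For convenience, these four things (head_length, number_of_zero, the code table bytes head and the compressed bytes data) are bundled into a ZipResult object.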
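The padding arithmetic can be sketched in a few lines. This is only an illustration: the class PaddingSketch and its methods paddingFor and pad are names made up for this example and do not appear in the code of this post. String.repeat needs Java 11 or later.

```java
// Hypothetical helper class, for illustration only.
class PaddingSketch {
    // Number of '0' characters needed to reach the next multiple of 8
    // (0 when the length is already a multiple of 8).
    static int paddingFor(int bitLength) {
        return (8 - bitLength % 8) % 8;
    }

    // Append the padding zeros to a string of '0'/'1' characters.
    static String pad(String bits) {
        return bits + "0".repeat(paddingFor(bits.length()));
    }

    public static void main(String[] args) {
        String bits = "1011001110";                     // 10 bits
        System.out.println(paddingFor(bits.length()));  // 6
        System.out.println(pad(bits));                  // 1011001110000000
    }
}
```

When decompressing, the same number of trailing characters is cut off again before decoding.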

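Once we have these, we write them into the .zip file. Then we read the data back from the .zip file and decode it into a file identical to the original.

After reading the four fields of the ZipResult object, we reverse the steps described above and end up with two parts: a HashMap that serves as the decoding table, and a bit string that holds the original file's data. Decoding the data with the table gives us the decompressed file.

A quick word on file I/O in Java

File reads and writes in Java are done with FileInputStream and FileOutputStream from java.io. Once a stream has been created, it offers methods to read, write, flush and close. Reading gives you byte arrays, and writing takes byte arrays.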
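As a minimal sketch of that read/write cycle (the class FileIoSketch and its method names are invented for this example; readAllBytes needs Java 9 or later), a byte array can be written to a file and read back like this. The try-with-resources form closes the stream even when an exception is thrown.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class FileIoSketch {
    // Write a byte array to the given path.
    static void writeBytes(String path, byte[] bytes) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            out.write(bytes);
        }
    }

    // Read the whole file into a byte array.
    static byte[] readBytes(String path) throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("io-sketch", ".bin");
        tmp.deleteOnExit();
        byte[] original = {72, 105, (byte) 0xFF};
        writeBytes(tmp.getPath(), original);
        byte[] copy = readBytes(tmp.getPath());
        System.out.println(Arrays.equals(original, copy)); // true
    }
}
```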

Main part

Classes we need

import java.util.*;
import java.io.*;

// The ZipResult class
class ZipResult{
    int head_length;
    int number_of_zero;
    byte[] head;
    byte[] data;
    public ZipResult(int head_length, int number_of_zero, byte[] head, byte[] data){
        this.head_length = head_length;
        this.number_of_zero = number_of_zero;
        this.head = head;
        this.data = data;
    }
}

// Huffman nodes must be comparable with each other, so we implement compareTo
class HTreeNode implements Comparable<HTreeNode>{
    Integer count;
    char c;
    String code = "";
    HTreeNode left;
    HTreeNode right;
    HTreeNode parent;
    public HTreeNode(int count){
        this.count = count;
    }
    public HTreeNode(int count,char c){
        this.count = count;
        this.c = c;
    }
    @Override
    public int compareTo(HTreeNode o) {
        return this.count.compareTo(o.count);
    }
}

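Main methods of the Huffman tree class

Counting character frequencies

Given a string, this method counts how often each character occurs and returns the frequency table.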

private static Map<Character,Integer> transferString(String s){
        char[] cl = new char[s.length()];
        int[] nl = new int[s.length()];
        ArrayList<Character> list = new ArrayList<>();
        char[] chars = s.toCharArray(); // convert the string to a char array
        for (int i = 0; i < chars.length; i++) {
            char aChar = chars[i];
            list.add(aChar); // add each array element to the list
        }
        for (int i = 0; i < list.size(); i++) { // walk the list and take each character
            int count = 0; // counter
            Character character = list.get(i);
            for (int j = 0; j < chars.length; j++) { // compare every array element with the list element
                char aChar = chars[j];
                if (character.equals(aChar)){ // on a match, increment the counter
                    count++;
                }
            }
            cl[i] = character;
            nl[i] = count;
        }
        Character[] ncl = new Character[cl.length];
        for(int i=0;i<cl.length;i++){
            ncl[i] = (Character)cl[i];
        }
        Map<Character,Integer> map = new HashMap<Character,Integer>();
        for(int i=0;i<ncl.length;i++){
            map.put(ncl[i],nl[i]);
        }
        return map;

    }
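The method above compares every character with every other character, so its running time grows quadratically with the length of the input. As an alternative sketch (not the code used in the rest of this post; the class name FrequencySketch and the method countChars are made up here), the same table can be built in a single pass with Map.merge:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative, for illustration only.
class FrequencySketch {
    // Count how often each char occurs, visiting the input once.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : s.toCharArray()) {
            // Insert 1 for a new key, otherwise add 1 to the stored count.
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // 'a' occurs 5 times, 'b' and 'r' twice, 'c' and 'd' once.
        System.out.println(countChars("abracadabra"));
    }
}
```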

Building the Huffman tree with a priority queue

From the frequency table above we create a priority queue and then update it iteratively.

private static Queue<HTreeNode> init(Map<Character,Integer> map){
    Set<Character> s = map.keySet();
    Character[] cl = new Character[s.size()];
    cl =  s.toArray(cl);
    Queue<HTreeNode> q = new PriorityQueue<HTreeNode>();
    for(int i=0;i<map.size();i++){
        HTreeNode h = new HTreeNode((int)map.get(cl[i]),(char)cl[i]);
        q.add(h);
    }
    return q;
}

private static void update(Queue<HTreeNode> q){
  
        HTreeNode a = q.poll();
        HTreeNode b = q.poll();
        HTreeNode c = new HTreeNode(a.count+b.count);
        c.left  = a;
        c.right = b;
        a.parent = c;
        b.parent = c;
        q.add(c);

 }
private static void countTreeCode(HTreeNode t){

        if(t.left!=null){
            t.left.code += t.code+"0";
            countTreeCode(t.left);
        }

        if(t.right!=null){
            t.right.code += t.code+"1";
            countTreeCode(t.right);
        }
}
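To see the three steps (fill the queue, merge the two smallest nodes until one is left, walk the tree to assign codes) in one place, here is a compact, self-contained sketch. It is not the HFMtree code of this post: the class HuffmanSketch, its Node type and buildCodes are hypothetical names, leaves are detected by having no children instead of by the '\0' check, and giving a lone symbol the code "0" is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical, self-contained sketch; not the HFMtree class from this post.
class HuffmanSketch {
    static final class Node {
        final int count;
        final Character symbol; // null for internal nodes
        final Node left;
        final Node right;

        Node(int count, Character symbol, Node left, Node right) {
            this.count = count;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    static Map<Character, String> buildCodes(Map<Character, Integer> freq) {
        // Nodes with the smallest count come out of the queue first.
        PriorityQueue<Node> queue =
                new PriorityQueue<Node>((x, y) -> Integer.compare(x.count, y.count));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        Map<Character, String> codes = new HashMap<>();
        if (queue.isEmpty()) {
            return codes;
        }
        // Repeatedly merge the two least frequent nodes.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.count + b.count, null, a, b));
        }
        Node root = queue.poll();
        if (root.isLeaf()) {
            codes.put(root.symbol, "0"); // single-symbol input (sketch-specific choice)
            return codes;
        }
        assign(root, "", codes);
        return codes;
    }

    // Left edges append '0', right edges append '1'.
    private static void assign(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.symbol, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 5);
        freq.put('b', 2);
        freq.put('r', 2);
        freq.put('c', 1);
        freq.put('d', 1);
        System.out.println(buildCodes(freq));
    }
}
```

The exact codes depend on how ties between equal counts are broken, but the total encoded length does not.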

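Huffman coding

We derive the code table from the Huffman tree, compress the data of the original file into a bit string, and then convert that bit string into a byte array so that it can be written into the .zip file. The map also has to be turned into a string so that it can be stored in the .zip file.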

private static Map<Character,String> coding(HTreeNode t){
    Map<Character,String> map = new HashMap<Character,String>();
    if(t.c!='\0'){
        if(t.c=='\t'){
            System.out.println("\\t"+" occurs "+t.count+" times, Huffman code: "+t.code);
            map.put(t.c, t.code);
        }else if(t.c=='\r'){
            System.out.println("\\r"+" occurs "+t.count+" times, Huffman code: "+t.code);
            map.put(t.c, t.code);
        }else if(t.c=='\n'){
            System.out.println("\\n"+" occurs "+t.count+" times, Huffman code: "+t.code);
            map.put(t.c, t.code);
        }else {
            System.out.println(t.c + " occurs " + t.count + " times, Huffman code: " + t.code);
            map.put(t.c, t.code);
        }
    }
    if (t.left!=null){
        map.putAll(coding(t.left));
    }

    if (t.right!=null){
        map.putAll(coding(t.right));
    }
    return map;
}

private static String String2HFMcode(String s,Map<Character,String> map){
        char[] cl = s.toCharArray();
        String[] vl = new String[cl.length];
        for(int i=0;i<cl.length;i++){
            try {
                vl[i] = map.get(cl[i]);
            }catch (NullPointerException e){
                e.printStackTrace();
            }
        }
        for(int i=1;i<vl.length;i++){
            vl[0] += vl[i];
        }
        return vl[0];
}

private static byte[] binaryStringToBytes(String str) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int curByte = 0;
        int i, bc = 0;
        for (i = 0; i < str.length(); i++) {
            int bit;
            char charAt = str.charAt(i);
            if (charAt == '1')
                bit = 1;
            else if (charAt == '0')
                bit = 0;
            else
                continue;
            curByte |= bit << (7 - bc % 8);
            if (bc % 8 == 7) {
                baos.write(curByte);
                curByte = 0;
            }
            bc++;
        }
        if (bc % 8 != 0)
            baos.write(curByte);
        try {
            baos.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return baos.toByteArray();
}

private static String Map2mapString(Map<Character,String> map){
        Character[] ca = new Character[map.size()];
        ca =  map.keySet().toArray(ca);
        String[] sa = new String[map.size()];
        for(int i=0;i<ca.length;i++){
            try {
                // Note: there is a space after the equals sign below
                sa[i] = ca[i] + "= " + map.get(ca[i]);
            }catch (NullPointerException e){
                e.printStackTrace();
            }
        }
        for(int i=1;i<sa.length;i++){
            // Why use ",@," here?
            sa[0] += ",@,"+sa[i];
        }
        return sa[0];
 }

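To answer the question in the code above: at first I used a single comma as the separator. Then I found that an entry such as ",=10110" can occur, which means a comma was read from the original file and its Huffman code is 10110. The decompression method below takes the first and second items of each split result as the key and the value, so this case breaks it. Take "a=01,,=10110": splitting on "," treats both commas as separators and produces an empty piece, which has no key and no value, followed by the piece "=10110", whose key is empty and whose value is 10110. Either one makes the parser throw. That is why I use ",@," instead: every key is a single character, so ",@," can never appear as a key. The "= " (with a trailing space) used instead of "=" a few lines earlier is there for the same reason. Otherwise, compressing a file that contains an equals sign would fail.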
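The effect is easy to reproduce with String.split directly. The snippet below is a small demonstration only; SeparatorSketch and its split helper are names invented for this example, and Pattern.quote is used so the separator is always treated as literal text.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class SeparatorSketch {
    // Split on a literal separator.
    static String[] split(String text, String separator) {
        return text.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // Single-character separators collide with the keys ',' and '='.
        System.out.println(Arrays.toString(split("a=01,,=10110", ",")));       // [a=01, , =10110]
        System.out.println(Arrays.toString(split("==0101", "=")));             // [, , 0101]
        // The multi-character separators keep those keys intact.
        System.out.println(Arrays.toString(split("a= 01,@,,= 10110", ",@,"))); // [a= 01, ,= 10110]
        System.out.println(Arrays.toString(split("== 0101", "= ")));           // [=, 0101]
    }
}
```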

File compression

The compression process

// The compression process
public static ZipResult zip(String s){
        Map<Character,Integer> char_intmap = transferString(s);
        Queue<HTreeNode> q = init(char_intmap);
        while (q.size()>1){
            update(q);
        }
        HTreeNode t = q.poll();
        countTreeCode(t);
        // build the code table
        Map<Character,String> map = coding(t);
        // compress the content
        String HFMcode = String2HFMcode(s, map);
        int number_of_zero = 0;
        if(HFMcode.length()%8!=0){
            number_of_zero = 8 - HFMcode.length()%8;
            HFMcode = HFMcode+"0".repeat(number_of_zero);
        }

        byte[] head = Map2mapString(map).getBytes();
        byte[] data = binaryStringToBytes(HFMcode);
        int head_length = head.length;
        return new ZipResult(head_length,number_of_zero,head,data);
}

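Preparing for decompression

From the byte array we read, we recover the bit string. From the string we read, we rebuild the decoding table. With these two things we can decompress the file.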

private static String Bytes2binaryString(byte[] bl){
    ByteArrayInputStream bais = new ByteArrayInputStream(bl);
    String [] sa = new String[bl.length];
    for(int i=0;i<bl.length;i++){
        String s = Integer.toBinaryString(bais.read());
        int delta = 8-s.length();
        s = "0".repeat(delta) + s;
        sa[i] = s;
        if(i!=0) {
            sa[0] += sa[i];
        }
    }
    return sa[0];
}
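The two conversions (bit string to bytes earlier, bytes back to a bit string here) should be exact inverses of each other. A compact way to check that idea, written with bit operations and a StringBuilder, might look like the following. It is a sketch under my own naming (BitPackingSketch, pack, unpack), not a drop-in replacement for the methods of this post, and it assumes the input has already been padded to a multiple of 8.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class BitPackingSketch {
    // Pack a '0'/'1' string whose length is a multiple of 8 into bytes,
    // most significant bit first.
    static byte[] pack(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= 1 << (7 - i % 8);
            }
        }
        return out;
    }

    // Expand every byte back into exactly eight '0'/'1' characters.
    static String unpack(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            for (int bit = 7; bit >= 0; bit--) {
                sb.append(((b >> bit) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "0100000111111111";
        byte[] packed = pack(bits);
        System.out.println(Arrays.toString(packed)); // [65, -1]
        System.out.println(unpack(packed));          // 0100000111111111
    }
}
```

Java bytes are signed, which is why the second byte prints as -1 even though all of its eight bits are set.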

public static Map<String,Character> mapStringToMap(String str){
    // The ",@," here matches the one used above
    String[] strs=str.split(",@,");
    Map<String,Character> map = new HashMap<String, Character>();

    for (String string:strs){
      // The "= " here matches the one used above
        String key   = string.split("= ")[1];
        Character value = string.split("= ")[0].charAt(0);
        map.put(key,value);
    }
    return map;
}

File decompression

public static byte[] unzip(ZipResult zipresult){

    byte[] head = zipresult.head;
    byte[] data = zipresult.data;
    int number_of_zero = zipresult.number_of_zero;


    // rebuild the decoding table
    String mapstring = new String(head);
    Map<String, Character> enmap = mapStringToMap(mapstring);


    // recover the bit string
    String HFMcode_with_zero = Bytes2binaryString(data);
    //System.out.println(HFMcode_with_zero);
    //System.out.println(number_of_zero);
    String HFMcode = HFMcode_with_zero.substring(0,HFMcode_with_zero.length()-number_of_zero);
    String str = "";
    String content ="";
    for(int i=0;i<HFMcode.length();i++){
        str += HFMcode.charAt(i);
        if(enmap.containsKey(str)){
            content += enmap.get(str);
            str = "";
        }
    }

    byte[] result = content.getBytes();
    return result;

}
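The decoding loop works because Huffman codes are prefix-free: no code is the beginning of another code, so the first time the collected bits match a table entry, that entry is the right one. The sketch below isolates just that loop. DecodeSketch and decode are hypothetical names, and using StringBuilder instead of repeated string concatenation is a substitution I made to avoid copying the whole result on every append.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class DecodeSketch {
    // Decode a '0'/'1' string with a table that maps codes to characters.
    static String decode(String bits, Map<String, Character> table) {
        StringBuilder out = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bits.length(); i++) {
            current.append(bits.charAt(i));
            Character symbol = table.get(current.toString());
            if (symbol != null) {
                out.append(symbol);
                current.setLength(0); // start collecting the next code
            }
        }
        if (current.length() != 0) {
            throw new IllegalArgumentException("trailing bits do not form a complete code");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> table = new HashMap<>();
        table.put("0", 'a');
        table.put("10", 'b');
        table.put("11", 'c');
        System.out.println(decode("010110", table)); // abca
    }
}
```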

File-level operations

Reading a normal file

public static String ReadFile(String path){

    try {
        // open the file at the given path
        FileInputStream ips = new FileInputStream(path);
        // read the file into a byte array
        byte[] buffer = ips.readAllBytes();
        String content = new String(buffer);
        ips.close();

        return content;

    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("The specified path does not exist");
    } catch (IOException e) {
        e.printStackTrace();
        System.out.println("Failed to read the file");
    }

    return null;

}

Creating the compressed file

public static void CreateZipFile(ZipResult zipresult,String path){
    try {
        FileOutputStream ops = new FileOutputStream(path);
        int head_length = zipresult.head_length;
        int number_of_zero = zipresult.number_of_zero;
        byte[] head = zipresult.head;
        byte[] data = zipresult.data;
        ops.write((Integer.toString(head_length)+"\r\n").getBytes());
        ops.flush();
        ops.write((Integer.toString(number_of_zero)+"\r\n").getBytes());
        ops.flush();
        ops.write(head);
        ops.flush();
        ops.write(data);
        ops.flush();
        ops.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("Failed to create the path");
    } catch (IOException e) {
        e.printStackTrace();
        System.out.println("Write failed");
    }

}

Reading the compressed file

public static ZipResult ReadZipFile(String path) {
        if (path.contains(".myzip")) {

            try {
                // open the file at the given path
                FileInputStream ips = new FileInputStream(path);
                // read the file into a byte array
                byte[] whole = ips.readAllBytes();
                String content = new String(whole);
                String[] sa = content.split("\r\n");
                int head_length = Integer.parseInt(sa[0]);
                int number_of_zero = Integer.parseInt(sa[1]);
                String s = sa[0] + "\r\n" + sa[1] + "\r\n";

                byte[] head = Arrays.copyOfRange(whole, s.length(), s.length() + head_length);
                // !!! Pay close attention here
                byte[] data = Arrays.copyOfRange(whole, s.length() + head_length, whole.length);
                ips.close();

                return new ZipResult(head_length, number_of_zero, head, data);

            } catch (FileNotFoundException e) {
                e.printStackTrace();
                System.out.println("The specified path does not exist");
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("Failed to read the file");
            }

            return null;

        }else {
            System.out.println("Only files with the .myzip extension can be decompressed");
            return null;
        }
 }

PS: I did not use .zip here because that is the standard extension for compressed archives. We are just playing around, so the extension .myzip will do.

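The point to watch in the code above is the last argument of the array slice that produces data: it must be whole.length, not content.length(). You might expect the two to be the same, but they are not. They are equal only when the file consists entirely of single-byte (ASCII) characters, because then one character corresponds to one byte. whole.length counts bytes, while content.length() counts the chars of the decoded string. A Chinese character is a single char but takes more than one byte (three in UTF-8, two in GBK), and the binary data section does not decode to one char per byte either, so the byte count is normally larger than the string length.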
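The difference between the two lengths is easy to check in isolation. The sketch below assumes UTF-8; LengthSketch and its methods are names invented for this example, and the Chinese characters are written as Unicode escapes so that the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class LengthSketch {
    // Number of chars in the string (what String.length() reports).
    static int charLength(String text) {
        return text.length();
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "ab\u4e2d\u6587"; // "ab" followed by two Chinese characters
        System.out.println(charLength(text)); // 4
        System.out.println(utf8Length(text)); // 8 (1 + 1 + 3 + 3)
    }
}
```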

Creating a normal file

public static void CreateFile(byte[] buffer,String path){
    try {
        FileOutputStream ops = new FileOutputStream(path);
        ops.write(buffer);
        ops.flush();
        ops.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
        System.out.println("Failed to create the path");
    } catch (IOException e) {
        e.printStackTrace();
        System.out.println("Write failed");
    }

}

Three methods that tie everything together

// compress
private static void ZIP(String inputpath,String outputpath){
    String content = ReadFile(inputpath);
    ZipResult zipresult = HFMtree.zip(content);
    CreateZipFile(zipresult,outputpath);
}
// decompress
private static void UNZIP(String inputpath,String outputpath){
    ZipResult zipresult = ReadZipFile(inputpath);
    byte[] buffer = HFMtree.unzip(zipresult);
    CreateFile(buffer,outputpath);
}

public static void main(String[] args){
    ZIP("D:\\javacode\\src\\数据结构\\input.txt","D:\\javacode\\src\\数据结构\\zip.myzip");
    UNZIP("D:\\javacode\\src\\数据结构\\zip.myzip","D:\\javacode\\src\\数据结构\\output.txt");
}

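A problem you are likely to run into

The most common surprise when writing a Huffman compressor is discovering that the compressed file is actually bigger than the original.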

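Most likely this happens because you stored the data part directly as a string of 0s and 1s (calling getBytes() on the bit string), or because you used Arrays.toString() to turn the byte array into a string and stored that (Arrays.toString() followed by getBytes()). In that case, open your .zip file and you will find that it contains almost no garbage characters: everything is readable. From your experience with computers, you know something must be wrong. The reason is that when the bits are stored as text, every '0' or '1' takes up a whole byte instead of a single bit, so the file does not shrink at all, and Arrays.toString() additionally inserts brackets, commas and spaces. All of this makes your .zip file large. The correct approach is to write the bytes directly, as shown above. If you then open the .zip file as UTF-8, the data part should be all garbage (the head part does not have to be; I store it as a string here to make debugging easier, but it could also be written as raw bytes).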
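To put numbers on this, the sketch below compares the three ways of storing the same 16 bits. It is an illustration only: SizeSketch and its method names are invented here, and pack assumes the bit string has already been padded to a multiple of 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class SizeSketch {
    // Pack a '0'/'1' string (length a multiple of 8) into real bytes.
    static byte[] pack(String bits) {
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length * 8; i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= 1 << (7 - i % 8);
            }
        }
        return out;
    }

    // Size when the bit string itself is written as text.
    static int sizeAsText(String bits) {
        return bits.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Size when the packed bytes are written via Arrays.toString().
    static int sizeAsArraysToString(String bits) {
        return Arrays.toString(pack(bits)).getBytes(StandardCharsets.US_ASCII).length;
    }

    // Size when the packed bytes are written directly.
    static int sizePacked(String bits) {
        return pack(bits).length;
    }

    public static void main(String[] args) {
        String bits = "0100000111111111";
        System.out.println(sizeAsText(bits));           // 16
        System.out.println(sizeAsArraysToString(bits)); // 8, the text "[65, -1]"
        System.out.println(sizePacked(bits));           // 2
    }
}
```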

Java source code

package 数据结构;

import java.util.*;
import java.io.*;

class ZipResult{
    int head_length;
    int number_of_zero;
    byte[] head;
    byte[] data;
    public ZipResult(int head_length, int number_of_zero, byte[] head, byte[] data){
        this.head_length = head_length;
        this.number_of_zero = number_of_zero;
        this.head = head;
        this.data = data;
    }
}

class HTreeNode implements Comparable<HTreeNode>{
    Integer count;
    char c;
    String code = "";

    HTreeNode left;
    HTreeNode right;
    HTreeNode parent;

    public HTreeNode(int count){
        this.count = count;
    }
    public HTreeNode(int count,char c){
        this.count = count;
        this.c = c;
    }

    @Override
    public int compareTo(HTreeNode o) {
        return this.count.compareTo(o.count);
    }


}

class HFMtree {

    private static Map<Character,Integer> transferString(String s){
        char[] cl = new char[s.length()];
        int[] nl = new int[s.length()];
        ArrayList<Character> list = new ArrayList<>();
        char[] chars = s.toCharArray(); // convert the string to a char array
        for (int i = 0; i < chars.length; i++) {
            char aChar = chars[i];
            list.add(aChar); // add each array element to the list
        }
        for (int i = 0; i < list.size(); i++) { // walk the list and take each character
            int count = 0; // counter
            Character character = list.get(i);
            for (int j = 0; j < chars.length; j++) { // compare every array element with the list element
                char aChar = chars[j];
                if (character.equals(aChar)){ // on a match, increment the counter
                    count++;
                }
            }
            cl[i] = character;
            nl[i] = count;
        }
        Character[] ncl = new Character[cl.length];
        for(int i=0;i<cl.length;i++){
            ncl[i] = (Character)cl[i];
        }
        Map<Character,Integer> map = new HashMap<Character,Integer>();
        for(int i=0;i<ncl.length;i++){
            map.put(ncl[i],nl[i]);
        }
        return map;

    }

    private static Queue<HTreeNode> init(Map<Character,Integer> map){
        Set<Character> s = map.keySet();
        Character[] cl = new Character[s.size()];
        cl =  s.toArray(cl);
        Queue<HTreeNode> q = new PriorityQueue<HTreeNode>();
        for(int i=0;i<map.size();i++){
            HTreeNode h = new HTreeNode((int)map.get(cl[i]),(char)cl[i]);
            q.add(h);
        }
        return q;
    }

    private static void update(Queue<HTreeNode> q){
        HTreeNode a = q.poll();
        HTreeNode b = q.poll();
        HTreeNode c = new HTreeNode(a.count+b.count);
        c.left  = a;
        c.right = b;
        a.parent = c;
        b.parent = c;
        q.add(c);

    }

    private static void countTreeCode(HTreeNode t){

        if(t.left!=null){
            t.left.code += t.code+"0";
            countTreeCode(t.left);
        }

        if(t.right!=null){
            t.right.code += t.code+"1";
            countTreeCode(t.right);
        }


    }

    private static Map<Character,String> coding(HTreeNode t){

        Map<Character,String> map = new HashMap<Character,String>();
        if(t.c!='\0'){
            if(t.c=='\t'){
                System.out.println("\\t"+" occurs "+t.count+" times, Huffman code: "+t.code);
                map.put(t.c, t.code);
            }else if(t.c=='\r'){
                System.out.println("\\r"+" occurs "+t.count+" times, Huffman code: "+t.code);
                map.put(t.c, t.code);
            }else if(t.c=='\n'){
                System.out.println("\\n"+" occurs "+t.count+" times, Huffman code: "+t.code);
                map.put(t.c, t.code);
            }else {
                System.out.println(t.c + " occurs " + t.count + " times, Huffman code: " + t.code);
                map.put(t.c, t.code);
            }
        }

        if (t.left!=null){
            map.putAll(coding(t.left));
        }

        if (t.right!=null){
            map.putAll(coding(t.right));
        }
        return map;
    }
/*
    private static Map encoding(HTreeNode t){

        Map enmap = new HashMap();
        if(t.c!='\0'){
            enmap.put(t.code,t.c);
        }

        if (t.left!=null){
            enmap.putAll(encoding(t.left));
        }

        if (t.right!=null){
            enmap.putAll(encoding(t.right));
        }
        return enmap;
    }
*/
    private static String String2HFMcode(String s,Map<Character,String> map){
        //System.out.print(map.keySet().toString());
        char[] cl = s.toCharArray();
        //System.out.println(s);
        String[] vl = new String[cl.length];
        for(int i=0;i<cl.length;i++){
            //System.out.print((int)cl[i]);
            try {
                vl[i] = map.get(cl[i]);
                //System.out.println(" "+vl[i]);
            }catch (NullPointerException e){
                e.printStackTrace();
            }
        }
        for(int i=1;i<vl.length;i++){
            vl[0] += vl[i];
            //System.out.println(vl[0]);
        }
        return vl[0];
    }

    private static byte[] binaryStringToBytes(String str) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int curByte = 0;
        int i, bc = 0;
        for (i = 0; i < str.length(); i++) {
            int bit;
            char charAt = str.charAt(i);
            if (charAt == '1')
                bit = 1;
            else if (charAt == '0')
                bit = 0;
            else
                continue;
            curByte |= bit << (7 - bc % 8);
            if (bc % 8 == 7) {
                baos.write(curByte);
                curByte = 0;
            }
            bc++;
        }
        if (bc % 8 != 0)
            baos.write(curByte);
        try {
            baos.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        //System.out.println(Arrays.toString(baos.toByteArray()));
        return baos.toByteArray();

    }

    private static String Map2mapString(Map<Character,String> map){
        Character[] ca = new Character[map.size()];
        ca =  map.keySet().toArray(ca);
        //System.out.println(s);
        String[] sa = new String[map.size()];
        for(int i=0;i<ca.length;i++){
            //System.out.print((int)cl[i]);
            try {
                sa[i] = ca[i] + "= " + map.get(ca[i]);
                //System.out.println(" "+vl[i]);
            }catch (NullPointerException e){
                e.printStackTrace();
            }
        }
        for(int i=1;i<sa.length;i++){
            sa[0] += ",@,"+sa[i];
            //System.out.println(vl[0]);
        }
        return sa[0];
    }

    private static String Bytes2binaryString(byte[] bl){
        ByteArrayInputStream bais = new ByteArrayInputStream(bl);
        String [] sa = new String[bl.length];
        for(int i=0;i<bl.length;i++){
            String s = Integer.toBinaryString(bais.read());
            int delta = 8-s.length();
            s = "0".repeat(delta) + s;
            sa[i] = s;
            if(i!=0) {
                sa[0] += sa[i];
            }
        }
        //System.out.println(sa[0]);
        return sa[0];
    }

    public static Map<String,Character> mapStringToMap(String str){
        //str = str.substring(1, str.length()-1);
        String[] strs=str.split(",@,");
        Map<String,Character> map = new HashMap<String, Character>();

        for (String string:strs){
            //System.out.println(string);
            String key   = string.split("= ")[1];
            Character value = string.split("= ")[0].charAt(0);
            map.put(key,value);
        }
        return map;
    }

    public static ZipResult zip(String s){
        Map<Character,Integer> char_intmap = transferString(s);
        Queue<HTreeNode> q = init(char_intmap);
        while (q.size()>1){
            update(q);
        }
        HTreeNode t = q.poll();
        countTreeCode(t);
        // build the code table
        Map<Character,String> map = coding(t);
        // compress the content
        //System.out.println(s);
        String HFMcode = String2HFMcode(s, map);
        int number_of_zero = 0;
        if(HFMcode.length()%8!=0){
            number_of_zero = 8 - HFMcode.length()%8;
            HFMcode = HFMcode+"0".repeat(number_of_zero);
        }

        byte[] head = Map2mapString(map).getBytes();
        byte[] data = binaryStringToBytes(HFMcode);
        //System.out.println(Arrays.toString(data));
        //System.out.println(HFMcode);
        int head_length = head.length;
        return new ZipResult(head_length,number_of_zero,head,data);

    }

    public static byte[] unzip(ZipResult zipresult){

        byte[] head = zipresult.head;
        byte[] data = zipresult.data;
        //System.out.println(Arrays.toString(data));
        int number_of_zero = zipresult.number_of_zero;


        // rebuild the decoding table
        String mapstring = new String(head);
        Map<String, Character> enmap = mapStringToMap(mapstring);


        // recover the bit string
        String HFMcode_with_zero = Bytes2binaryString(data);
        //System.out.println(HFMcode_with_zero);
        //System.out.println(number_of_zero);
        String HFMcode = HFMcode_with_zero.substring(0,HFMcode_with_zero.length()-number_of_zero);
        /*String[] sa = content.split(", ");
        byte[] ba = new byte[sa.length];
        for(int i=0;i ...*/


        String str = "";
        String content ="";
        for(int i=0;i<HFMcode.length();i++){
            str += HFMcode.charAt(i);
            if(enmap.containsKey(str)){
                content += enmap.get(str);
                str = "";
            }
        }

        byte[] result = content.getBytes();

        //System.out.println(content);
        return result;

    }

}


public class HFM {

    public static String ReadFile(String path){

        try {
            // open the file at the given path
            FileInputStream ips = new FileInputStream(path);
            // read the file into a byte array
            byte[] buffer = ips.readAllBytes();
            String content = new String(buffer);
            ips.close();

            return content;

        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("The specified path does not exist");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Failed to read the file");
        }

        return null;

    }


    public static void CreateZipFile(ZipResult zipresult,String path){
        try {
            FileOutputStream ops = new FileOutputStream(path);
            int head_length = zipresult.head_length;
            int number_of_zero = zipresult.number_of_zero;
            byte[] head = zipresult.head;
            byte[] data = zipresult.data;
            ops.write((Integer.toString(head_length)+"\r\n").getBytes());
            ops.flush();
            ops.write((Integer.toString(number_of_zero)+"\r\n").getBytes());
            ops.flush();
            ops.write(head);
            ops.flush();
            ops.write(data);
            ops.flush();
            ops.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("Failed to create the path");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Write failed");
        }

    }
    public static ZipResult ReadZipFile(String path) {
        if (path.contains(".myzip")) {

            try {
                // open the file at the given path
                FileInputStream ips = new FileInputStream(path);
                // read the file into a byte array
                byte[] whole = ips.readAllBytes();
                //System.out.println(whole.length);
                String content = new String(whole);
                //System.out.println(content.length());
                String[] sa = content.split("\r\n");
                int head_length = Integer.parseInt(sa[0]);
                int number_of_zero = Integer.parseInt(sa[1]);
                String s = sa[0] + "\r\n" + sa[1] + "\r\n";


                byte[] head = Arrays.copyOfRange(whole, s.length(), s.length() + head_length);
                byte[] data = Arrays.copyOfRange(whole, s.length() + head_length, whole.length);
                //System.out.println(Arrays.toString(data));
                ips.close();

                return new ZipResult(head_length, number_of_zero, head, data);

            } catch (FileNotFoundException e) {
                e.printStackTrace();
                System.out.println("The specified path does not exist");
            } catch (IOException e) {
                e.printStackTrace();
                System.out.println("Failed to read the file");
            }

            return null;

        }else {
            System.out.println("Only files with the .myzip extension can be decompressed");
            return null;
        }
    }

    public static void CreateFile(byte[] buffer,String path){
        try {
            FileOutputStream ops = new FileOutputStream(path);
            ops.write(buffer);
            ops.flush();
            ops.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("Failed to create the path");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Write failed");
        }

    }

    // compress
    private static void ZIP(String inputpath,String outputpath){
        String content = ReadFile(inputpath);
        ZipResult zipresult = HFMtree.zip(content);
        CreateZipFile(zipresult,outputpath);
    }
    // decompress
    private static void UNZIP(String inputpath,String outputpath){
        ZipResult zipresult = ReadZipFile(inputpath);
        byte[] buffer = HFMtree.unzip(zipresult);
        CreateFile(buffer,outputpath);
    }

    public static void main(String[] args){
        ZIP("D:\\javacode\\src\\数据结构\\input.txt","D:\\javacode\\src\\数据结构\\zip.myzip");
        UNZIP("D:\\javacode\\src\\数据结构\\zip.myzip","D:\\javacode\\src\\数据结构\\output.txt");
    }
}


Look at how many //System.out.println(); lines are scattered through the code. Every one of them marks a breakdown, sob. But in the end I did get Process finished with exit code 0, so I'm happy!!! Finally I can head out for a cup of 茶颜 milk tea!