Java Data Structures and Algorithms (6): Tree Structures, the Huffman Tree

Table of contents

  • I. Huffman tree
    • 1. Overview of the Huffman tree
    • 2. Building a Huffman tree
    • 3. Huffman coding

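I. Huffman tree

1. Overview of the Huffman tree

(1) Basic introduction

  • Given n weights as n leaf nodes, construct a binary tree. If the tree's weighted path length (WPL) is the minimum possible, the tree is called an optimal binary tree, also known as a Huffman tree (in Chinese texts the name is transliterated as 赫夫曼树, 哈夫曼树, or 霍夫曼树).
  • A Huffman tree is the tree with the shortest weighted path length; nodes with larger weights sit closer to the root.

(2) Related concepts

  • Path
    In a tree, the route from a node down to one of its descendants is called a path.
  • Path length
    The number of branches on a path is its path length. If the root is defined to be on level 1, the path length from the root to a node on level L is L - 1.
  • Weight of a node
    If a node is assigned a number that carries some meaning, that number is called the weight of the node.
  • Weighted path length of a node
    The weighted path length of a node is the path length from the root to that node multiplied by the node's weight.
  • Weighted path length of a tree
    The weighted path length of a tree is the sum of the weighted path lengths of all its leaf nodes, written WPL (weighted path length). In an optimal binary tree, leaves with larger weights sit closer to the root. Among all binary trees with the same leaves, the one with the smallest WPL is the Huffman tree.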
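To make the WPL definition concrete, here is a minimal sketch that computes the WPL of two trees holding the same four leaf weights (13, 7, 8, 3). The class WplDemo, its Node type, and the sample weights are illustrative assumptions and are not part of the article's code.

```java
// Minimal sketch: compute the weighted path length (WPL) of a binary tree.
// WplDemo, Node and the sample weights are illustrative, not from the article.
final class WplDemo {
    static final class Node {
        final int weight;
        final Node left;
        final Node right;

        Node(int weight) {
            this(weight, null, null);
        }

        Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // depth is the number of branches from the root to this node (root = 0).
    // Only leaves contribute: weight * path length.
    static int wpl(Node node, int depth) {
        if (node == null) {
            return 0;
        }
        if (node.isLeaf()) {
            return node.weight * depth;
        }
        return wpl(node.left, depth + 1) + wpl(node.right, depth + 1);
    }

    public static void main(String[] args) {
        // All four leaves at depth 2: WPL = (13 + 7 + 8 + 3) * 2 = 62
        Node balanced = new Node(31,
                new Node(20, new Node(13), new Node(7)),
                new Node(11, new Node(8), new Node(3)));

        // Heavier leaves closer to the root: WPL = 13*1 + 8*2 + 3*3 + 7*3 = 59
        Node huffmanShaped = new Node(31,
                new Node(13),
                new Node(18, new Node(8), new Node(10, new Node(3), new Node(7))));

        System.out.println(wpl(balanced, 0));      // 62
        System.out.println(wpl(huffmanShaped, 0)); // 59
    }
}
```

The second tree places the heaviest leaf directly under the root, which is why its WPL is smaller.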

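2. Building a Huffman tree

(1) Steps for building a Huffman tree

  • Sort the array in ascending order. Treat each value as a node; each node can be seen as the simplest possible binary tree.
  • Take the two binary trees whose root weights are the smallest after sorting.
  • Combine them into a new binary tree whose root weight is the sum of the root weights of those two trees.
  • Put the new tree back, sort again by root weight, and repeat the steps above until every value in the sequence has been processed and only one tree remains. That tree is the Huffman tree.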
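The steps above can also be sketched with a priority queue instead of re-sorting a list on every round. This is a swapped-in variant, not the approach the article's example uses: java.util.PriorityQueue always hands back the smallest remaining root. The names HuffmanPqSketch, Node, build and wpl are assumptions made for this illustration.

```java
import java.util.PriorityQueue;

// Minimal sketch: build a Huffman tree with a min-heap (PriorityQueue).
// HuffmanPqSketch, Node, build and wpl are illustrative names.
final class HuffmanPqSketch {
    static final class Node {
        final int weight;
        final Node left;
        final Node right;

        Node(int weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    static Node build(int[] weights) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("need at least one weight");
        }
        // Order the queue by root weight, smallest first.
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (int w : weights) {
            queue.add(new Node(w, null, null));
        }
        // Repeatedly merge the two lightest trees until one tree is left.
        while (queue.size() > 1) {
            Node first = queue.poll();
            Node second = queue.poll();
            queue.add(new Node(first.weight + second.weight, first, second));
        }
        return queue.poll();
    }

    // Sum of weight * depth over all leaves.
    static int wpl(Node node, int depth) {
        if (node == null) {
            return 0;
        }
        if (node.left == null && node.right == null) {
            return node.weight * depth;
        }
        return wpl(node.left, depth + 1) + wpl(node.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build(new int[]{35, 7, 19, 2, 12, 4, 21});
        System.out.println(root.weight);   // 100
        System.out.println(wpl(root, 0));  // 244
    }
}
```

The heap keeps each merge at O(log n), whereas sorting the whole list on every round costs O(n log n) per merge.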

(2) Analysis of Huffman tree construction
[Image 2 of the original post]

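(3) Example: building a Huffman tree
Create a tree node class that implements the Comparable interface, so that a list of these nodes can be sorted repeatedly.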

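/* Tree node class */
public class TreeNode implements Comparable<TreeNode>{
    /* Node weight */
    private int value;
    /* Left child */
    private TreeNode leftNode;
    /* Right child */
    private TreeNode rightNode;

    /* Constructor */
    public TreeNode(int value) {
        this.value = value;
    }

    /* toString */
    @Override
    public String toString() {
        return "TreeNode{" +
                "value=" + value +
                '}';
    }

    @Override
    public int compareTo(TreeNode o) {
        /* Ascending order by weight; Integer.compare avoids the overflow that subtraction can cause */
        return Integer.compare(this.value, o.value);
    }

    /* Preorder traversal */
    public void firstShow(){
        System.out.println(this);
        if(this.leftNode != null){
            this.leftNode.firstShow();
        }
        if(this.rightNode != null){
            this.rightNode.firstShow();
        }
    }

    /* Getters and setters */
    public int getValue() { return value; }
    public TreeNode getLeftNode() { return leftNode; }
    public void setLeftNode(TreeNode leftNode) { this.leftNode = leftNode; }
    public TreeNode getRightNode() { return rightNode; }
    public void setRightNode(TreeNode rightNode) { this.rightNode = rightNode; }
}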

Test class

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HuffmanTree {
    public static void main(String[] args) {
        /* Input weights */
        int[] arr = {35,7,19,2,12,4,21};

        /* Build the Huffman tree and get its root */
        TreeNode rootNode = createHuffmanTree(arr);

        System.out.println("\nPreorder traversal of the tree:");
        first(rootNode);
    }


    /* Build a Huffman tree; returns null for an empty array */
    public static TreeNode createHuffmanTree(int[] arr){
        /* Wrap each weight in a node */
        List<TreeNode> treeNodes = new ArrayList<TreeNode>();
        for(int value: arr){
            treeNodes.add(new TreeNode(value));
        }
        if(treeNodes.isEmpty()){
            return null;
        }

        /* Merge until a single tree remains */
        int count = 0;
        while(treeNodes.size() > 1){
            count++;
            /* Sort in ascending order of weight */
            Collections.sort(treeNodes);
            System.out.println("After sort " + count + ", treeNodes = " + treeNodes);

            /* Take the two nodes with the smallest weights */
            TreeNode left = treeNodes.get(0);
            TreeNode right = treeNodes.get(1);
            /* Their parent carries the sum of both weights */
            TreeNode parent = new TreeNode(left.getValue() + right.getValue());
            parent.setLeftNode(left);
            parent.setRightNode(right);

            /* Remove the two nodes that were taken out */
            treeNodes.remove(left);
            treeNodes.remove(right);
            /* Add the newly built subtree back to treeNodes */
            treeNodes.add(parent);
        }
        return treeNodes.get(0);
    }

    /* Preorder traversal */
    public static void first(TreeNode rootNode){
        if(rootNode != null){
            rootNode.firstShow();
        }else{
            System.out.println("The tree is empty");
        }
    }
}

Output

After sort 1, treeNodes = [TreeNode{value=2}, TreeNode{value=4}, TreeNode{value=7}, TreeNode{value=12}, TreeNode{value=19}, TreeNode{value=21}, TreeNode{value=35}]
After sort 2, treeNodes = [TreeNode{value=6}, TreeNode{value=7}, TreeNode{value=12}, TreeNode{value=19}, TreeNode{value=21}, TreeNode{value=35}]
After sort 3, treeNodes = [TreeNode{value=12}, TreeNode{value=13}, TreeNode{value=19}, TreeNode{value=21}, TreeNode{value=35}]
After sort 4, treeNodes = [TreeNode{value=19}, TreeNode{value=21}, TreeNode{value=25}, TreeNode{value=35}]
After sort 5, treeNodes = [TreeNode{value=25}, TreeNode{value=35}, TreeNode{value=40}]
After sort 6, treeNodes = [TreeNode{value=40}, TreeNode{value=60}]

Preorder traversal of the tree:
TreeNode{value=100}
TreeNode{value=40}
TreeNode{value=19}
TreeNode{value=21}
TreeNode{value=60}
TreeNode{value=25}
TreeNode{value=12}
TreeNode{value=13}
TreeNode{value=6}
TreeNode{value=2}
TreeNode{value=4}
TreeNode{value=7}
TreeNode{value=35}

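3. Huffman coding

(1) Introduction

  • Huffman coding is a variable-length coding (VLC) scheme. Huffman proposed it in 1952, and it is known as an optimal code.
  • Huffman coding is an algorithm, and one of the classic applications of the Huffman tree in telecommunications.
  • Huffman coding is widely used to compress data files. Its compression ratio is typically between 20% and 90%.

(2) Huffman coding example
Create a tree node class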
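As a small illustration of what variable-length coding means, the sketch below encodes and decodes a string with a hand-written three-symbol code table. The table (a=0, b=10, c=11) and the names PrefixCodeSketch, encode and decode are assumptions made for this example; they are not produced by the article's code. No code word is a prefix of another, which is what lets the decoder cut the bit string unambiguously.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of variable-length coding with a hypothetical code table.
final class PrefixCodeSketch {
    // Hypothetical table: the most frequent symbol gets the shortest code.
    static final Map<Character, String> CODES = new HashMap<>();
    static {
        CODES.put('a', "0");
        CODES.put('b', "10");
        CODES.put('c', "11");
    }

    // Concatenate the code of every character; only a, b and c are supported.
    static String encode(String text) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String code = CODES.get(ch);
            if (code == null) {
                throw new IllegalArgumentException("no code for '" + ch + "'");
            }
            bits.append(code);
        }
        return bits.toString();
    }

    // Read bits one at a time and emit a symbol as soon as a code matches.
    static String decode(String bits) {
        Map<String, Character> reverse = new HashMap<>();
        for (Map.Entry<Character, String> entry : CODES.entrySet()) {
            reverse.put(entry.getValue(), entry.getKey());
        }
        StringBuilder text = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character ch = reverse.get(current.toString());
            if (ch != null) {
                text.append(ch);
                current.setLength(0);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String bits = encode("abac");
        System.out.println(bits);          // 010011 (6 bits instead of 4 * 8 = 32)
        System.out.println(decode(bits));  // abac
    }
}
```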

/* Tree node */
public class TreeNode implements Comparable<TreeNode>{
    /* The byte stored in this node; null for non-leaf nodes */
    public Byte data;
    /* Weight: how many times the byte occurs */
    public int weight;
    /* Left and right children */
    TreeNode leftNode;
    TreeNode rightNode;

    /* Constructor */
    public TreeNode(Byte data, int weight) {
        this.data = data;
        this.weight = weight;
    }

    @Override
    public int compareTo(TreeNode o) {
        /* Ascending order by weight; Integer.compare avoids the overflow that subtraction can cause */
        return Integer.compare(this.weight, o.weight);
    }

    @Override
    public String toString() {
        return "TreeNode{" +
                "data=" + data +
                ", weight=" + weight +
                '}';
    }

    /* Preorder traversal */
    public void firstShow(){
        System.out.println(this);
        if(this.leftNode != null){
            this.leftNode.firstShow();
        }
        if(this.rightNode != null){
            this.rightNode.firstShow();
        }
    }
}

Test class (compression and decompression)

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HuffmanCode {
    public static void main(String[] args) {
        /* Input string */
        String str = "today is a really a nice day how do you think";
        byte[] bytes = str.getBytes(StandardCharsets.UTF_8);

        List<TreeNode> treeNodes = getNodes(bytes);
        System.out.println("Nodes built from the byte array: " + treeNodes);

        /* Build the Huffman tree */
        TreeNode rootNode = createHuffmanTree(treeNodes);
        System.out.println("\nPreorder traversal:");
        rootNode.firstShow();

        /* Build the Huffman code table */
        Map<Byte, String> huffmanCodesMap = getHuffmanCodes(rootNode);
        System.out.println("\nHuffman code table = " + huffmanCodesMap);

        /* Compress */
        byte[] newByte = zip(bytes, huffmanCodesMap);
        System.out.println("\nCompressed bytes = " + Arrays.toString(newByte));

        /* Decompress */
        byte[] oldBytes = decode(huffmanCodesMap, newByte);
        System.out.println("Decoded string = " + new String(oldBytes, StandardCharsets.UTF_8));
    }

    public static List<TreeNode> getNodes(byte[] bytes){
        List<TreeNode> treeNodes = new ArrayList<TreeNode>();

        /* Count how many times each byte occurs */
        Map<Byte, Integer> countMap = new HashMap<Byte, Integer>();
        for(byte b: bytes){
            countMap.merge(b, 1, Integer::sum);
        }

        /* Turn every (byte, count) pair into a tree node */
        for (Map.Entry<Byte, Integer> entry: countMap.entrySet()){
            treeNodes.add(new TreeNode(entry.getKey(), entry.getValue()));
        }
        return treeNodes;
    }

    /* Build the Huffman tree; returns null for an empty list */
    public static TreeNode createHuffmanTree(List<TreeNode> treeNodes){
        if(treeNodes.isEmpty()){
            return null;
        }
        while(treeNodes.size() > 1){
            /* Sort in ascending order of weight */
            Collections.sort(treeNodes);
            /* Take the two nodes with the smallest weights */
            TreeNode leftNode = treeNodes.get(0);
            TreeNode rightNode = treeNodes.get(1);
            /* Create their parent; data is null because it is not a leaf */
            TreeNode parentNode = new TreeNode(null,leftNode.weight + rightNode.weight);
            parentNode.leftNode = leftNode;
            parentNode.rightNode = rightNode;

            /* Remove the two nodes that were taken out */
            treeNodes.remove(leftNode);
            treeNodes.remove(rightNode);
            /* Add the new subtree back */
            treeNodes.add(parentNode);
        }
        return treeNodes.get(0);
    }

    /* Huffman code table: byte -> bit string */
    static Map<Byte, String> huffmanCodesMap = new HashMap<Byte, String>();
    /* Empty prefix used as the starting path at the root */
    static StringBuilder stringBuilder = new StringBuilder();
    /* Number of meaningful bits in the last compressed byte; set by zip, read by decode */
    static int lastByteBits = 8;

    /* Store the Huffman code of every leaf below treeNode in huffmanCodesMap.
       code is the branch taken to reach treeNode ("0" = left, "1" = right),
       stringBuilder is the path from the root to its parent. */
    public static void getHuffmanCodes(TreeNode treeNode, String code, StringBuilder stringBuilder){
        StringBuilder stringBuilder1 = new StringBuilder(stringBuilder);
        stringBuilder1.append(code);
        if(treeNode != null){
            /* data == null means a non-leaf node */
            if(treeNode.data == null){
                /* Recurse to the left */
                getHuffmanCodes(treeNode.leftNode, "0", stringBuilder1);
                /* Recurse to the right */
                getHuffmanCodes(treeNode.rightNode, "1", stringBuilder1);
            }else{
                /* Leaf node: the accumulated path is its code */
                huffmanCodesMap.put(treeNode.data,stringBuilder1.toString());
            }
        }
    }

    /* Overload of getHuffmanCodes that starts from the root */
    public static Map<Byte, String> getHuffmanCodes(TreeNode rootNode){
        if(rootNode == null){
            return null;
        }
        /* Start from an empty table so repeated calls do not mix results */
        huffmanCodesMap.clear();
        if(rootNode.data != null){
            /* The tree is a single leaf: give that byte the one-bit code "0" */
            huffmanCodesMap.put(rootNode.data, "0");
            return huffmanCodesMap;
        }
        /* Process the left subtree */
        getHuffmanCodes(rootNode.leftNode, "0", stringBuilder);
        /* Process the right subtree */
        getHuffmanCodes(rootNode.rightNode, "1", stringBuilder);
        return huffmanCodesMap;
    }

    /* Compress: bytes is the original byte array, huffmanCodesMap is the code table;
       returns the packed Huffman bit string as a byte[] */
    public static byte[] zip(byte[] bytes, Map<Byte, String> huffmanCodesMap){
        StringBuilder stringBuilder = new StringBuilder();
        /* Replace every byte by its Huffman code */
        for (byte b: bytes){
            stringBuilder.append(huffmanCodesMap.get(b));
        }

        /* Number of bytes needed, rounded up */
        int len = (stringBuilder.length() + 7) / 8;
        /* Remember how many bits the last byte really holds; without this,
           leading zeros in the last chunk would be lost when decoding */
        lastByteBits = stringBuilder.length() - (len - 1) * 8;

        /* Pack the bit string, 8 bits per byte */
        byte[] newByte = new byte[len];
        int index = 0;
        for(int i = 0; i < stringBuilder.length(); i += 8){
            String str;
            if((i + 8) > stringBuilder.length()){
                str = stringBuilder.substring(i);
            }else {
                str = stringBuilder.substring(i, i + 8);
            }
            newByte[index] = (byte) Integer.parseInt(str, 2);
            index++;
        }
        return newByte;
    }

    /* Decoding helper: the lowest `bits` bits of b as a binary string */
    public static String byteToString(byte b, int bits){
        /* OR with 256 guarantees at least 9 binary digits, so leading zeros are kept */
        String str = Integer.toBinaryString(b | 256);
        return str.substring(str.length() - bits);
    }

    /* Decode the compressed data */
    public static byte[] decode(Map<Byte, String> huffmanCodesMap, byte[] newBytes){
        /* Turn the byte[] back into the binary string */
        StringBuilder stringBuilder = new StringBuilder();
        for(int i = 0; i < newBytes.length; i++){
            int bits = (i == newBytes.length - 1) ? lastByteBits : 8;
            stringBuilder.append(byteToString(newBytes[i], bits));
        }

        /* Invert the code table: bit string -> byte */
        Map<String, Byte> map = new HashMap<String, Byte>();
        for (Map.Entry<Byte, String> entry: huffmanCodesMap.entrySet()){
            map.put(entry.getValue(),entry.getKey());
        }

        /* Scan the bit string, extending the window until it matches a code */
        List<Byte> list = new ArrayList<Byte>();
        for(int i = 0; i < stringBuilder.length();){
            int count = 1;
            Byte b = null;
            while(b == null){
                if(i + count > stringBuilder.length()){
                    throw new IllegalArgumentException("Data does not match the code table");
                }
                b = map.get(stringBuilder.substring(i, i + count));
                if(b == null){
                    count++;
                }
            }
            list.add(b);
            i += count;
        }
        byte[] result = new byte[list.size()];
        for(int i = 0; i < result.length; i++){
            result[i] = list.get(i);
        }
        return result;
    }

    /* Preorder traversal */
    public static void first(TreeNode rootNode){
        if(rootNode != null){
            rootNode.firstShow();
        }else{
            System.out.println("The tree is empty");
        }
    }
}

Output

Nodes built from the byte array: [TreeNode{data=32, weight=10}, TreeNode{data=97, weight=5}, TreeNode{data=99, weight=1}, TreeNode{data=100, weight=3}, TreeNode{data=101, weight=2}, TreeNode{data=104, weight=2}, TreeNode{data=105, weight=3}, TreeNode{data=107, weight=1}, TreeNode{data=108, weight=2}, TreeNode{data=110, weight=2}, TreeNode{data=111, weight=4}, TreeNode{data=114, weight=1}, TreeNode{data=115, weight=1}, TreeNode{data=116, weight=2}, TreeNode{data=117, weight=1}, TreeNode{data=119, weight=1}, TreeNode{data=121, weight=4}]

Preorder traversal:
TreeNode{data=null, weight=45}
TreeNode{data=null, weight=18}
TreeNode{data=null, weight=8}
TreeNode{data=null, weight=4}
TreeNode{data=116, weight=2}
TreeNode{data=null, weight=2}
TreeNode{data=99, weight=1}
TreeNode{data=107, weight=1}
TreeNode{data=null, weight=4}
TreeNode{data=null, weight=2}
TreeNode{data=114, weight=1}
TreeNode{data=115, weight=1}
TreeNode{data=null, weight=2}
TreeNode{data=117, weight=1}
TreeNode{data=119, weight=1}
TreeNode{data=32, weight=10}
TreeNode{data=null, weight=27}
TreeNode{data=null, weight=11}
TreeNode{data=97, weight=5}
TreeNode{data=null, weight=6}
TreeNode{data=100, weight=3}
TreeNode{data=105, weight=3}
TreeNode{data=null, weight=16}
TreeNode{data=null, weight=8}
TreeNode{data=111, weight=4}
TreeNode{data=121, weight=4}
TreeNode{data=null, weight=8}
TreeNode{data=null, weight=4}
TreeNode{data=101, weight=2}
TreeNode{data=104, weight=2}
TreeNode{data=null, weight=4}
TreeNode{data=108, weight=2}
TreeNode{data=110, weight=2}

Huffman code table = {32=01, 97=100, 99=00010, 100=1010, 101=11100, 104=11101, 105=1011, 107=00011, 108=11110, 110=11111, 111=1100, 114=00100, 115=00101, 116=0000, 117=00110, 119=00111, 121=1101}

Compressed bytes = [12, -87, -83, -107, -119, 57, 61, -19, 99, -5, 23, 26, -102, -9, 14, -42, 59, -122, 67, -73, -15, 1]
Decoded string = today is a really a nice day how do you think
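
To relate this output to the compression ratio mentioned earlier, the sketch below works out how much space was saved in this run: the input string is 45 bytes long and the compressed array holds 22 bytes. The class CompressionRatioSketch and the method savedFraction are hypothetical names, and the calculation assumes the code table itself is not counted, although a real file format would have to store it as well.

```java
import java.util.Locale;

// Minimal sketch: space saved by the compression run shown above.
// CompressionRatioSketch and savedFraction are illustrative names.
final class CompressionRatioSketch {
    // Fraction of the original size that compression saved.
    static double savedFraction(int originalBytes, int compressedBytes) {
        return (originalBytes - compressedBytes) / (double) originalBytes;
    }

    public static void main(String[] args) {
        int originalBytes = 45;    // length of the input string in bytes
        int compressedBytes = 22;  // length of the compressed byte array
        double saved = 100 * savedFraction(originalBytes, compressedBytes);
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT, "saved %.1f%%", saved)); // saved 51.1%
    }
}
```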
