Huffman Codes

Huffman Codes (30)

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2 <= N <= 63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (<=1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a string of '0's and '1's.

Output Specification:

For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Sample Input:
7

A 1 B 1 C 1 D 3 E 3 F 6 G 6

4

A 00000

B 00001

C 0001

D 001

E 01

F 10

G 11

A 01010

B 01011

C 0100

D 011

E 10

F 11

G 00

A 000

B 001

C 010

D 011

E 100

F 101

G 110

A 00000

B 00001

C 0001

D 001

E 00

F 10

G 11

Sample Output:
Yes

Yes

No

No

#include <iostream>

#include <string>

#include <map>

#include <queue>

#include <vector>

#include <algorithm>

using namespace::std;





struct HuffmanCodesNode {

    bool isLeaf;

    int frequence;

    char c;

    int d;

    HuffmanCodesNode *Child[2];



    HuffmanCodesNode(char ch,int f){

        Child[0]=NULL;

        Child[1]=NULL;

        c=ch;

        frequence=f;

        isLeaf=true;

        d=0;

    }

    HuffmanCodesNode(HuffmanCodesNode *h1=NULL,HuffmanCodesNode *h2=NULL):isLeaf(false),frequence(0),c(0),d(0){

        Child[0]=h1;

        Child[1]=h2;

    }



    void clear(){

        if(Child[0]!=NULL){Child[0]->clear();delete Child[0];}

        if(Child[1]!=NULL){Child[1]->clear();delete Child[1];}

        

    }

    int computeWPL(int depth,unsigned int &WPL){

        if(Child[0]==NULL && Child[1]==NULL)WPL+=depth*this->frequence;

        else{

            Child[0]->computeWPL(depth+1, WPL);

            Child[1]->computeWPL(depth+1, WPL);

        }

        return 0;

    }



    friend bool operator<(HuffmanCodesNode a,HuffmanCodesNode b){

        return a.frequence>b.frequence;

    }



};



unsigned int computeWPLwithFrequencyMap(map<char,int> &f){

    priority_queue<HuffmanCodesNode> q;

    HuffmanCodesNode *p1,*p2;

    unsigned int WPL=0;

    for (map<char,int>::iterator iter=f.begin(); iter!=f.end(); ++iter) {

        HuffmanCodesNode temp(iter->first, iter->second);

        q.push(temp);

    }

    while (q.size()>1) {

        p1=new HuffmanCodesNode(q.top());

        q.pop();

        p2=new HuffmanCodesNode(q.top());

        q.pop();

        HuffmanCodesNode temp(p1,p2);

        temp.frequence=p1->frequence+p2->frequence;

        q.push(temp);

    }

    p1=new HuffmanCodesNode(q.top());

    p1->computeWPL(0, WPL);

    return WPL;

}

bool compare(const string& a ,const string& b){

    if (a.length() < b.length()) {

        return true;

    }else if(a.length() == b.length() && a < b){

        return true;

    }

    return false;

}

bool checkPrefix(vector<string>& s){

    sort(s.begin(), s.end(),compare);

    HuffmanCodesNode *root = new HuffmanCodesNode,*p=root;

    int t;

    bool check=false;

    for (vector<string>::iterator i=s.begin();i!=s.end();++i){

        p=root;

        for (string::iterator iter = i->begin(); iter!=i->end(); iter++) {

            t=*iter-'0';

            if(p->Child[t]==NULL)p->Child[t]=new HuffmanCodesNode;

            p=p->Child[t];

            if(p->isLeaf){root->clear();return false;}

        }

        if(p->Child[0]!=NULL || p->Child[1]!=NULL)check=true;

        if(check){root->clear();return false;}

        p->isLeaf = true;

    }

    root->clear();

    if (!check)return true;

    else return false;

}

int main(int argc,const char* argv[])

{

    ios::sync_with_stdio(false);

    

    int N;cin>>N;

    char c;int frequence;

    string s;

    map<char, int> f;

    for (int i=0; i<N; i++) {

        cin>>c;

        cin>>frequence;

        f[c]=frequence;

    }

    unsigned int WPL=computeWPLwithFrequencyMap(f);

    int M;cin>>M;

    for (int i=0; i<M; i++) {

        unsigned int t_WPL=0;

        vector<string> t;

        for (int j=0; j<N; j++) {

            cin>>c;

            cin>>s;

            t_WPL+=f[c]*s.size();

            t.push_back(s);

        }

        if ( t_WPL == WPL && checkPrefix(t)){

            cout<<"Yes"<<endl;

        }else{

            cout<<"No"<<endl;

        }

    }    

}

Huffman code的特征有两点

1. WPL(带权路径长度)最小,这个我们可以构建一个Huffman树来计算。

2. 每个编码唯一不具二义性,也就是每个编码都不会是另一个编码的前缀。

我的代码思路是这样

我们首先把输入的权值保存在一个map里面,然后交给

unsigned int computeWPLwithFrequencyMap(map<char,int> &f)

来计算出WPL,函数是通过建立一个Huffman树然后遍历得到WPL值的,应该有更好的方法。

之后用得到的WPL来和每个同学的WPL比较,比我们大的肯定不是Huffman编码了。

之后检测是否有二义性有两个办法(我目前想到两个):

  1.按编码长度排序(升序),穷举每个编码是否是后面编码的子串,也就是字符串比较,可以通过kmp比较。

  2.建立trie树,看看每个编码的路径中是否有其他的编码。

我用的是办法2,我先用sort()函数进行了按编码长度的升序排列(偷懒了,也可以不用排序),然后依次建立trie,

把编码最后一位所在的节点进行标记。按照Huffman编码,这个肯定是叶节点,如果以后有编码路径经过它,

那这个肯定不是Huffman编码了。

 

 

 

 
  

你可能感兴趣的:(Huffman)