【学到的东西】
1)如何构造哈夫曼树(①优先队列、push*n、pop*2、push,同时累加,这样可以得到整棵树的带权路径长度(从一位大佬的代码里学到的)②直接构造二叉树(parent数组),可使用优先队列pop最小的两个出来(也可以用算法找出最小的两个数),合并后加入队列(数组),看了另外一个大佬的代码,写的很好),以及如何由哈夫曼树得到哈夫曼编码(从叶子开始找parent,如果是parent的左孩子,则+0,右孩子+1,最后颠倒,也是在“另外一个大佬的代码”里看到的)
2)如何构造字典树(next[],用code[i]-'a'表示next[]的下标(这里是code[i]-'0')),如何判断前缀码【我的做法是:Insert(code)时①若没有增加新结点,则code为别人的前缀②若在已成为编码的叶子上增加新结点,则该叶子是code的前缀,我用了每次插入code时最后一个结点的isCode来表示“已成为编码”,这样虽然很方便,但是会增加每个结点的空间)】
7-9 Huffman Codes (30分)
In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the Ndistinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i]
is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i]
is the frequency of c[i]
and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i]
is the i
-th character and code[i]
is an non-empty string of no more than 63 '0's and '1's.
Output Specification:
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
Sample Input:
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Sample Output:
Yes
Yes
No
No
#include
#include
#include
#include
using namespace std;
#define MaxSize 63
char c[MaxSize]; //c[]数组
int fre_of_ch[128]; //记录每个ASCII码对应的频次
int N; //字符个数
typedef struct trienode { //用字典树(前缀树)判断前缀码
int val; //头结点不存储0、1
struct trienode* next[2];
int isCode; //该结点是否为一个编码
}*TrieNode;
TrieNode createNode() {
TrieNode p = new struct trienode;
p->val = -1;
p->next[0] = p->next[1] = NULL;
p->isCode = 0;
return p;
}
void Create_c(char c[]);
void Insert(string code, int& isPre, TrieNode t);
void freeTree(TrieNode t);
int main() {
Create_c(c);
cin >> N;
priority_queue, greater> q; //优先队列,push同时将小的排在最前面
char ch;
int fre, i, Huff_sum = 0;
for (i = 0; i < N; ++i) { //不改变N
cin >> ch >> fre;
fre_of_ch[(int)ch] = fre;
q.push(fre);
}
while (q.size() > 1) {
fre = q.top();
q.pop();
fre += q.top();
q.pop();
q.push(fre);
Huff_sum += fre; //Huff_sum累加,最终得到哈夫曼编码的总长度。因为每个叶子在第几层就被累加了几次,所以能够得到整棵树的带权路径长度。
}
//printf("the shortest sum: %d\n", Huff_sum);
int M, j, sum;
string code;
cin >> M;
while (M--) {
int isPre = 1;
sum = 0;
TrieNode t = createNode();
for (i = 0; i < N; ++i) { //不改变N
cin >> ch >> code;
sum += code.length() * fre_of_ch[(int)ch];
if (!isPre) continue;
if (code.length() > N) { //有符号数和无符号数运算,会归为无符号数
isPre = 0;
continue; //不能break,因为本组输入还在继续
}
Insert(code, isPre, t); //用字典树判断是否存在前缀重合的问题
}
//printf("the sum: %d\n", sum);
if (isPre && sum == Huff_sum) cout << "Yes" << endl;
else cout << "No" << endl;
freeTree(t); //记得要把建立的二叉树释放掉,重新建立
}
}
void Create_c(char c[]) {
int i;
for (i = 0; i < MaxSize; ++i) {
if (i >= 0 && i <= 9) c[i] = '0' + i;
else if (i >= 10 && i <= 35) c[i] = 'a' + i - 10;
else if (i >= 36 && i <= 61) c[i] = 'A' + i - 36;
else c[i] = '_'; //95
}
//print_c();
}
void freeTree(TrieNode t) {
TrieNode p = t;
if (p->next[0]) freeTree(p->next[0]);
if (p->next[1]) freeTree(p->next[1]);
delete p;
t = NULL;
}
void Insert(string code, int &isPre, TrieNode t) {
int i, newnode = 0; //newnode表示初始时没有产生新结点
TrieNode p = t;
for (i = 0; i < code.length(); ++i) {
if (!isPre) break;
if (p->next[code[i] - '0'] == NULL) {
if (p->isCode) { //p表示一个编码,若在上面插入新结点,则p表示的编码是code的前缀
isPre = 0;
break;
}
TrieNode q = createNode();
q->val = code[i] - '0';
if (i == code.length() - 1) q->isCode = 1; //该code的最后一位
p->next[code[i] - '0'] = q;
newnode = 1; //产生了新结点
p = p->next[code[i] - '0'];
}
else {
p = p->next[code[i] - '0'];
}
}
if(newnode==0)
isPre = 0; //整个编码读完,没有产生新结点,说明该编码是其他编码的前缀
}