In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both compress the string into 14 bits. Another set of code can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:
c[1] f[1] c[2] f[2] ... c[N] f[N]
where c[i]
is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i]
is the frequency of c[i]
and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:
c[i] code[i]
where c[i]
is the i
-th character and code[i]
is an non-empty string of no more than 63 '0's and '1's.
For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.
Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
给定一段文字,如果我们统计出字母出现的频率,是可以根据哈夫曼算法给出一套编码,使得用此编码压缩原文可以得到最短的编码总长。然而哈夫曼编码并不是唯一的。例如对字符串"aaaxuaxz",容易得到字母 'a'、'x'、'u'、'z' 的出现频率对应为 4、2、1、1。我们可以设计编码 {'a'=0, 'x'=10, 'u'=110, 'z'=111},也可以用另一套 {'a'=1, 'x'=01, 'u'=001, 'z'=000},还可以用 {'a'=0, 'x'=11, 'u'=100, 'z'=101},三套编码都可以把原文压缩到 14 个字节。但是 {'a'=0, 'x'=01, 'u'=011, 'z'=001} 就不是哈夫曼编码,因为用这套编码压缩得到 00001011001001 后,解码的结果不唯一,"aaaxuaxz" 和 "aazuaxax" 都可以对应解码的结果。本题就请你判断任一套编码是否哈夫曼编码。
首先第一行给出一个正整数 N(2≤N≤63),随后第二行给出 N 个不重复的字符及其出现频率,格式如下:
c[1] f[1] c[2] f[2] ... c[N] f[N]
其中c[i]
是集合{'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}中的字符;f[i]
是c[i]
的出现频率,为不超过 1000 的整数。再下一行给出一个正整数 M(≤1000),随后是 M 套待检的编码。每套编码占 N 行,格式为:
c[i] code[i]
其中c[i]
是第i
个字符;code[i]
是不超过63个'0'和'1'的非空字符串。
对每套待检编码,如果是正确的哈夫曼编码,就在一行中输出"Yes",否则输出"No"。
注意:最优编码并不一定通过哈夫曼算法得到。任何能压缩到最优长度的前缀编码都应被判为正确。
7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11
Yes
Yes
No
No
提交时间 | 状态 | 分数 | 题目 | 编译器 | 耗时 | 用户 |
---|---|---|---|---|---|---|
2018/5/24 23:50:17 | 答案正确 | 30 | 7-9 | C++ (g++) | 72 ms | copper |
测试点 | 结果 | 耗时 | 内存 |
---|---|---|---|
0 | 答案正确 | 3 ms | 240KB |
1 | 答案正确 | 3 ms | 240KB |
2 | 答案正确 | 3 ms | 240KB |
3 | 答案正确 | 72 ms | 6220KB |
4 | 答案正确 | 3 ms | 236KB |
5 | 答案正确 | 3 ms | 236KB |
6 | 答案正确 | 3 ms | 316KB |
这道题可以说是树这块的压轴题了,无论是代码量还是思维难度都和其他题目不在一个档次。题目意思是给定一个带权的输入序列,和N个与带权输入序列元素相同的编码测试序列,若编码符合最优编码,则输出Yes,若不符合,则输出No。
这道题考察最优编码长度,实际上是在考察Huffman树与Huffman编码,出题人担心你想不到,还特地在题干开头专门介绍了David A. Huffman和他提出的"A Method for the Construction of Minimum-Redundancy Codes"(一种实现最小冗余编码结构的方法),也就是Huffman Codes(哈夫曼编码)。
对于每个给定的带权元素序列来说,其必定可以建成一棵Huffman树,尽管根据建树方法的不同,树的结构不同,但是对于这个给定的带权元素序列,其最优编码长度是固定的,即某一形式的Huffman树下的Huffman编码长度。(建立Huffman树的过程为,每次将权重最小的两个结点合成一棵二叉树,其树根结点权值为两子树权值之合。再将该二叉树当作结点进行重新合并。重复上述过程直到使用完所有结点,建立成一棵树(理论上N个结点要进行N-1次合并))但需要注意的是,Huffman编码的编码长度是最优编码长度,但是最优编码长度可以不是其Huffman编码(题目最后一句也有特地提醒,看来出题人还是比较好心的)。因此,验证输入测试序列需要靠两点:1.符合最优编码长度,2.能够无歧义解码。
那么思路就出来了。先根据输入序列建立Huffman树,并获得最优编码长度。再对提交数据进行检查:1.是否符合最优编码长度,2.是否符合无歧义解码规则(前缀码编码,数据仅存在于二叉树叶节点)。
获得最优编码长度的过程,需要先创建一棵Huffman树,又需要先将带权序列建立成最小堆,再每轮弹出2次最小堆的顶点,作为二叉树的左右子树进行合并,合并完后的二叉树进行权值更新,再继续放入最小堆进行合并……直到最小堆元素全部弹出,最后弹出一整棵Huffman树。(最小堆的建立可以参见:#数据结构与算法学习笔记#PTA14:最小堆与最大堆(C/C++))。计算每一个测试序列的编码长度,与标准Huffman编码长度比较即可。最小堆每次插入和弹出都需要对全堆某个路径(根节点到叶子结点的一条路径)进行一次调整,具体情况分析详见代码注释。
检查前缀码编码的过程,需要根据输入序列的每个元素编码,模拟其在树中的路径(相当于每次创建一个元素编码所代表一条二叉树的路径,0代表左子树,1代表右子树)。模拟过程中的两种情况可以验证不满足前缀码要求(如下图):1.后创建的分支经过或超过已经被定义的叶子结点,2.后创建分支创建结束时未达到叶子结点。具体情况分析详见代码注释。
// HuffmanCodes.cpp : 定义控制台应用程序的入口点。
//
//#include
#include
#include
#include
using namespace std;
//Huffman树结点类
class Node {
public:
Node() {}
Node(char element, int weight)
:element(element), weight(weight), left(NULL), right(NULL) {}
char element;
int weight;
Node* left = NULL;
Node* right = NULL;
bool isleave = false;
};
typedef Node* HFMTree;
//输入测试样例结点类
class Case {
public:
char element;
char route[1000];
int length;
int getlength() {
return strlen(this->route);
}
};
void Read(int num, vector& minHeap, vector& inputlist);
void Insert(vector& minHeap, HFMTree node); //插入数据创建最小堆
HFMTree CreateHFMT(vector& minHeap); //根据最小堆创建Huffman树
HFMTree DeleteMinHeap(vector& minHeap); //从最小堆中取出最小元素,删除该结点并重新调整最小堆,最后删除该结点
int getHFMLength(HFMTree hfmtree, int depth); //获得该树编码长度
void Input(vector& testcase, int num);
bool isOptimalLen(vector& testcase, vector& inputlist, int weight); //检查是否符合最优编码长度
bool isPrefixCode(vector& testcase); //检查是否符合前缀码编码
int main()
{
/*根据输入序列建立Huffman树,并获得最优编码长度*/
int num;
cin >> num;
vector minHeap; //创建最小堆,用最小堆对序列进行存储
vector inputlist; //记录输入顺序与权值大小
HFMTree flag = new Node('-', -1);
minHeap.push_back(flag);
Read(num, minHeap, inputlist);
HFMTree hfmtree; //利用最小堆创建Huffman树
hfmtree = CreateHFMT(minHeap);
int optcodelength = getHFMLength(hfmtree, 0); //通过序列创建的Huffman树获得最优编码长度
/*对提交数据进行检查:1.是否符合最优编码长度,2.是否符合无歧义解码规则(前缀码编码,数据仅存在于二叉树叶节点)*/
int count;
cin >> count;
for (int i = 0;i < count;i++) {
vector testcase;
Input(testcase, num);
bool isoptimallen = isOptimalLen(testcase, inputlist, optcodelength);
bool isprefixcode = isPrefixCode(testcase);
if (isoptimallen && isprefixcode) {
cout << "Yes" << endl;
}
else {
cout << "No" << endl;
}
}
system("pause");
return 0;
}
void Read(int num, vector& minHeap, vector& inputlist) {
char element;
int weight;
for (int i = 0; i < num; i++) {
cin >> element >> weight;
HFMTree node = new Node(element, weight);
inputlist.push_back(node);
Insert(minHeap, node);
}
//minHeap.erase(minHeap.begin());
}
void Insert(vector& minHeap, HFMTree node) {
int index = minHeap.size();
minHeap.push_back(node);
//每次插入后自底向上进行调整
while ((*minHeap[index / 2]).weight > (*node).weight) {
//此处不可单纯进行值交换,需要交换两个对象
//(*minHeap[index]).element = (*minHeap[index / 2]).element;
//(*minHeap[index]).weight = (*minHeap[index / 2]).weight;
minHeap[index] = minHeap[index / 2];
index /= 2;
}
minHeap[index] = node;
}
HFMTree CreateHFMT(vector& minHeap) {
HFMTree hfmtree = new Node();
int size = minHeap.size() - 1;
//进行size-1次合并
for (int i = 1; i < size; i++) {
HFMTree node = new Node();
//每次从最小堆中取出堆顶的两个结点作为该结点的左右子结点
node->left = DeleteMinHeap(minHeap);
node->right = DeleteMinHeap(minHeap);
node->weight = node->left->weight + node->right->weight;
//将该结点作为根节点的二叉树重新加入最小堆
Insert(minHeap, node);
}
//从最小堆中取出建好的Huffman树
hfmtree = DeleteMinHeap(minHeap);
return hfmtree;
}
HFMTree DeleteMinHeap(vector& minHeap) {
//检查是否堆空
if (minHeap.size() == 1) {
return NULL;
}
//将该堆最大元素装入新结点并返回
HFMTree node = new Node();
node = minHeap[1];
//重新调整该堆
int size = minHeap.size();
int parent, child;
//用最大堆中最后一个元素从根结点开始向上过滤下层结点
HFMTree cmp = new Node();
cmp = minHeap[size - 1];
//从根节点开始,用parent记录根结点下标,用child记录其最小子结点下标,每次循环将parent更新为上一次循环的child
//当parent指向底层结点时跳出循环(会有极端情况比如偏向一边的堆使得parent最终并非指向该子树底层结点,但不影响结果)
for (parent = 1; 2 * parent < size; parent = child) {
child = parent * 2;
//若该子结点不是堆尾结点,令child指向左右子结点中的较小者
if ((child != size - 1) && ((*minHeap[child]).weight > (*minHeap[child + 1]).weight)) {
child++;
}
//当循环到堆尾结点值小于等于该子结点值时,可以结束(此时堆尾结点会替换parent结点而不是child结点)
if (cmp->weight <= (*minHeap[child]).weight) {
break;
}
else {
minHeap[parent] = minHeap[child];
}
}
//将尾结点与当前父结点替换
minHeap[parent] = cmp;
//删除堆尾结点
//此处不能用minHeap.erase(minHeap.end());,因为erase会返回被删除结点的下一结点,而尾结点的下一结点超限
minHeap.pop_back();
//返回该结点
return node;
}
int getHFMLength(HFMTree hfmtree, int depth) {
//若为叶子节点,直接返回其编码长度
if (!hfmtree->left && !hfmtree->right) {
return hfmtree->weight * depth;
}
//否则其他节点一定有两个子树,返回左右子树编码长度之合,深度相应加一
else {
return getHFMLength(hfmtree->left, depth + 1) + getHFMLength(hfmtree->right, depth + 1);
}
}
void Input(vector& testcase, int num) {
for (int i = 0;i < num;i++) {
Case inputcase;
cin >> inputcase.element >> inputcase.route;
inputcase.length = inputcase.getlength();
testcase.push_back(inputcase);
}
}
bool isOptimalLen(vector& testcase, vector& inputlist, int weight) {
int testweight = 0;
for (int i = 0;i < testcase.size();i++) {
testweight += (testcase[i].length * (*inputlist[i]).weight);
}
if (testweight == weight) {
return true;
}
else {
return false;
}
}
bool isPrefixCode(vector& testcase) {
bool isprefixcode = true;
HFMTree newtree = new Node();
//两种情况验证不满足前缀码要求:1.后创建的分支经过或超过已经被定义的叶子结点,2.后创建分支创建结束时未达到叶子结点
for (int i = 0;i < testcase.size();i++) {
HFMTree point = newtree;
if (isprefixcode == false)break;
for (int j = 0;j < testcase[i].length;j++) {
if (isprefixcode == false)break;
if (testcase[i].route[j] == '0') {
//先检查左子结点是否存在,若不存在,则创建一个左子结点
if (!point->left) {
HFMTree newnode = new Node();
point->left = newnode;
point = point->left;
//若此时为分支的最后一环,则将该结点定义为叶子结点
if (j == testcase[i].length - 1) {
point->isleave = true;
}
}
//若左子树存在,则先将标记指针移至左子树。
else {
point = point->left;
//若左子树为叶子结点,则不符合要求
if (point->isleave) {
isprefixcode = false;
break;
}
//若此时为分支的最后一环且仍有叶子结点,则不符合要求
if ((j == testcase[i].length - 1) && (point->left || point->right)) {
isprefixcode = false;
break;
}
}
}
else if (testcase[i].route[j] == '1') {
//先检查右子结点是否存在,若不存在,则创建一个右子结点
if (!point->right) {
HFMTree newnode = new Node();
point->right = newnode;
point = point->right;
//若此时为分支的最后一环,则将该结点定义为叶子结点
if (j == testcase[i].length - 1) {
point->isleave = true;
}
}
//若左子树存在,则先将标记指针移至左子树。
else {
point = point->right;
//若左子树为叶子结点,则不符合要求
if (point->isleave) {
isprefixcode = false;
break;
}
//若此时为分支的最后一环且仍有叶子结点,则不符合要求
if ((j == testcase[i].length - 1) && (point->left || point->right)) {
isprefixcode = false;
break;
}
}
}
}
}
return isprefixcode;
}