*****Huffman Codes (※ building a Huffman tree, ※ building a prefix tree)

【What I learned】

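1) How to build a Huffman tree. ① With a priority queue: push all n frequencies, then repeatedly pop the two smallest and push their sum back, accumulating every sum along the way; the accumulated total is the weighted path length of the whole tree (I learned this from one expert's code). ② Build the binary tree explicitly with a parent array: pop the two smallest nodes from a priority queue (or find the two smallest with a scan), merge them, and add the merged node back to the queue (array). I saw this in another expert's code, which was very well written. And how to get the Huffman codes from the Huffman tree: start at a leaf and follow the parent links, appending 0 when the node is its parent's left child and 1 when it is the right child, then reverse the string at the end (also from that second expert's code).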
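A minimal C++ sketch of method ② together with the leaf-to-root code extraction, written for illustration; it is not either expert's code. `HuffmanTree`, `buildHuffmanTree` and `huffmanCode` are hypothetical names. The two smallest nodes come from a `std::priority_queue`, the smaller one becomes the left child, and ties are broken by node index, so a different tie rule may give different (but equally optimal) codes.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: leaves are nodes 0..n-1, internal nodes are n..2n-2,
// and node 2n-2 is the root (its parent is -1).
struct HuffmanTree {
    std::vector<int> weight, parent, isLeft;
};

HuffmanTree buildHuffmanTree(const std::vector<int>& freq) {
    int n = static_cast<int>(freq.size());
    assert(n >= 2);
    HuffmanTree t;
    t.weight = freq;
    t.weight.resize(2 * n - 1, 0);
    t.parent.assign(2 * n - 1, -1);
    t.isLeft.assign(2 * n - 1, 0);
    // Min-heap of (weight, node index); equal weights are ordered by index.
    using Item = std::pair<int, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    for (int i = 0; i < n; ++i) pq.push({freq[i], i});
    for (int next = n; next < 2 * n - 1; ++next) {
        Item a = pq.top(); pq.pop();  // smallest: becomes the left child
        Item b = pq.top(); pq.pop();  // second smallest: becomes the right child
        t.weight[next] = a.first + b.first;
        t.parent[a.second] = next; t.isLeft[a.second] = 1;
        t.parent[b.second] = next; t.isLeft[b.second] = 0;
        pq.push({t.weight[next], next});
    }
    return t;
}

// Walk from the leaf up to the root, appending '0' for a left child and '1'
// for a right child, then reverse so the code reads root-to-leaf.
std::string huffmanCode(const HuffmanTree& t, int leaf) {
    std::string code;
    for (int v = leaf; t.parent[v] != -1; v = t.parent[v])
        code += t.isLeft[v] ? '0' : '1';
    std::reverse(code.begin(), code.end());
    return code;
}
```

Take the frequencies 4, 2, 1, 1 of 'a', 'x', 'u', 'z' from the problem statement below. For them this sketch produces a=0, x=10, u=110, z=111, which is the statement's first example code.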

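2) How to build a trie: use next[], indexed by code[i]-'a' (here by code[i]-'0'). And how to check that a set of codes is a prefix code. 【My approach: during Insert(code), ① if no new node is created, code is a prefix of some existing code; ② if a new node is added below a leaf where an existing code already ends, that leaf's code is a prefix of code. I mark "a code ends here" with an isCode flag on the last node of every inserted code. This is convenient, but it adds a field to every node.】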
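The two rules can also be checked in isolation. Below is a small sketch of that. `isPrefixFree` is a hypothetical helper, not the Insert function from the solution below. It stores the trie in a `std::vector` with integer child indices instead of `new`/`delete`, so nothing has to be freed afterwards.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Returns true if no code in `codes` is a prefix of (or equal to) another.
// Assumes every code is a non-empty string of '0' and '1'.
bool isPrefixFree(const std::vector<std::string>& codes) {
    // Node 0 is the root; a child index of 0 means "no child yet".
    std::vector<std::array<int, 2>> child(1);  // value-initialized to {0, 0}
    std::vector<bool> isCode(1, false);
    for (const std::string& code : codes) {
        int p = 0;
        bool created = false;
        for (char ch : code) {
            int b = ch - '0';
            if (child[p][b] == 0) {
                // Rule 2: an existing code ends at p, so it is a prefix of `code`.
                if (isCode[p]) return false;
                child[p][b] = static_cast<int>(child.size());
                child.push_back({0, 0});
                isCode.push_back(false);
                created = true;
            }
            p = child[p][b];
        }
        // Rule 1: no node was created, so `code` is a prefix of an earlier code.
        if (!created) return false;
        isCode[p] = true;
    }
    return true;
}
```

For example, `isPrefixFree({"0", "10", "110", "111"})` returns true. `isPrefixFree({"0", "01", "011", "001"})` returns false, because "0" is a prefix of "01".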

7-9 Huffman Codes (30 points)

In 1953, David A. Huffman published his paper "A Method for the Construction of Minimum-Redundancy Codes", and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string "aaaxuaxz", we can observe that the frequencies of the characters 'a', 'x', 'u' and 'z' are 4, 2, 1 and 1, respectively. We may either encode the symbols as {'a'=0, 'x'=10, 'u'=110, 'z'=111}, or in another way as {'a'=1, 'x'=01, 'u'=001, 'z'=000}, both of which compress the string into 14 bits. Another set of codes can be given as {'a'=0, 'x'=11, 'u'=100, 'z'=101}, but {'a'=0, 'x'=01, 'u'=011, 'z'=001} is NOT correct since "aaaxuaxz" and "aazuaxax" can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.
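The 14 bits come from 4×1 + 2×2 + 1×3 + 1×3 = 14. The sketch below checks this and the decoding ambiguity by concatenating codewords. `encode` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Concatenates the codeword of every character of `text`.
// Throws std::out_of_range if a character has no codeword.
std::string encode(const std::string& text, const std::map<char, std::string>& code) {
    std::string bits;
    for (char ch : text) bits += code.at(ch);
    return bits;
}
```

With the incorrect code {'a'=0, 'x'=01, 'u'=011, 'z'=001}, both "aaaxuaxz" and "aazuaxax" encode to 00001011001001. That code also has lengths 1, 2, 3, 3 and therefore also costs 14 bits. So the total length alone cannot reject it; only the prefix check does.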

Input Specification:

Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:


c[1] f[1] c[2] f[2] ... c[N] f[N]

where c[i] is a character chosen from {'0' - '9', 'a' - 'z', 'A' - 'Z', '_'}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:

For each test case, print in each line either "Yes" if the student's submission is correct, or "No" if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.
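Because of this note, a checker should compare the total weighted length with the optimum instead of comparing individual code lengths with one particular Huffman tree. Below is a sketch under that assumption; `huffmanWPL` and `codeWPL` are hypothetical names. `huffmanWPL` is method ① (sum of all merged weights). `codeWPL` computes a candidate's weighted length from its code lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Method 1: the sum of all merged weights equals the Huffman tree's weighted path length.
int huffmanWPL(const std::vector<int>& freq) {
    assert(freq.size() >= 2);
    // Range constructor: builds the min-heap directly from freq.
    std::priority_queue<int, std::vector<int>, std::greater<int>> q(freq.begin(), freq.end());
    int total = 0;
    while (q.size() > 1) {
        int a = q.top(); q.pop();
        int b = q.top(); q.pop();
        total += a + b;
        q.push(a + b);
    }
    return total;
}

// Weighted length of a candidate code, given only the length of each codeword.
int codeWPL(const std::vector<int>& freq, const std::vector<int>& len) {
    assert(freq.size() == len.size());
    int total = 0;
    for (std::size_t i = 0; i < freq.size(); ++i) total += freq[i] * len[i];
    return total;
}
```

For frequencies 1, 1, 2, 2, the length profiles (3, 3, 2, 1), e.g. 000, 001, 01, 1, and (2, 2, 2, 2) both reach the optimum of 12. Both must be accepted if the codes are prefix-free.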

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No
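A worked check of the sample. Huffman merges 1+1, 1+2, 3+3, 3+6, 6+6 and 9+12; ties could be merged in another order, but the total stays optimal. Huff_sum is the sum of these merged weights, and WPL_k is the weighted length of submission k:

```latex
\begin{aligned}
\text{Huff\_sum} &= 2 + 3 + 6 + 9 + 12 + 21 = 53 \\
\text{WPL}_1 = \text{WPL}_2 = \text{WPL}_4 &= 1\cdot5 + 1\cdot5 + 1\cdot4 + 3\cdot3 + 3\cdot2 + 6\cdot2 + 6\cdot2 = 53 \\
\text{WPL}_3 &= 3\cdot(1+1+1+3+3+6+6) = 63 \neq 53
\end{aligned}
```

Submissions 1 and 2 are prefix-free and optimal. Submission 3 fails the length test. Submission 4 has the optimal length, but E=00 is a prefix of A=00000, B=00001, C=0001 and D=001. Hence the output is Yes, Yes, No, No.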
#include <iostream>
#include <string>
#include <queue>
#include <vector>
#include <functional>
using namespace std;
#define MaxSize 63

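char c[MaxSize];  // c[]: all characters that may appear in the input (filled by Create_c)
int fre_of_ch[128];  // frequency of each character, indexed by its ASCII code
int N;  // number of characters
typedef struct trienode {  // trie (prefix tree) used to check the prefix-code property
	int val;  // bit on the edge into this node (0 or 1); the root stores no bit (-1)
	struct trienode* next[2];
	int isCode;  // 1 if some code ends at this node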
}*TrieNode;

TrieNode createNode() {
	TrieNode p = new struct trienode;
	p->val = -1;
	p->next[0] = p->next[1] = NULL;
	p->isCode = 0;
	return p;
}

void Create_c(char c[]);
void Insert(string code, int& isPre, TrieNode t);
void freeTree(TrieNode t);

int main() {
	Create_c(c);
	cin >> N;
	priority_queue<int, vector<int>, greater<int>> q;  // min-heap: top() is always the smallest frequency
	char ch;
	int fre, i, Huff_sum = 0;
	for (i = 0; i < N; ++i) {  // count with i so N itself is not modified
		cin >> ch >> fre;
		fre_of_ch[(int)ch] = fre;
		q.push(fre);
	}

	while (q.size() > 1) {
		fre = q.top();
		q.pop();
		fre += q.top();
		q.pop();
		q.push(fre);
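		Huff_sum += fre;  // add every merged weight: a leaf's frequency is included once per level of its depth, so Huff_sum ends up as the tree's weighted path length, i.e. the minimal total code length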
	}
	//printf("the shortest sum: %d\n", Huff_sum);

	int M, j, sum;
	string code;
	cin >> M;
	while (M--) {
		int isPre = 1;
		sum = 0;
		TrieNode t = createNode();
		for (i = 0; i < N; ++i) {  // count with i so N itself is not modified
			cin >> ch >> code;
			sum += code.length() * fre_of_ch[(int)ch];
			if (!isPre) continue;
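			// Early rejection: an optimal code for N symbols with positive frequencies is never longer than N-1 bits.
			// code.length() is unsigned, so N is converted to unsigned in this comparison (harmless because N > 0).
			if (code.length() > N) {
				isPre = 0;
				continue;  // don't break: the rest of this submission still has to be read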
			}
			Insert(code, isPre, t);  // use the trie to detect one code being a prefix of another
		}
		//printf("the sum: %d\n", sum);
		if (isPre && sum == Huff_sum) cout << "Yes" << endl;
		else cout << "No" << endl;
		freeTree(t);  // free this submission's trie; a new one is built for the next submission
	}
}

void Create_c(char c[]) {
	int i;
	for (i = 0; i < MaxSize; ++i) {
		if (i >= 0 && i <= 9) c[i] = '0' + i;
		else if (i >= 10 && i <= 35) c[i] = 'a' + i - 10;
		else if (i >= 36 && i <= 61) c[i] = 'A' + i - 36;
		else c[i] = '_';  //95
	}
	//print_c();
}

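void freeTree(TrieNode t) {  // post-order: free both subtrees, then the node itself
	if (t->next[0]) freeTree(t->next[0]);
	if (t->next[1]) freeTree(t->next[1]);
	delete t;  // t is passed by value, so setting it to NULL here would have no effect on the caller
}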

void Insert(string code, int &isPre, TrieNode t) {  
	int i, newnode = 0;  // newnode == 0: no new node has been created yet
	TrieNode p = t;
	for (i = 0; i < code.length(); ++i) {
		if (!isPre) break;
		if (p->next[code[i] - '0'] == NULL) {
			if (p->isCode) {  // a code already ends at p; adding a child below it means that code is a prefix of code
				isPre = 0;
				break;
			}
			TrieNode q = createNode();
			q->val = code[i] - '0';
			if (i == code.length() - 1) q->isCode = 1;  // last bit of code: mark that a code ends here
			p->next[code[i] - '0'] = q;
			newnode = 1;  // a new node was created
			p = p->next[code[i] - '0'];
		}
		else {
			p = p->next[code[i] - '0'];
		}
	}
	if (newnode == 0)
		isPre = 0;  // the whole code was walked without creating a node, so code is a prefix of (or equal to) an existing code
}
