Study notes for "日撸 Java 三百行" (300 Lines of Java a Day), Days 29-30

Day 29: Huffman Coding (Building the Tree)

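The key to constructing a Huffman tree is its core idea: the sum of the weighted path lengths of all leaf nodes must be minimal. While building the tree we therefore want leaves with large weights close to the root and leaves with small weights far from it. I won't go through the construction rules in detail. Anyone who has studied computer science should already have a basic idea of the bottom-up method, which repeatedly merges the two lowest-weight nodes.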
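To make "minimal weighted path length" concrete, here is a minimal sketch that computes it for a set of leaf weights. It uses java.util.PriorityQueue rather than the linear scans in the code of this post, and the class name WplSketch and the method minWeightedPathLength are hypothetical names chosen only for this illustration.

```java
import java.util.PriorityQueue;

class WplSketch {
    /**
     * Returns the weighted path length of a Huffman tree built over the given
     * leaf weights. Hypothetical helper, not part of the original class.
     */
    static int minWeightedPathLength(int[] weights) {
        PriorityQueue<Integer> queue = new PriorityQueue<>();
        for (int weight : weights) {
            queue.add(weight);
        }

        int total = 0;
        while (queue.size() > 1) {
            // Merge the two smallest weights, as in the bottom-up construction.
            int merged = queue.poll() + queue.poll();
            // Every merge pushes all leaves below it one level deeper,
            // so the merged weight is added once per merge.
            total += merged;
            queue.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        // Weights 5, 2, 1, 1 end up at depths 1, 2, 3, 3: 5 + 4 + 3 + 3 = 15.
        System.out.println(minWeightedPathLength(new int[] {5, 2, 1, 1})); // prints 15
    }
}
```

The heaviest leaf ends up at depth 1 and the two lightest at depth 3, which is exactly the "heavy near the root, light far away" rule.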

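Continuing from yesterday, we first initialize the charMapping array so that every element is -1. We then count how often each character occurs in the input and store the counts in tempCharCounts. Finally, we scan tempCharCounts and copy each character that actually occurs into alphabet, recording its count in charCounts and its position in charMapping.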

// Initialize.
		Arrays.fill(charMapping, -1);

		// The count for each char. At most NUM_CHARS chars.
		int[] tempCharCounts = new int[NUM_CHARS];

		// The index of the char in the ASCII charset.
		int tempCharIndex;

		// Step 1. Scan the string to obtain the counts.
		char tempChar;
		for (int i = 0; i < inputText.length(); i++) {
			tempChar = inputText.charAt(i);
			tempCharIndex = (int) tempChar;

			System.out.print("" + tempCharIndex + " ");

			tempCharCounts[tempCharIndex]++;
		} // Of for i

		// Step 2. Scan to determine the size of the alphabet.
		alphabetLength = 0;
		for (int i = 0; i < NUM_CHARS; i++) {
			if (tempCharCounts[i] > 0) {
				alphabetLength++;
			} // Of if
		} // Of for i

		// Step 3. Compress to the alphabet
		alphabet = new char[alphabetLength];
		charCounts = new int[2 * alphabetLength - 1];

		int tempCounter = 0;
		for (int i = 0; i < NUM_CHARS; i++) {
			if (tempCharCounts[i] > 0) {
				alphabet[tempCounter] = (char) i;
				charCounts[tempCounter] = tempCharCounts[i];
				charMapping[i] = tempCounter;
				tempCounter++;
			} // Of if
		} // Of for i

The mappings between these arrays, together with all their index parameters, really are a bit dizzying.
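A small experiment may help untangle the arrays. The sketch below repeats only the counting and mapping steps for a short string. The class name MappingSketch, the method buildCharMapping, and the value 256 for NUM_CHARS are assumptions made for this illustration, since the definition of NUM_CHARS is not shown in this post.

```java
import java.util.Arrays;

class MappingSketch {
    // Assumed value; the post uses NUM_CHARS without showing its definition.
    static final int NUM_CHARS = 256;

    /** Builds charMapping: char code -> index in the alphabet, or -1 if absent. */
    static int[] buildCharMapping(String text) {
        // Count each char, indexed by its char code.
        int[] counts = new int[NUM_CHARS];
        for (int i = 0; i < text.length(); i++) {
            counts[text.charAt(i)]++;
        }

        // Assign alphabet positions in increasing char-code order.
        int[] charMapping = new int[NUM_CHARS];
        Arrays.fill(charMapping, -1);
        int next = 0;
        for (int i = 0; i < NUM_CHARS; i++) {
            if (counts[i] > 0) {
                charMapping[i] = next++;
            }
        }
        return charMapping;
    }

    public static void main(String[] args) {
        int[] charMapping = buildCharMapping("cabbcc");
        // Positions follow the char codes, not the order of appearance.
        System.out.println("a -> " + charMapping['a']); // a -> 0
        System.out.println("b -> " + charMapping['b']); // b -> 1
        System.out.println("c -> " + charMapping['c']); // c -> 2
        System.out.println("z -> " + charMapping['z']); // z -> -1
    }
}
```

So alphabet goes from a position to a character, charMapping goes from a character back to its position, and charCounts holds the weight stored at that same position.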

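First, a little preparation before building the tree. All the leaf nodes are already known, and the total number of nodes is twice the number of leaves minus 1: each merge adds one internal node, and n leaves need n - 1 merges. So we can allocate the nodes array up front and create the leaves.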

// Step 1. Allocate space.
		nodes = new HuffmanNode[alphabetLength * 2 - 1];
		boolean[] tempProcessed = new boolean[alphabetLength * 2 - 1];

		// Step 2. Initialize leaves.
		for (int i = 0; i < alphabetLength; i++) {
			nodes[i] = new HuffmanNode(alphabet[i], charCounts[i], null, null, null);
		} // Of for i

Next comes the actual tree construction:

// Step 3. Construct the tree.
		int tempLeft, tempRight, tempMinimal;
		for (int i = alphabetLength; i < 2 * alphabetLength - 1; i++) {
			// Step 3.1 Select the first minimal as the left child.
			tempLeft = -1;
			tempMinimal = Integer.MAX_VALUE;
			for (int j = 0; j < i; j++) {
				if (tempProcessed[j]) {
					continue;
				} // Of if

				if (tempMinimal > charCounts[j]) {
					tempMinimal = charCounts[j];
					tempLeft = j;
				} // Of if
			} // Of for j
			tempProcessed[tempLeft] = true;

			// Step 3.2 Select the second minimal as the right child.
			tempRight = -1;
			tempMinimal = Integer.MAX_VALUE;
			for (int j = 0; j < i; j++) {
				if (tempProcessed[j]) {
					continue;
				} // Of if

				if (tempMinimal > charCounts[j]) {
					tempMinimal = charCounts[j];
					tempRight = j;
				} // Of if
			} // Of for j
			tempProcessed[tempRight] = true;
			System.out.println("Selecting " + tempLeft + " and " + tempRight);

			// Step 3.3 Construct the new node.
			charCounts[i] = charCounts[tempLeft] + charCounts[tempRight];
			nodes[i] = new HuffmanNode('*', charCounts[i], nodes[tempLeft], nodes[tempRight], null);

			// Step 3.4 Link with children.
			nodes[tempLeft].parent = nodes[i];
			nodes[tempRight].parent = nodes[i];
			System.out.println("The children of " + i + " are " + tempLeft + " and " + tempRight);
		} // Of for i

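The boolean array tempProcessed records whether a node has already been selected as a child, so that it is skipped in later rounds. In each round we pick the unprocessed node with the smallest count as the left child, then pick the smallest remaining one as the right child, create a new node as their parent, and link the three together. Afterwards, a getRoot() method returns the root, which is the last node created, i.e. nodes[nodes.length - 1].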
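Neither the HuffmanNode class nor getRoot() is listed in these notes, so the following is only a sketch of what they might look like. The constructor argument order and the field names character, leftChild, rightChild and parent are taken from how the code above uses them. The field name weight, the toString() format, the wrapper class HuffmanNodeSketch and the helper buildTiny are assumptions.

```java
class HuffmanNodeSketch {
    static class HuffmanNode {
        char character;
        int weight; // Hypothetical field name for the count.
        HuffmanNode leftChild, rightChild, parent;

        HuffmanNode(char paraCharacter, int paraWeight, HuffmanNode paraLeftChild,
                HuffmanNode paraRightChild, HuffmanNode paraParent) {
            character = paraCharacter;
            weight = paraWeight;
            leftChild = paraLeftChild;
            rightChild = paraRightChild;
            parent = paraParent;
        }

        public String toString() {
            return "(" + character + ", " + weight + ")";
        }
    }

    static HuffmanNode[] nodes;

    /** The root is the node created last, so it sits at the end of the array. */
    static HuffmanNode getRoot() {
        return nodes[nodes.length - 1];
    }

    /** Hypothetical helper: two leaves and their parent, linked by hand. */
    static HuffmanNode[] buildTiny() {
        HuffmanNode[] result = new HuffmanNode[3];
        result[0] = new HuffmanNode('a', 1, null, null, null);
        result[1] = new HuffmanNode('b', 2, null, null, null);
        result[2] = new HuffmanNode('*', 3, result[0], result[1], null);
        result[0].parent = result[2];
        result[1].parent = result[2];
        return result;
    }

    public static void main(String[] args) {
        nodes = buildTiny();
        System.out.println(getRoot()); // prints (*, 3)
    }
}
```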

Day 30: Huffman Coding (Encoding and Decoding)

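The pre-order traversal at the start is only for debugging and is still recursive, so I won't paste it here. For Huffman codes, we label the two branches of every node, say 0 for the left child and 1 for the right child. The code of a character is obtained by starting at its leaf and climbing to the root, prepending one bit at each step.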

public void generateCodes() {
		huffmanCodes = new String[alphabetLength];
		HuffmanNode tempNode;
		for (int i = 0; i < alphabetLength; i++) {
			tempNode = nodes[i];
			// Use tempCharCode instead of tempCode such that it is unlike
			// tempNode.
			// This is an advantage of long names.
			String tempCharCode = "";
			while (tempNode.parent != null) {
				if (tempNode == tempNode.parent.leftChild) {
					tempCharCode = "0" + tempCharCode;
				} else {
					tempCharCode = "1" + tempCharCode;
				} // Of if

				tempNode = tempNode.parent;
			} // Of while

			huffmanCodes[i] = tempCharCode;
			System.out.println("The code of " + alphabet[i] + " is " + tempCharCode);
		} // Of for i
	}// Of generateCodes

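After that comes the actual encoding, which replaces each character of the source text with its code, and the decoding:

  1. Encoding goes from a leaf up to the root; decoding goes the other way, from the root down to a leaf.
  2. Decoding should give back the original string, which verifies correctness.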

/**
	 *********************
	 * Encode the given string.
	 * 
	 * @param paraString
	 *            The given string.
	 *********************
	 */
	public String coding(String paraString) {
		String resultCodeString = "";

		int tempIndex;
		for (int i = 0; i < paraString.length(); i++) {
			// From the original char to the location in the alphabet.
			tempIndex = charMapping[(int) paraString.charAt(i)];

			// From the location in the alphabet to the code.
			resultCodeString += huffmanCodes[tempIndex];
		} // Of for i
		return resultCodeString;
	}// Of coding

	/**
	 *********************
	 * Decode the given string.
	 * 
	 * @param paraString
	 *            The given string.
	 *********************
	 */
	public String decoding(String paraString) {
		String resultCodeString = "";

		HuffmanNode tempNode = getRoot();

		for (int i = 0; i < paraString.length(); i++) {
			if (paraString.charAt(i) == '0') {
				tempNode = tempNode.leftChild;
				System.out.println(tempNode);
			} else {
				tempNode = tempNode.rightChild;
				System.out.println(tempNode);
			} // Of if

			if (tempNode.leftChild == null) {
				System.out.println("Decode one:" + tempNode);
				// Decode one char.
				resultCodeString += tempNode.character;

				// Return to the root.
				tempNode = getRoot();
			} // Of if
		} // Of for i

		return resultCodeString;
	}// Of decoding
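The round-trip check mentioned in the list can be tried on a tiny hand-built tree. This is a sketch under stated assumptions, not the original class: RoundTripSketch, Node, link, encode and decode are hypothetical names, the tree is wired by hand instead of being built from counts, and the leaves array indexed by ch - 'a' stands in for charMapping. The encoding climbs from leaf to root and the decoding walks from root to leaf, as in the methods above.

```java
class RoundTripSketch {
    static class Node {
        char character;
        Node leftChild, rightChild, parent;

        Node(char paraCharacter) {
            character = paraCharacter;
        }
    }

    /** Creates a parent for the two given nodes and links all three. */
    static Node link(Node left, Node right) {
        Node result = new Node('*');
        result.leftChild = left;
        result.rightChild = right;
        left.parent = result;
        right.parent = result;
        return result;
    }

    // Leaves for 'a', 'b', 'c'; the index ch - 'a' stands in for charMapping.
    static final Node[] leaves = { new Node('a'), new Node('b'), new Node('c') };
    // Hand-built tree: b = "0", a = "10", c = "11".
    static final Node root = link(leaves[1], link(leaves[0], leaves[2]));

    static String encode(String text) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            Node node = leaves[text.charAt(i) - 'a'];
            StringBuilder code = new StringBuilder();
            // Climb from the leaf to the root, prepending one bit per step.
            while (node.parent != null) {
                code.insert(0, node == node.parent.leftChild ? '0' : '1');
                node = node.parent;
            }
            result.append(code);
        }
        return result.toString();
    }

    static String decode(String bits) {
        StringBuilder result = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = (bits.charAt(i) == '0') ? node.leftChild : node.rightChild;
            if (node.leftChild == null) {
                // Reached a leaf: emit its char and return to the root.
                result.append(node.character);
                node = root;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String bits = encode("abcab");
        System.out.println(bits);         // prints 10011100
        System.out.println(decode(bits)); // prints abcab
    }
}
```

Using StringBuilder here instead of += on a String is only a design choice for longer inputs; the result is the same.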

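This is code I will need to go through a second time. The complexity has gone up another level. For the encoding part in particular, I feel I haven't grasped the core idea, or rather haven't fully understood it: with several arrays appearing at once, I lost track of the difference between characters and codes and how they correspond, so it all got a bit confusing. Building the tree and generating the codes gave me no trouble. In short, there are still details to dig into, and that will take more time!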
