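Basic Concepts
The term Base64 originates from the MIME content transfer encoding specification. Base64 is not an encryption algorithm, even though the encoded string may look encrypted. It is a binary-to-text encoding: it maps arbitrary binary data to a string of ASCII characters so that the data can be carried through environments that only support text, for example MIME-capable e-mail applications, or XML documents that need to store complex data such as images.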
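To implement Base64, first choose 64 characters to form the alphabet. The general rule is to pick 64 printable characters from a widely used character set, which avoids data loss in transit (non-printable characters may be treated as special characters along the way and get lost). MIME's Base64, for example, uses the uppercase letters, the lowercase letters, and the digits 0-9 for the first 62 values. Other implementations usually follow MIME here and differ only in the last two characters; the Base64 variant used by UTF-7 is one example.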
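To illustrate how variants differ only in the last two characters, here is a minimal sketch using the JDK's `java.util.Base64` (not the implementation developed in this article). It relies on the URL-safe alphabet from RFC 4648, which replaces "+" and "/" with "-" and "_"; the class name `AlphabetVariants` and the sample bytes are chosen here purely for illustration.

```java
import java.util.Base64;

class AlphabetVariants {
    // Two bytes chosen so that the first two 6-bit groups are 62 and 63,
    // the only values on which the two alphabets disagree.
    static final byte[] SAMPLE = { (byte) 0xFB, (byte) 0xFF };

    // Standard alphabet: values 62 and 63 map to '+' and '/'.
    static String standard() {
        return Base64.getEncoder().encodeToString(SAMPLE);
    }

    // URL-safe alphabet: values 62 and 63 map to '-' and '_'.
    static String urlSafe() {
        return Base64.getUrlEncoder().encodeToString(SAMPLE);
    }

    public static void main(String[] args) {
        System.out.println(standard()); // +/8=
        System.out.println(urlSafe());  // -_8=
    }
}
```

The third character and the padding are identical in both outputs, because only values 62 and 63 are mapped differently.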
An Example
The following text:
Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
becomes the following after MIME Base64 encoding (MIME breaks the output into lines of 76 characters):
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
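The example can be cross-checked with the JDK's built-in MIME encoder. The sketch below uses `java.util.Base64.getMimeEncoder()`, which wraps lines at 76 characters and separates them with CRLF; the class name `MimeExample` is an illustrative choice, and this is the JDK's encoder rather than the code developed later in the article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MimeExample {
    // The sample sentence from the text above (269 ASCII bytes).
    static final String TEXT =
        "Man is distinguished, not only by his reason, but by this singular "
        + "passion from other animals, which is a lust of the mind, that by a "
        + "perseverance of delight in the continued and indefatigable generation "
        + "of knowledge, exceeds the short vehemence of any carnal pleasure.";

    // Encodes the text with the JDK's MIME flavor of Base64.
    static String encodeMime(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getMimeEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // Prints the encoded text as lines of at most 76 characters.
        System.out.println(encodeMime(TEXT));
    }
}
```

Because 269 bytes leave 2 bytes over after the last full 3-byte group, the output ends with a single "=".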
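How the Conversion Works
Taking the "Man" at the start of the example, which is converted to "TWFu", here is the basic Base64 conversion process:
- The ASCII codes of M, a, and n are 01001101, 01100001, and 01101110; concatenated, they form the 24-bit string 010011010110000101101110.
- Split it into four groups of 6 bits: 010011, 010110, 000101, and 101110 (decimal 19, 22, 5, and 46).
- Finally, look up each value in the alphabet to obtain the four characters T, W, F, and u (the alphabet defined by MIME is listed later in this article).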
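That is the whole idea of Base64: every 3 bytes (24 bits) are converted into 4 characters. Since a 6-bit binary number can represent 64 distinct values, once the 64-character alphabet is fixed and each character is assigned a unique value, forward and reverse lookups convert binary bytes to Base64 and back.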
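The three steps above can be sketched in a few lines of Java. This is a minimal illustration for exactly one 3-byte group; the class and method names (`ThreeBytesToFourChars`, `encodeGroup`) are made up for this sketch and are not part of the article's implementation.

```java
class ThreeBytesToFourChars {
    // The MIME alphabet: value i maps to the character at index i.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes into four Base64 characters.
    static String encodeGroup(byte b1, byte b2, byte b3) {
        // Step 1: merge the three bytes into one 24-bit integer.
        int bits = (b1 & 0xFF) << 16 | (b2 & 0xFF) << 8 | (b3 & 0xFF);
        StringBuilder sb = new StringBuilder(4);
        // Steps 2 and 3: take four 6-bit groups, most significant first,
        // and look each one up in the alphabet.
        for (int shift = 18; shift >= 0; shift -= 6) {
            sb.append(ALPHABET.charAt((bits >> shift) & 0x3F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeGroup((byte) 'M', (byte) 'a', (byte) 'n')); // TWFu
    }
}
```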
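Zero-Padding the Bits
After every 3 bytes have been converted into 4 Base64 characters, one of three cases arises at the end:
- No bytes are left
- 1 byte is left
- 2 bytes are left
The first case needs no further work. How should the second and third be handled?
In those cases, zero bits are appended to the leftover bytes until the bit count is divisible by 6 (because Base64 encodes 6 bits at a time). One leftover byte is 8 bits, so 4 zero bits are appended to make 12 bits, which split into 2 groups; two leftover bytes are 16 bits, so only 2 zero bits are needed (18 bits) to make 3 groups. The groups are then mapped through the alphabet as usual.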
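As a concrete check of the zero-padding rule, the sketch below encodes a 1-byte or 2-byte tail without adding any "=" characters. The names `TailEncoding` and `encodeTail` are hypothetical and exist only for this illustration.

```java
class TailEncoding {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes the 1 or 2 bytes left over after all full 3-byte groups,
    // without appending '=' characters.
    static String encodeTail(byte[] tail) {
        if (tail.length != 1 && tail.length != 2) {
            throw new IllegalArgumentException("tail must be 1 or 2 bytes");
        }
        int bitCount = tail.length * 8;        // 8 or 16 bits
        int bits = 0;
        for (byte b : tail) {
            bits = bits << 8 | (b & 0xFF);
        }
        int zeros = (6 - bitCount % 6) % 6;    // 4 zero bits for 8, 2 for 16
        bits <<= zeros;                        // append the zero bits
        int groups = (bitCount + zeros) / 6;   // 2 or 3 six-bit groups
        StringBuilder sb = new StringBuilder(groups);
        for (int i = groups - 1; i >= 0; i--) {
            sb.append(ALPHABET.charAt((bits >> (6 * i)) & 0x3F));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeTail(new byte[] { 'M' }));      // TQ
        System.out.println(encodeTail(new byte[] { 'M', 'a' })); // TWE
    }
}
```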
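Padding
When decoding, every 4 characters are converted back into 3 bytes, and one of three cases arises at the end (ignoring any trailing "=" characters):
- No characters are left
- 2 characters are left
- 3 characters are left
These three cases correspond one-to-one to the three encoding cases above; reversing the zero-padding step restores the original bytes exactly.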
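Reversing the zero-padding step might look like the following sketch, which decodes a leftover group of 2 or 3 characters. The names `TailDecoding` and `decodeTail` are hypothetical, and the linear `indexOf` lookup is used for brevity rather than speed.

```java
class TailDecoding {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes a trailing group of 2 or 3 Base64 characters (no '=' included).
    static byte[] decodeTail(String tail) {
        int n = tail.length();
        if (n != 2 && n != 3) {
            throw new IllegalArgumentException("tail must be 2 or 3 characters");
        }
        int bits = 0;
        for (int i = 0; i < n; i++) {
            int value = ALPHABET.indexOf(tail.charAt(i));
            if (value < 0) {
                throw new IllegalArgumentException("not a Base64 character");
            }
            bits = bits << 6 | value;          // 12 or 18 bits in total
        }
        int byteCount = n - 1;                 // 2 chars -> 1 byte, 3 chars -> 2 bytes
        bits >>= n * 6 - byteCount * 8;        // drop the 4 or 2 zero bits
        byte[] out = new byte[byteCount];
        for (int i = 0; i < byteCount; i++) {
            out[i] = (byte) (bits >> (8 * (byteCount - 1 - i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(new String(decodeTail("TQ")));  // M
        System.out.println(new String(decodeTail("TWE"))); // Ma
    }
}
```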
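Base64 strings often end with "=" characters, which are produced by padding. Padding means that when the input leaves 1 or 2 bytes over at the end, "=" characters are appended so that the length of the encoded output is a multiple of 4.
It follows that with 1 leftover byte, two "=" are needed: the last byte encodes to 2 characters, and two "=" bring the group up to 4. Likewise, 2 leftover bytes encode to 3 characters, so one "=" is needed.
Padding is not strictly necessary, because the number of missing bytes can be worked out from the encoded text itself. Accordingly, some implementations require padding and others do not. One situation where padding is required is when several Base64-encoded files are concatenated into a single file.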
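The length arithmetic behind these statements can be written down directly. The sketch below is only an illustration (the class `Base64Lengths` and its method names are invented here); it shows that the number of "=" characters depends only on the input length, and that the decoded byte count can be recovered from the unpadded character count.

```java
class Base64Lengths {
    // Output length with '=' padding: 4 characters per started 3-byte group.
    static int paddedLength(int byteCount) {
        return (byteCount + 2) / 3 * 4;
    }

    // Output length without padding: one character per started 6-bit group.
    static int unpaddedLength(int byteCount) {
        return (byteCount * 8 + 5) / 6;
    }

    // Number of '=' characters: 0, 2, or 1 for 0, 1, or 2 leftover bytes.
    static int paddingChars(int byteCount) {
        return (3 - byteCount % 3) % 3;
    }

    // Byte count recovered from the number of non-'=' characters.
    static int decodedLength(int charCount) {
        return charCount * 6 / 8;
    }

    public static void main(String[] args) {
        int n = 269; // byte length of the sample sentence used in this article
        System.out.println(paddedLength(n));                  // 360
        System.out.println(unpaddedLength(n));                // 359
        System.out.println(paddingChars(n));                  // 1
        System.out.println(decodedLength(unpaddedLength(n))); // 269
    }
}
```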
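Implementation (Example)
The following is a Base64 alphabet consisting of the uppercase letters, the lowercase letters, the digits, and the symbols "+" and "/".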
Value | Char | Value | Char | Value | Char | Value | Char
---|---|---|---|---|---|---|---
0 | A | 16 | Q | 32 | g | 48 | w
1 | B | 17 | R | 33 | h | 49 | x
2 | C | 18 | S | 34 | i | 50 | y
3 | D | 19 | T | 35 | j | 51 | z
4 | E | 20 | U | 36 | k | 52 | 0
5 | F | 21 | V | 37 | l | 53 | 1
6 | G | 22 | W | 38 | m | 54 | 2
7 | H | 23 | X | 39 | n | 55 | 3
8 | I | 24 | Y | 40 | o | 56 | 4
9 | J | 25 | Z | 41 | p | 57 | 5
10 | K | 26 | a | 42 | q | 58 | 6
11 | L | 27 | b | 43 | r | 59 | 7
12 | M | 28 | c | 44 | s | 60 | 8
13 | N | 29 | d | 45 | t | 61 | 9
14 | O | 30 | e | 46 | u | 62 | +
15 | P | 31 | f | 47 | v | 63 | /
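Since the alphabet is just three consecutive character ranges plus two symbols, it can also be generated rather than typed out. The following is a small sketch under that observation; `AlphabetTable`, `buildAlphabet`, and `buildReverse` are names invented for this example, and marking unused entries with -1 is a choice made here, not something the article's code does.

```java
import java.util.Arrays;

class AlphabetTable {
    // Builds the 64-entry table: index = 6-bit value, element = character.
    static char[] buildAlphabet() {
        char[] table = new char[64];
        int i = 0;
        for (char c = 'A'; c <= 'Z'; c++) table[i++] = c; // values 0-25
        for (char c = 'a'; c <= 'z'; c++) table[i++] = c; // values 26-51
        for (char c = '0'; c <= '9'; c++) table[i++] = c; // values 52-61
        table[i++] = '+';                                 // value 62
        table[i] = '/';                                   // value 63
        return table;
    }

    // Builds the reverse table used for decoding: index = ASCII code,
    // element = 6-bit value, or -1 for characters outside the alphabet.
    static int[] buildReverse(char[] alphabet) {
        int[] reverse = new int[128];
        Arrays.fill(reverse, -1);
        for (int value = 0; value < alphabet.length; value++) {
            reverse[alphabet[value]] = value;
        }
        return reverse;
    }

    public static void main(String[] args) {
        char[] alphabet = buildAlphabet();
        int[] reverse = buildReverse(alphabet);
        System.out.println(alphabet[19]); // T
        System.out.println(reverse['u']); // 46
    }
}
```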
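With this alphabet we can write a simple Base64 implementation (the complete source code is attached at the end of this article):
The encode() method below converts a Java string to a byte array (Base64 operates on bytes) and then calls the encode() overload that does the actual encoding: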
    public String encode(String inputStr, String charset, boolean padding)
            throws UnsupportedEncodingException {
        String encodeStr = null;
        byte[] bytes = inputStr.getBytes(charset);
        encodeStr = encode(bytes, padding);
        return encodeStr;
    }
The core of that encode() method is:
    for (int i = 0; i < groups; i++) {
        byte_1 = bytes[3*i] & 0xFF;
        byte_2 = bytes[3*i+1] & 0xFF;
        byte_3 = bytes[3*i+2] & 0xFF;

        group_6bit_1 = byte_1 >>> 2;
        group_6bit_2 = (byte_1 & 0x03) << 4 | byte_2 >>> 4;
        group_6bit_3 = (byte_2 & 0x0F) << 2 | byte_3 >>> 6;
        group_6bit_4 = byte_3 & 0x3F;

        sb.append(CHARSET[group_6bit_1])
          .append(CHARSET[group_6bit_2])
          .append(CHARSET[group_6bit_3])
          .append(CHARSET[group_6bit_4]);
    }
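That is, every 3 bytes are converted into 4 characters.
We also need to check whether any bytes are left over at the end and, if so, handle them separately: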
    if (tail == 1) {
        byte_1 = bytes[bytes.length-1] & 0xFF;

        group_6bit_1 = byte_1 >>> 2;
        group_6bit_2 = (byte_1 & 0x03) << 4;

        sb.append(CHARSET[group_6bit_1])
          .append(CHARSET[group_6bit_2]);

        if (padding) {
            sb.append('=').append('=');
        }
    } else if (tail == 2) {
        byte_1 = bytes[bytes.length-2] & 0xFF;
        byte_2 = bytes[bytes.length-1] & 0xFF;

        group_6bit_1 = byte_1 >>> 2;
        group_6bit_2 = (byte_1 & 0x03) << 4 | byte_2 >>> 4;
        group_6bit_3 = (byte_2 & 0x0F) << 2;

        sb.append(CHARSET[group_6bit_1])
          .append(CHARSET[group_6bit_2])
          .append(CHARSET[group_6bit_3]);

        if (padding) {
            sb.append('=');
        }
    }
Decoding works analogously; see the complete code for details.
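For orientation, the heart of decoding is the mirror image of the encoding loop: 4 characters are mapped back to four 6-bit values and regrouped into 3 bytes. The sketch below handles exactly one full group; `FourCharsToThreeBytes` and `decodeGroup` are illustrative names, and it uses a single 24-bit accumulator rather than the per-byte shifts of the article's code.

```java
class FourCharsToThreeBytes {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes exactly four Base64 characters into three bytes.
    static byte[] decodeGroup(String group) {
        if (group.length() != 4) {
            throw new IllegalArgumentException("group must be 4 characters");
        }
        int bits = 0;
        for (int i = 0; i < 4; i++) {
            int value = ALPHABET.indexOf(group.charAt(i)); // reverse lookup
            if (value < 0) {
                throw new IllegalArgumentException("not a Base64 character");
            }
            bits = bits << 6 | value; // accumulate 4 x 6 = 24 bits
        }
        // Split the 24 bits back into three 8-bit bytes.
        return new byte[] { (byte) (bits >> 16), (byte) (bits >> 8), (byte) bits };
    }

    public static void main(String[] args) {
        System.out.println(new String(decodeGroup("TWFu"))); // Man
    }
}
```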
Appendix: Source Code
    package base64;

    import java.io.UnsupportedEncodingException;

    /**
     * This class provides a simple implementation of Base64 encoding and decoding.
     *
     * @author QiaoMingkui
     *
     */
    public class Base64 {
        /*
         * charset
         */
        private static final char[] CHARSET = {
            'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
            'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
        };

        /*
         * charset used to decode.
         */
        private static final int[] DECODE_CHARSET = new int[128];
        static {
            for (int i = 0; i < 64; i++) {
                DECODE_CHARSET[CHARSET[i]] = i;
            }
        }

        /**
         * A convenient method for encoding Java String,
         * it uses encode(byte[], boolean) to encode byte array.
         *
         * @param inputStr a string to be encoded.
         * @param charset charset name ("GBK" for example) that is used to convert inputStr into byte array.
         * @param padding whether using padding characters "="
         * @return encoded string
         * @throws UnsupportedEncodingException if charset is unsupported
         */
        public String encode(String inputStr, String charset, boolean padding)
                throws UnsupportedEncodingException {
            String encodeStr = null;
            byte[] bytes = inputStr.getBytes(charset);
            encodeStr = encode(bytes, padding);
            return encodeStr;
        }

        /**
         * Using Base64 to encode bytes.
         *
         * @param bytes byte array to be encoded
         * @param padding whether using padding characters "="
         * @return encoded string
         */
        public String encode(byte[] bytes, boolean padding) {
            // 4 6-bit groups
            int group_6bit_1, group_6bit_2, group_6bit_3, group_6bit_4;

            // bytes of a group
            int byte_1, byte_2, byte_3;

            // number of 3-byte groups
            int groups = bytes.length / 3;
            // at last, there might be 0, 1, or 2 byte(s) remained,
            // which needs to be encoded individually.
            int tail = bytes.length % 3;

            StringBuilder sb = new StringBuilder(groups * 4 + 4);

            // handle each 3-byte group
            for (int i = 0; i < groups; i++) {
                byte_1 = bytes[3*i] & 0xFF;
                byte_2 = bytes[3*i+1] & 0xFF;
                byte_3 = bytes[3*i+2] & 0xFF;

                group_6bit_1 = byte_1 >>> 2;
                group_6bit_2 = (byte_1 & 0x03) << 4 | byte_2 >>> 4;
                group_6bit_3 = (byte_2 & 0x0F) << 2 | byte_3 >>> 6;
                group_6bit_4 = byte_3 & 0x3F;

                sb.append(CHARSET[group_6bit_1])
                  .append(CHARSET[group_6bit_2])
                  .append(CHARSET[group_6bit_3])
                  .append(CHARSET[group_6bit_4]);
            }

            // handle last 1 or 2 byte(s)
            if (tail == 1) {
                byte_1 = bytes[bytes.length-1] & 0xFF;

                group_6bit_1 = byte_1 >>> 2;
                group_6bit_2 = (byte_1 & 0x03) << 4;

                sb.append(CHARSET[group_6bit_1])
                  .append(CHARSET[group_6bit_2]);

                if (padding) {
                    sb.append('=').append('=');
                }
            } else if (tail == 2) {
                byte_1 = bytes[bytes.length-2] & 0xFF;
                byte_2 = bytes[bytes.length-1] & 0xFF;

                group_6bit_1 = byte_1 >>> 2;
                group_6bit_2 = (byte_1 & 0x03) << 4 | byte_2 >>> 4;
                group_6bit_3 = (byte_2 & 0x0F) << 2;

                sb.append(CHARSET[group_6bit_1])
                  .append(CHARSET[group_6bit_2])
                  .append(CHARSET[group_6bit_3]);

                if (padding) {
                    sb.append('=');
                }
            }

            return sb.toString();
        }

        /**
         * Decode a Base64 string to bytes (byte array).
         *
         * @param code Base64 string to be decoded
         * @return byte array
         */
        public byte[] decode(String code) {
            char[] chars = code.toCharArray();

            int group_6bit_1, group_6bit_2, group_6bit_3, group_6bit_4;

            int byte_1, byte_2, byte_3;

            int len = chars.length;
            // ignore last '='s (at most two); the length checks keep
            // empty or one-character input from causing an index error
            if (len > 0 && chars[len - 1] == '=') {
                len--;
            }
            if (len > 0 && chars[len - 1] == '=') {
                len--;
            }

            int groups = len / 4;
            int tail = len % 4;

            // each group of characters (4 characters) will be converted into 3 bytes,
            // and last 2 or 3 characters will be converted into 1 or 2 byte(s).
            byte[] bytes = new byte[groups * 3 + (tail > 0 ? tail - 1 : 0)];

            int byteIdx = 0;

            // decode each group
            for (int i = 0; i < groups; i++) {
                group_6bit_1 = DECODE_CHARSET[chars[4*i]];
                group_6bit_2 = DECODE_CHARSET[chars[4*i + 1]];
                group_6bit_3 = DECODE_CHARSET[chars[4*i + 2]];
                group_6bit_4 = DECODE_CHARSET[chars[4*i + 3]];

                byte_1 = group_6bit_1 << 2 | group_6bit_2 >>> 4;
                byte_2 = (group_6bit_2 & 0x0F) << 4 | group_6bit_3 >>> 2;
                byte_3 = (group_6bit_3 & 0x03) << 6 | group_6bit_4;

                bytes[byteIdx++] = (byte) byte_1;
                bytes[byteIdx++] = (byte) byte_2;
                bytes[byteIdx++] = (byte) byte_3;
            }

            // decode last 2 or 3 characters
            if (tail == 2) {
                group_6bit_1 = DECODE_CHARSET[chars[len - 2]];
                group_6bit_2 = DECODE_CHARSET[chars[len - 1]];

                byte_1 = group_6bit_1 << 2 | group_6bit_2 >>> 4;
                bytes[byteIdx] = (byte) byte_1;
            } else if (tail == 3) {
                group_6bit_1 = DECODE_CHARSET[chars[len - 3]];
                group_6bit_2 = DECODE_CHARSET[chars[len - 2]];
                group_6bit_3 = DECODE_CHARSET[chars[len - 1]];

                byte_1 = group_6bit_1 << 2 | group_6bit_2 >>> 4;
                byte_2 = (group_6bit_2 & 0x0F) << 4 | group_6bit_3 >>> 2;
                bytes[byteIdx++] = (byte) byte_1;
                bytes[byteIdx] = (byte) byte_2;
            }

            return bytes;
        }

        /**
         * Test.
         * @param args
         */
        public static void main(String[] args) {
            Base64 base64 = new Base64();
            String str = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.";
            System.out.println(str);
            try {
                String encodeStr = base64.encode(str, "GBK", false);
                System.out.println(encodeStr);
                byte[] decodeBytes = base64.decode(encodeStr);
                String decodeStr = new String(decodeBytes, "GBK");
                System.out.println(decodeStr);
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
    }
Reposted from:
Base64编码简介 (Introduction to Base64 Encoding)