Fast Hexadecimal Encode and Decode

In Easy String Encryption Using CryptoAPI in C++, I described how to encrypt text and recommended that the encrypted text be stored as a series of hexadecimal digits, because ciphertext may contain embedded NULLs or other characters that can cause processing problems.

Since then I've worked on an application in which large blocks of binary data must be encoded into hexadecimal rapidly, and when needed, decoded back to binary just as rapidly.  In the earlier article, I provided a sort of "throw-away" hex encoder/decoder, but I felt that I needed something that was more efficient for production work.

In this article, I'll explore some hex encode/decode options, including the one supported by the CryptoAPI, and at the end of the article, I'll present the ultra-fast functions that ended up in my production code.

Observe the ordering of characters in an ASCII table. The only real issue with hex encoding is that there is a gap between the '9' and the 'A'.  Most of the complications come from dealing with that gap.
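
The gap is easy to check directly. Here is a minimal sketch, assuming an ASCII character set; the helper names `nibbleToHexDigit` and `hexDigitToNibble` are illustrative and do not appear in the routines presented later. It shows where the constant 7 used in the sections that follow comes from.

```cpp
#include <cassert>

// Seven characters (: ; < = > ? @) sit between '9' (0x39) and 'A' (0x41).
constexpr int kGap = 'A' - '9' - 1;

// Illustrative helper: value 0..15 -> '0'..'9', 'A'..'F'
constexpr char nibbleToHexDigit(unsigned v) {
    return static_cast<char>(v < 10 ? '0' + v : '0' + v + kGap);
}

// Illustrative helper: uppercase hex digit -> value 0..15 (no validation)
constexpr unsigned hexDigitToNibble(char c) {
    return c <= '9' ? unsigned(c - '0') : unsigned(c - '0' - kGap);
}

static_assert(kGap == 7, "gap between '9' and 'A'");
static_assert(nibbleToHexDigit(10) == 'A', "10 maps to 'A'");
static_assert(hexDigitToNibble('F') == 15, "'F' maps to 15");
```

Adding '0' + 7 is the same as adding 0x37, which is why the encoder described next uses that constant for values 10-15.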

How to Encode to Hexadecimal

Your binary source byte is 8 bits.  Take the data 4 bits at a time (values 0-15).  If that value is 0-9, then just add '0' (0x30).  If the value is 10-15, then add 0x37 instead.  That gives you a hex digit ('0', '1',...,'9','A',..., 'F').  Output the resulting high hex digit followed by the low hex digit.

void ToHexManual( BYTE* pSrc, int nSrcLen, CString& sDest )
{
	sDest="";
	for ( int j=0; j<nSrcLen; j++ ) {
		BYTE b1= *pSrc >> 4;   // hi nybble
		BYTE b2= *pSrc & 0x0f; // lo nybble
		b1+='0'; if (b1>'9') b1 += 7;  // gap between '9' and 'A'
		b2+='0'; if (b2>'9') b2 += 7;  
		sDest += b1; 
		sDest += b2; 
		pSrc++;
	}
}
void ToHexSlow( BYTE* pSrc, int nSrcLen, CString& sDest )
{
	sDest="";
	for ( int j=0; j<nSrcLen; j++ ) {
		sDest.AppendFormat( "%02x", int(*pSrc) );
		pSrc++;
	}
}

Both of the functions above have a significant problem when the output is large.  Let me explain...

Significant Problem with Long Strings

Each time you concatenate to an existing CString (or an STL string object), the code must check to see if that current allocation is large enough.  If not, it must allocate a new chunk that is larger, and copy the existing contents to that location before appending the new text.  Imagine how clumsy that gets when the string is about 1MB long!
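
To make the contrast concrete, here is a minimal sketch of the same nibble-by-nibble conversion with the output sized once before the loop. It uses `std::string` and `std::uint8_t` rather than `CString` and `BYTE` so that it is portable, and the name `toHexPreallocated` is hypothetical; it is not one of the routines in this article.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch: the output string is allocated once, at its final size, so the
// loop never has to grow the buffer and copy the existing contents.
std::string toHexPreallocated(const std::uint8_t* src, std::size_t len) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out(len * 2, '\0');               // single allocation
    for (std::size_t i = 0; i < len; ++i) {
        out[2 * i]     = digits[src[i] >> 4];     // high nibble
        out[2 * i + 1] = digits[src[i] & 0x0F];   // low nibble
    }
    return out;
}
```

The only change from the append-in-a-loop approach is where the allocation happens; the per-byte work is the same.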

We'll look at the efficient version that avoids these problems in a minute.  

How to Decode from Hexadecimal

Your textual source data is composed of pairs of hex digits, for instance "72" or "0F" or "D3".  Take the first digit and subtract '0' (0x30).  If the result is greater than 9 (the digit is 'A','B',...,'F'), then subtract 7, resulting in a value between 0 and 15.  Do the same for the second digit.  Multiply the first by 16 and add it to the second.  Output the resulting 8-bit value.

void FromHexManual( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
	for ( int j=0; j<nSrcLen; j+=2 ) {
		BYTE b1= pSrc[j]   -'0'; if (b1>9) b1 -= 7;
		BYTE b2= pSrc[j+1] -'0'; if (b2>9) b2 -= 7;
		*pDest++ = (b1<<4) + b2;  // <<4 multiplies by 16
	}
}
void FromHexSlow( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
	char sTmp[3]="xx";
	unsigned int n;  // "%x" expects a pointer to unsigned int
	for ( int j=0; j<nSrcLen; j+=2 ) {
		sTmp[0]= pSrc[0];
		sTmp[1]= pSrc[1];
		sscanf( sTmp, "%x", &n );
		pSrc += 2;
		*pDest++ = (BYTE)n;
	}
}

The CryptoAPI Hex Conversion Functions

None of the above routines provided the kind of speed that I was hoping to achieve.   After some research, I found a couple of functions in the CryptoAPI toolkit that convert to and from hex:

   CryptBinaryToString  and  CryptStringToBinary

and I tried them:

#pragma comment( lib, "Crypt32.lib" )
#include <windows.h>
#include <wincrypt.h>  // declares CryptBinaryToString and CryptStringToBinary
void ToHexCryptoAPI( BYTE* pSrc, int nSrcLen, CString& sDest )
{
    DWORD nOutLen= (nSrcLen*3)+4; // xx + etc/
    char* pBuf= new char[nOutLen];
    BOOL fRet= CryptBinaryToString( pSrc,nSrcLen, CRYPT_STRING_HEX, pBuf, &nOutLen ); 
    sDest= pBuf;
    delete[] pBuf;  // array form must match new char[]
}
int FromHexCryptoAPI( LPCSTR pSrc, int nSrcLen, BYTE* pDest )
{
    DWORD nOutLen= nSrcLen/2;
    BOOL fRet= CryptStringToBinary( pSrc, nSrcLen, CRYPT_STRING_HEX, pDest, &nOutLen, 0, 0);
    return( (int)nOutLen );
}

CryptBinaryToString appears to be formatted for screen output -- spaces between each pair of hex digits, embedded tab characters, CRLF at the end, etc.  The two functions above do work, and there certainly is value in using standard API functions, but the performance was lackluster, and I didn't want the overhead of the extra embedded characters.

Finally -- The Ultra-Fast Code I Promised!

Lookup Tables Rather Than Calculations

If you look at the assembly-language output from any of the above routines, you will see conditional jumps and a significant number of calculations.   Modern processors are fast, using multiple pipelines to handle conditionals, but there is nothing faster than using a lookup table -- especially when the table is in L1 processor cache... the table access is nearly as fast as register access!

So...

For binary-to-hex, I used a table that contains all possible output values for any byte of data.  This is very straightforward -- the data itself is the index into the table.  I use a WORD pointer to grab 16 bits (two hex digits) at a time.

For hex-to-binary, I realized that I could use a lookup table that contains all possible hexadecimal digit values.   Rather than compensate for the "gap" between '9' and 'A', I just left empty spots in the table.   As a bonus, I left an extra gap between 'F' and 'a' -- so that the same routine could convert the hexadecimal data regardless of whether it contains A,B,C,D,E,F or a,b,c,d,e,f.

BYTE HexLookup[513]= {
	"000102030405060708090a0b0c0d0e0f"
	"101112131415161718191a1b1c1d1e1f"
	"202122232425262728292a2b2c2d2e2f"
	"303132333435363738393a3b3c3d3e3f"
	"404142434445464748494a4b4c4d4e4f"
	"505152535455565758595a5b5c5d5e5f"
	"606162636465666768696a6b6c6d6e6f"
	"707172737475767778797a7b7c7d7e7f"
	"808182838485868788898a8b8c8d8e8f"
	"909192939495969798999a9b9c9d9e9f"
	"a0a1a2a3a4a5a6a7a8a9aaabacadaeaf"
	"b0b1b2b3b4b5b6b7b8b9babbbcbdbebf"
	"c0c1c2c3c4c5c6c7c8c9cacbcccdcecf"
	"d0d1d2d3d4d5d6d7d8d9dadbdcdddedf"
	"e0e1e2e3e4e5e6e7e8e9eaebecedeeef"
	"f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"
};
void ToHex( BYTE* pSrc, int nSrcLen, CString& sDest )
{
	WORD* pwHex=  (WORD*)HexLookup;
	WORD* pwDest= (WORD*)sDest.GetBuffer((nSrcLen*2)+1);

	for (int j=0; j<nSrcLen; j++ ) {
		*pwDest= pwHex[*pSrc];
		pwDest++; pSrc++;
	}
	*((BYTE*)pwDest)= 0;  // terminate the string
	sDest.ReleaseBuffer(nSrcLen*2);  // new length does not count the terminating null
}
BYTE DecLookup[] = {
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // gap before first hex digit
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 
	0,1,2,3,4,5,6,7,8,9,       // 0123456789
	0,0,0,0,0,0,0,             // :;<=>?@ (gap)
	10,11,12,13,14,15,         // ABCDEF 
	0,0,0,0,0,0,0,0,0,0,0,0,0, // GHIJKLMNOPQRS (gap)
	0,0,0,0,0,0,0,0,0,0,0,0,0, // TUVWXYZ[\]^_` (gap)
	10,11,12,13,14,15          // abcdef 
};
void FromHex( BYTE* pSrc, int nSrcLen, BYTE* pDest )
{
	for (int j=0; j<nSrcLen; j+=2 ) {
		int d =  DecLookup[*pSrc++ ] << 4;
		    d |= DecLookup[*pSrc++ ];
		*pDest++ = (BYTE)d;
	}
}

Notes:

In the final (ultra-fast) ToHex function, note the usage of s.GetBuffer() and s.ReleaseBuffer().  The function allocates the entire output buffer before it starts.  This avoids the Significant Problem with Long Strings that I mentioned earlier.  I could have allocated a separate buffer and set the result string to it as a final step, but that's one (possibly large) memory copy that I wanted to avoid.  By using GetBuffer, I allocate and "freeze" the CString buffer while I'm filling it using high-speed pointer operations.

All of the above functions would be safer if they had some error handling.  For instance, the FromHex functions all assume that the output buffer is large enough to hold the resulting binary data.  I omitted that but you might want to be more careful.  Note the technique used in the CryptoAPI functions, and emulate what Microsoft has done.
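
One way to add that check is an in/out length parameter, similar in spirit to the one CryptStringToBinary takes. The following is a minimal sketch only: the name `fromHexChecked` and its exact contract are assumptions made for illustration, and it uses portable types rather than BYTE and LPCSTR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper, loosely modeled on the in/out length parameter of
// CryptStringToBinary. Returns false if srcLen is odd or the destination is
// too small. For even input, *destLen is set to the number of bytes required.
// Passing dest == nullptr performs a size query only.
bool fromHexChecked(const char* src, std::size_t srcLen,
                    std::uint8_t* dest, std::size_t* destLen) {
    if (srcLen % 2 != 0) return false;
    const std::size_t needed = srcLen / 2;
    const std::size_t capacity = *destLen;
    *destLen = needed;
    if (dest == nullptr) return true;        // size query only
    if (capacity < needed) return false;     // caller's buffer is too small
    for (std::size_t i = 0; i < needed; ++i) {
        unsigned hi = static_cast<unsigned char>(src[2 * i])     - '0';
        unsigned lo = static_cast<unsigned char>(src[2 * i + 1]) - '0';
        if (hi > 9) hi -= 7;                 // 'A'..'F' (uppercase only)
        if (lo > 9) lo -= 7;
        dest[i] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}
```

A caller can pass a null destination first to learn the required size, allocate, and then call again. This sketch checks buffer size only; it does not validate the digits themselves.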

Also, none of the FromHexXxxx routines take any special action when encountering an invalid hex digit; in fact, the fastest one may fail with an access violation if it indexes beyond the end of the lookup table.  In my case (and to avoid complicating this article) I assumed that I'd always have valid hexadecimal data.  You may not want to make that assumption.
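
If you do need to handle untrusted input, one option is a full 256-entry table with a sentinel value for non-hex characters. This is a minimal sketch under that assumption, not the production routine; `makeDecodeTable` and `fromHexValidated` are hypothetical names, and the table is built at startup rather than typed in.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Sketch: a full 256-entry decode table, so no input byte can index past
// the end. 0xFF marks "not a hex digit".
static std::array<std::uint8_t, 256> makeDecodeTable() {
    std::array<std::uint8_t, 256> t;
    t.fill(0xFF);
    for (int i = 0; i < 10; ++i) t['0' + i] = static_cast<std::uint8_t>(i);
    for (int i = 0; i < 6; ++i) {
        t['A' + i] = static_cast<std::uint8_t>(10 + i);
        t['a' + i] = static_cast<std::uint8_t>(10 + i);
    }
    return t;
}

// Returns false on an odd length or at the first invalid digit.
// dest must have room for srcLen / 2 bytes.
bool fromHexValidated(const char* src, std::size_t srcLen, std::uint8_t* dest) {
    static const std::array<std::uint8_t, 256> table = makeDecodeTable();
    if (srcLen % 2 != 0) return false;
    for (std::size_t i = 0; i < srcLen; i += 2) {
        const std::uint8_t hi = table[static_cast<unsigned char>(src[i])];
        const std::uint8_t lo = table[static_cast<unsigned char>(src[i + 1])];
        if ((hi | lo) & 0xF0) return false;   // at least one lookup was 0xFF
        *dest++ = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}
```

The validity test costs one OR, one AND, and one branch per output byte, so the decoder stays table-driven. I have not timed this variant.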

ToHexManual generates uppercase letter digits (ABCDEF), and FromHexManual assumes uppercase input; it will produce wrong results if lowercase hex digits are found.  As written, ToHexSlow uses the "%02x" format, which emits lowercase, and the "%x" conversion in FromHexSlow accepts either case.  The CryptoAPI functions use lowercase.

The ToHex (fast) function generates lowercase letter-digits (abcdef) and FromHex (fast) will work with either uppercase or lowercase.  If you want uppercase output from ToHex, you can select the entire table in Visual Studio, then use Edit/Advanced/Make Uppercase.  (If you want the option of using either, just create two tables.)
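
As an alternative to editing the table by hand, both tables can be generated at startup. The following is a minimal sketch of that idea; `makeHexTable` and `toHexTable` are hypothetical names, and it uses `std::string` and a two-byte `memcpy` in place of the `CString` buffer and `WORD*` copy in ToHex.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Sketch: build the 512-character encode table at startup instead of
// typing it in. Entry b occupies positions 2*b and 2*b+1.
static std::array<char, 512> makeHexTable(bool uppercase) {
    const char* digits = uppercase ? "0123456789ABCDEF" : "0123456789abcdef";
    std::array<char, 512> t;
    for (int b = 0; b < 256; ++b) {
        t[2 * b]     = digits[b >> 4];
        t[2 * b + 1] = digits[b & 0x0F];
    }
    return t;
}

std::string toHexTable(const std::uint8_t* src, std::size_t len, bool uppercase) {
    static const std::array<char, 512> lower = makeHexTable(false);
    static const std::array<char, 512> upper = makeHexTable(true);
    const char* table = uppercase ? upper.data() : lower.data();
    std::string out(len * 2, '\0');
    for (std::size_t i = 0; i < len; ++i) {
        // copying two chars stands in for the WORD* copy in ToHex
        std::memcpy(&out[2 * i], table + 2 * src[i], 2);
    }
    return out;
}
```

The case is chosen once per call, outside the loop, so the inner loop is the same single table lookup either way.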

The timing test results are interesting.   I read in a large (1MB) binary file (a ZIP file), then started the timer and ran 10 iterations of each function.

                                            Approx Ratio
Function             Debug      Release     (Release)
ToHexManual          22,297     20,985      446 : 1
ToHexSlow            34,016     37,891      806 : 1
ToHexCryptoAPI        3,981      4,056       86 : 1
ToHex (fast)            172         47        1 : 1
FromHexManual           110         70      1.6 : 1
FromHexSlow          11,457      2,360       52 : 1
FromHexCryptoAPI        500        500       11 : 1
FromHex (fast)           47         45        1 : 1

All times are in milliseconds, based on the less-than-precise GetTickCount() function.   I also ran a 100-iteration test on each to verify and got similar performance ratios.  The really big numbers for ToHexManual and ToHexSlow are almost certainly caused, at least in part, by the Significant Problem with Long Strings issue (which affects them both).

The FromHexManual speed was faster than I thought it would be.  Upon examining the compiler-generated ASM source code, I found that the compiler had "inlined" the function.  Also, it seems that the calculate-and-jump-conditionally sequence is handled very efficiently on my AMD-based PC -- it's only about 50% slower than the lookup-table technique.


In any case, the lookup-table versions proved to be significantly faster than the others.
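
If you want to repeat this kind of measurement, here is a minimal sketch of a timing helper. It is not the harness used for the numbers above: `timeMs` is a hypothetical name, and `std::chrono::steady_clock` stands in for GetTickCount().

```cpp
#include <chrono>

// Sketch: run `fn` the given number of times and return the elapsed
// wall-clock time in milliseconds.
template <typename Fn>
double timeMs(int iterations, Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `timeMs(10, [&]{ ToHex(pData, nLen, sOut); })` would time ten encodes, where `pData`, `nLen`, and `sOut` are whatever buffer and output string your test uses. In a release build, make sure the result of each call is actually used, or the optimizer may remove the work being timed.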

References:

CryptBinaryToString Function

http://msdn.microsoft.com/en-us/library/aa379887(VS.85).aspx

CryptStringToBinary Function

http://msdn.microsoft.com/en-us/library/aa380285(VS.85).aspx

How fast is your ASM hex converter?

https://www.experts-exchange.com/Programming/Languages/Assembly/Q_20272901.html

This Experts-Exchange "question" was really a challenge to create a particular kind of hexadecimal output in Intel ASM.  The lookup table technique won out easily... until some smarty pants figured out how to use MMX opcodes and handle larger chunks of data.   Interesting reading, and a lot of fun for the participants!

Translated from: https://www.experts-exchange.com/articles/1290/Fast-Hexadecimal-Encode-and-Decode.html
