warning C4819 explanation

2019独角兽企业重金招聘Python工程师标准>>> hot3.png

You may meet below errors before:

error C2220: warning treated as error - no 'object' file generated

warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss

For the Compiler Error C2220:

warning treated as error - no 'file' file generated

/WX tells the compiler to treat all warnings as errors. Since an error occurred, no object or executable file was generated.

Possible solutions

  • Fix the problem that caused the warning.
  • Compile at a lower warning level.
  • Compile without /WX

warning C4819:

The file contains a character that cannot be represented in the current code page.


How does it detect invalid characters?

A few days ago I mentioned the new compiler error C4819 for C/C++. When I did so, I quoted the meaning of the error:

C4819 occurs when an ANSI source file is compiled on a system with a codepage that cannot represent all characters in the file.

A few people asked me how this was being detected.

There are many ways to do it, but the easiest is to call the MultiByteToWideChar API, using CP_ACP as the CodePage parameter and the MB_ERR_INVALID_CHARS flag.

Any time a byte value that is not part of the legal mapping in the codepage is found, the API will fail with a GetLastError return value of ERROR_NO_UNICODE_TRANSLATION.

It is important to note that this functionality is much more akin to that of the spellchecker in Microsoft Word than the thesaurus, in that it has no chance of detecting byte values that are valid for the code page but that make no sense.

Therefore if one attempts to use the strings L"Ελλάδα" and L"ελληνικά" on a machine with code page 1252 as its default will simply cause the compiler to assume you meant L"ÅëëÜäá" and L"åëëçíéêÜ".

The only time you will see the error is when a byte value is not a valid one, as per the tables listed at the Code Pages Supported by Windows site. Examples are the shaded cell choices, for example 0x8d and 0x8f in code page 1252, 0x8e or 0x90 in code page 1255, or 0x80 in code page 932.

As most of these code page tables are full, it is easily possible to fool the compiler (or more accurately to fool the NLS API; I hate to blame an innocent compiler for an error they could not detect!) into thinking the string is perfectly valid even if it essentially crap like L"éùøàì" / L"òáøéú" for L"ישראל" / "עברית" (cp1255). Or L"ÇáããáßÉ ÇáÚÑÈíÉ ÇáÓÚæÏíÉ" / L"ÇáÚÑÈíÉ" for L"المملكة العربية السعودية" / L"العربية" (cp1256).

And all of the examples I gave assumed you had a machine with a CP_ACP value of 1252. The same problems can be seen in any cross-codepage situation, such as returning L"πσρρκθι" when what was meant was L"русский" (cp1253 rather than cp1251). Or L"¤¤¤å(ÁcÅé)" rather than L"中文(繁體)" (cp1254 rather than cp950).

I could go on but you probably get the point; one would really have to rely on the invalid sequences, and some code pages (like 1252) do not have very many. Saving the file as a Unicode (meaning UTF-16LE) file might be the best way to avoid the potential bugs that come up later with these nonsense strings being propogated to your application.

转载于:https://my.oschina.net/u/169988/blog/173876

你可能感兴趣的:(warning C4819 explanation)