Here is a cheat sheet for converting Microsoft C or C++ source code to support Unicode. It does not attempt to explain much and presumes you are generally familiar with Microsoft's approach to Unicode. The goal is simply to have a single place to look up the names and correct spellings of the relevant data types, functions, and so on.
Streams are difficult in Microsoft C++. You may run into 3 types of problems:
Note: There are no TCHAR equivalents for cout/wcout, cin/wcin, etc. If you compile the same code both as ANSI and as Unicode, you may want to make your own preprocessor definition, such as "tout".
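Here is a minimal sketch of such a definition. It assumes the build selects wide characters with `_UNICODE`, the symbol that tchar.h itself checks. The names `tout`, `tcin`, `tstring`, `TXT`, `MakeGreeting` and `PrintGreeting` are hypothetical, and no Microsoft header defines them. In real Windows code you would normally use `_T` from tchar.h instead of `TXT`.

```cpp
#include <iostream>
#include <string>

// Hypothetical generic-text stream names; Microsoft headers do not provide them.
// _UNICODE is the symbol tchar.h uses to select its wide-character mappings.
#ifdef _UNICODE
#define tout std::wcout
#define tcin std::wcin
#define TXT(s) L##s
using tstring = std::wstring;
#else
#define tout std::cout
#define tcin std::cin
#define TXT(s) s
using tstring = std::string;
#endif

// Builds a message from generic-text pieces; compiles unchanged in either build.
tstring MakeGreeting(const tstring& name) {
    return TXT("Hello, ") + name;
}

// Writes the greeting to whichever console stream matches the build.
void PrintGreeting(const tstring& name) {
    tout << MakeGreeting(name) << std::endl;
}
```

The preprocessor resolves these macros. In an ANSI build the same source therefore uses `std::cout` and `std::string`, and in a Unicode build it uses `std::wcout` and `std::wstring`.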
The Byte Order Mark (BOM) is the Unicode character U+FEFF. (It can also represent a Zero Width No-Break Space.) The code point U+FFFE is a noncharacter, so it is not expected to appear in a Unicode character stream. Therefore the BOM can be used as the first character of a file (or, more generally, a string) to indicate endian-ness. With UTF-16, suppose the first 16-bit code unit is read in the machine's native byte order. If its value is 0xFEFF, the text has the same endian-ness as the machine reading it. If it reads as 0xFFFE, the endian-ness is reversed, and every 16-bit code unit must be byte-swapped as it is read in. In the same way, the BOM indicates the endian-ness of text encoded in UTF-32.
Not all files start with a BOM, however. For UTF-16 and UTF-32 text that does not begin with a BOM, the Unicode Standard specifies big-endian byte order unless a higher-level protocol says otherwise.
The character U+FEFF also serves as an encoding signature for the Unicode encoding forms. The following table shows the bytes of U+FEFF in each of them:
Encoding | Bytes of U+FEFF |
---|---|
UTF-8 | EF BB BF |
UTF-16 (big-endian) | FE FF |
UTF-16 (little-endian) | FF FE |
UTF-32 (big-endian) | 00 00 FE FF |
UTF-32 (little-endian) | FF FE 00 00 |
Note that, by definition, text labeled as UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE should not have a BOM, because the label itself indicates the endian-ness. Text compressed with the SCSU (Standard Compression Scheme for Unicode) algorithm also has a recommended signature: the byte sequence 0E FE FF.
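Below is a minimal sketch of BOM handling. It assumes the whole file has already been read into a byte buffer. `DetectBom`, `DecodeUtf16Units` and the `Encoding` enumeration are hypothetical names.

The sketch does not compare the BOM against the machine's byte order and then swap. Instead, `DecodeUtf16Units` builds each 16-bit unit from its two bytes, in the order the BOM indicates, which gives the same result on any host. The result is code units: surrogate pairs are not combined.

One known ambiguity remains. The bytes FF FE 00 00 could be a UTF-32LE BOM, or a UTF-16LE BOM followed by U+0000. This sketch treats them as UTF-32LE.

```cpp
#include <cstddef>
#include <vector>

// Encodings that can be recognized from a leading U+FEFF signature.
enum class Encoding { Unknown, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

// Inspects the first bytes of a buffer for an encoded U+FEFF.
// On return, bomLength holds the number of signature bytes to skip (0 if none).
Encoding DetectBom(const unsigned char* p, std::size_t n, std::size_t& bomLength) {
    bomLength = 0;
    // Test the 4-byte UTF-32 forms first: FF FE 00 00 also starts with FF FE.
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) {
        bomLength = 4;
        return Encoding::Utf32LE;
    }
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) {
        bomLength = 4;
        return Encoding::Utf32BE;
    }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        bomLength = 3;
        return Encoding::Utf8;
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        bomLength = 2;
        return Encoding::Utf16BE;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        bomLength = 2;
        return Encoding::Utf16LE;
    }
    return Encoding::Unknown;  // no signature: the caller must decide
}

// Converts UTF-16 bytes to 16-bit code units, independent of host byte order.
// Without a BOM the text is treated as big-endian. A trailing odd byte is ignored.
std::vector<char16_t> DecodeUtf16Units(const unsigned char* p, std::size_t n) {
    bool bigEndian = true;
    std::size_t i = 0;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        i = 2;                 // big-endian BOM: skip it
    } else if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        bigEndian = false;     // little-endian BOM: skip it and swap each pair
        i = 2;
    }
    std::vector<char16_t> units;
    for (; i + 1 < n; i += 2) {
        unsigned high = bigEndian ? p[i] : p[i + 1];
        unsigned low = bigEndian ? p[i + 1] : p[i];
        units.push_back(static_cast<char16_t>((high << 8) | low));
    }
    return units;
}
```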
ANSI | Wide | TCHAR |
---|---|---|
EOF | WEOF | _TEOF |
_environ | _wenviron | _tenviron |
_pgmptr | _wpgmptr | _tpgmptr |
ANSI | Wide | TCHAR |
---|---|---|
char | wchar_t | _TCHAR |
_finddata_t | _wfinddata_t | _tfinddata_t |
__finddata64_t | __wfinddata64_t | _tfinddata64_t |
_finddatai64_t | _wfinddatai64_t | _tfinddatai64_t |
int | wint_t | _TINT |
signed char | wchar_t | _TSCHAR |
unsigned char | wchar_t | _TUCHAR |
char | wchar_t | _TXCHAR |
"text" | L"text" | _T("text") or _TEXT("text") |
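
`_TINT` and `_TEOF` exist because a character-reading function must return a value that cannot be mistaken for a valid character. `getc` returns `int` so that `EOF` stays distinct. `getwc` returns `wint_t` so that `WEOF` stays distinct. The generic-text `_gettc` returns `_TINT` and signals end of file with `_TEOF`.

Below is a minimal, portable sketch of the two expansions, written out explicitly because tchar.h is Microsoft-specific. The function names are hypothetical. A common bug is storing the result in a plain `char`. If `char` is signed, a 0xFF byte then looks like `EOF`; if it is unsigned, `EOF` never matches.

```cpp
#include <cstddef>
#include <cstdio>
#include <cwchar>

// ANSI expansion (_TINT = int, _gettc = getc, _TEOF = EOF).
// Holding the result in an int keeps EOF distinct from every byte value.
std::size_t CountCharsNarrow(std::FILE* f) {
    std::size_t count = 0;
    int c;
    while ((c = std::getc(f)) != EOF) {
        ++count;
    }
    return count;
}

// Wide expansion (_TINT = wint_t, _gettc = getwc, _TEOF = WEOF).
// wint_t can hold any valid wide character plus the distinct value WEOF.
std::size_t CountCharsWide(std::FILE* f) {
    std::size_t count = 0;
    std::wint_t c;
    while ((c = std::getwc(f)) != WEOF) {
        ++count;
    }
    return count;
}
```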
ANSI | Wide | TCHAR |
---|---|---|
LPSTR (char *) | LPWSTR (wchar_t *) | LPTSTR (_TCHAR *) |
LPCSTR (const char *) | LPCWSTR (const wchar_t *) | LPCTSTR (const _TCHAR *) |
LPOLESTR (For OLE) |
LPWSTR | LPTSTR |
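Many Windows API functions compile into an ANSI or a Wide form, depending on whether the symbol UNICODE is defined. The Character Data Type-independent name is a macro that expands to the name ending in A or in W. Modules that operate on both ANSI and Wide characters need to be aware of this, and may have to call the A or W form explicitly. Otherwise, code that uses the Character Data Type-independent names requires no changes; just compile with the symbol UNICODE defined. (The C run-time mappings in tchar.h are controlled by _UNICODE rather than UNICODE, so projects usually define both.)
The following list is by no means all of the Character Data Type-dependent API, just some character- and string-related ones. Look in WinNls.h for code-page- and locale-related API.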
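The A/W mechanism can be sketched portably. This sketch uses a hypothetical function pair (`MyUpperA`, `MyUpperW`) and a neutral macro, `MyUpper`, that mirror how a name like CharUpper resolves. It illustrates the naming pattern only and is not the actual Windows header code. The case conversion uses the C library's `toupper`/`towupper` in the current locale. That is an assumption made for illustration, not the algorithm CharUpper itself uses.

```cpp
#include <cctype>
#include <cwctype>
#include <string>

// Hypothetical "A" (ANSI) function.
std::string MyUpperA(std::string s) {
    for (char& c : s) {
        // Cast through unsigned char: toupper requires a non-negative value.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}

// Hypothetical "W" (wide) function.
std::wstring MyUpperW(std::wstring s) {
    for (wchar_t& c : s) {
        c = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(c)));
    }
    return s;
}

// The neutral name is resolved at compile time from the UNICODE symbol,
// following the same pattern as CharUpper -> CharUpperA / CharUpperW.
#ifdef UNICODE
#define MyUpper MyUpperW
#else
#define MyUpper MyUpperA
#endif
```

A module that handles both ANSI and Wide text would call `MyUpperA` or `MyUpperW` by name rather than relying on the macro.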
ANSI | Wide | Character Data Type-Independent Name |
---|---|---|
CharLowerA | CharLowerW | CharLower |
CharLowerBuffA | CharLowerBuffW | CharLowerBuff |
CharNextA | CharNextW | CharNext |
CharNextExA | (none) | (none) |
CharPrevA | CharPrevW | CharPrev |
CharPrevExA | (none) | (none) |
CharToOemA | CharToOemW | CharToOem |
CharToOemBuffA | CharToOemBuffW | CharToOemBuff |
CharUpperA | CharUpperW | CharUpper |
CharUpperBuffA | CharUpperBuffW | CharUpperBuff |
CompareStringA | CompareStringW | CompareString |
FoldStringA | FoldStringW | FoldString |
GetStringTypeA | GetStringTypeW | (none; use GetStringTypeEx) |
GetStringTypeExA | GetStringTypeExW | GetStringTypeEx |
IsCharAlphaA | IsCharAlphaW | IsCharAlpha |
IsCharAlphaNumericA | IsCharAlphaNumericW | IsCharAlphaNumeric |
IsCharLowerA | IsCharLowerW | IsCharLower |
IsCharUpperA | IsCharUpperW | IsCharUpper |
LoadStringA | LoadStringW | LoadString |
lstrcatA | lstrcatW | lstrcat |
lstrcmpA | lstrcmpW | lstrcmp |
lstrcmpiA | lstrcmpiW | lstrcmpi |
lstrcpyA | lstrcpyW | lstrcpy |
lstrcpynA | lstrcpynW | lstrcpyn |
lstrlenA | lstrlenW | lstrlen |
OemToCharA | OemToCharW | OemToChar |
OemToCharBuffA | OemToCharBuffW | OemToCharBuff |
wsprintfA | wsprintfW | wsprintf |
wvsprintfA | wvsprintfW | wvsprintf |
Functions sorted by ANSI name, for ease of converting to Unicode.
ANSI | Wide | TCHAR |
---|---|---|
_access | _waccess | _taccess |
_atoi64 | _wtoi64 | _tstoi64 |
_atoi64 | _wtoi64 | _ttoi64 |
_cgets | _cgetws | _cgetts |
_chdir | _wchdir | _tchdir |
_chmod | _wchmod | _tchmod |
_cprintf | _cwprintf | _tcprintf |
_cputs | _cputws | _cputts |
_creat | _wcreat | _tcreat |
_cscanf | _cwscanf | _tcscanf |
_ctime64 | _wctime64 | _tctime64 |
_execl | _wexecl | _texecl |
_execle | _wexecle | _texecle |
_execlp | _wexeclp | _texeclp |
_execlpe | _wexeclpe | _texeclpe |
_execv | _wexecv | _texecv |
_execve | _wexecve | _texecve |
_execvp | _wexecvp | _texecvp |
_execvpe | _wexecvpe | _texecvpe |
_fdopen | _wfdopen | _tfdopen |
_fgetchar | _fgetwchar | _fgettchar |
_findfirst | _wfindfirst | _tfindfirst |
_findnext64 | _wfindnext64 | _tfindnext64 |
_findnext | _wfindnext | _tfindnext |
_findnexti64 | _wfindnexti64 | _tfindnexti64 |
_fputchar | _fputwchar | _fputtchar |
_fsopen | _wfsopen | _tfsopen |
_fullpath | _wfullpath | _tfullpath |
_getch | _getwch | _gettch |
_getche | _getwche | _gettche |
_getcwd | _wgetcwd | _tgetcwd |
_getdcwd | _wgetdcwd | _tgetdcwd |
_ltoa | _ltow | _ltot |
_makepath | _wmakepath | _tmakepath |
_mkdir | _wmkdir | _tmkdir |
_mktemp | _wmktemp | _tmktemp |
_open | _wopen | _topen |
_popen | _wpopen | _tpopen |
_putch | _putwch | _puttch |
_putenv | _wputenv | _tputenv |
_rmdir | _wrmdir | _trmdir |
_scprintf | _scwprintf | _sctprintf |
_searchenv | _wsearchenv | _tsearchenv |
_snprintf | _snwprintf | _sntprintf |
_snscanf | _snwscanf | _sntscanf |
_sopen | _wsopen | _tsopen |
_spawnl | _wspawnl | _tspawnl |
_spawnle | _wspawnle | _tspawnle |
_spawnlp | _wspawnlp | _tspawnlp |
_spawnlpe | _wspawnlpe | _tspawnlpe |
_spawnv | _wspawnv | _tspawnv |
_spawnve | _wspawnve | _tspawnve |
_spawnvp | _wspawnvp | _tspawnvp |
_spawnvpe | _wspawnvpe | _tspawnvpe |
_splitpath | _wsplitpath | _tsplitpath |
_stat64 | _wstat64 | _tstat64 |
_stat | _wstat | _tstat |
_stati64 | _wstati64 | _tstati64 |
_strdate | _wstrdate | _tstrdate |
_strdec | _wcsdec | _tcsdec |
_strdup | _wcsdup | _tcsdup |
_stricmp | _wcsicmp | _tcsicmp |
_stricoll | _wcsicoll | _tcsicoll |
_strinc | _wcsinc | _tcsinc |
_strlwr | _wcslwr | _tcslwr |
_strncnt | _wcsncnt | _tcsnbcnt |
_strncnt | _wcsncnt | _tcsnccnt |
_strncoll | _wcsncoll | _tcsnccoll |
_strnextc | _wcsnextc | _tcsnextc |
_strnicmp | _wcsnicmp | _tcsncicmp |
_strnicmp | _wcsnicmp | _tcsnicmp |
_strnicoll | _wcsnicoll | _tcsncicoll |
_strnicoll | _wcsnicoll | _tcsnicoll |
_strninc | _wcsninc | _tcsninc |
_strnset | _wcsnset | _tcsncset |
_strnset | _wcsnset | _tcsnset |
_strrev | _wcsrev | _tcsrev |
_strset | _wcsset | _tcsset |
_strspnp | _wcsspnp | _tcsspnp |
_strtime | _wstrtime | _tstrtime |
_strtoi64 | _wcstoi64 | _tcstoi64 |
_strtoui64 | _wcstoui64 | _tcstoui64 |
_strupr | _wcsupr | _tcsupr |
_tempnam | _wtempnam | _ttempnam |
_ui64toa | _ui64tow | _ui64tot |
_ultoa | _ultow | _ultot |
_ungetch | _ungetwch | _ungettch |
_unlink | _wunlink | _tunlink |
_utime64 | _wutime64 | _tutime64 |
_utime | _wutime | _tutime |
_vscprintf | _vscwprintf | _vsctprintf |
_vsnprintf | _vsnwprintf | _vsntprintf |
asctime | _wasctime | _tasctime |
atof | _wtof | _tstof |
atoi | _wtoi | _tstoi |
atoi | _wtoi | _ttoi |
atol | _wtol | _tstol |
atol | _wtol | _ttol |
character compare | Maps to macro or inline function | _tccmp |
character copy | Maps to macro or inline function | _tccpy |
character length | Maps to macro or inline function | _tclen |
ctime | _wctime | _tctime |
fgetc | fgetwc | _fgettc |
fgets | fgetws | _fgetts |
fopen | _wfopen | _tfopen |
fprintf | fwprintf | _ftprintf |
fputc | fputwc | _fputtc |
fputs | fputws | _fputts |
freopen | _wfreopen | _tfreopen |
fscanf | fwscanf | _ftscanf |
getc | getwc | _gettc |
getchar | getwchar | _gettchar |
getenv | _wgetenv | _tgetenv |
gets | _getws | _getts |
isalnum | iswalnum | _istalnum |
isalpha | iswalpha | _istalpha |
isascii | iswascii | _istascii |
iscntrl | iswcntrl | _istcntrl |
isdigit | iswdigit | _istdigit |
isgraph | iswgraph | _istgraph |
islead (Always FALSE) | (Always FALSE) | _istlead |
isleadbyte (Always FALSE) | isleadbyte (Always FALSE) | _istleadbyte |
islegal (Always TRUE) | (Always TRUE) | _istlegal |
islower | iswlower | _istlower |
isprint | iswprint | _istprint |
ispunct | iswpunct | _istpunct |
isspace | iswspace | _istspace |
isupper | iswupper | _istupper |
isxdigit | iswxdigit | _istxdigit |
main | wmain | _tmain |
perror | _wperror | _tperror |
printf | wprintf | _tprintf |
putc | putwc | _puttc |
putchar | putwchar | _puttchar |
puts | _putws | _putts |
remove | _wremove | _tremove |
rename | _wrename | _trename |
scanf | wscanf | _tscanf |
setlocale | _wsetlocale | _tsetlocale |
sprintf | swprintf | _stprintf |
sscanf | swscanf | _stscanf |
strcat | wcscat | _tcscat |
strchr | wcschr | _tcschr |
strcmp | wcscmp | _tcscmp |
strcoll | wcscoll | _tcscoll |
strcpy | wcscpy | _tcscpy |
strcspn | wcscspn | _tcscspn |
strerror | _wcserror | _tcserror |
strftime | wcsftime | _tcsftime |
strlen | wcslen | _tcsclen |
strlen | wcslen | _tcslen |
strncat | wcsncat | _tcsncat |
strncat | wcsncat | _tcsnccat |
strncmp | wcsncmp | _tcsnccmp |
strncmp | wcsncmp | _tcsncmp |
strncpy | wcsncpy | _tcsnccpy |
strncpy | wcsncpy | _tcsncpy |
strpbrk | wcspbrk | _tcspbrk |
strrchr | wcsrchr | _tcsrchr |
strspn | wcsspn | _tcsspn |
strstr | wcsstr | _tcsstr |
strtod | wcstod | _tcstod |
strtok | wcstok | _tcstok |
strtol | wcstol | _tcstol |
strtoul | wcstoul | _tcstoul |
strxfrm | wcsxfrm | _tcsxfrm |
system | _wsystem | _tsystem |
tmpnam | _wtmpnam | _ttmpnam |
tolower | towlower | _totlower |
toupper | towupper | _totupper |
ungetc | ungetwc | _ungettc |
vfprintf | vfwprintf | _vftprintf |
vprintf | vwprintf | _vtprintf |
vsprintf | vswprintf | _vstprintf |
WinMain | wWinMain | _tWinMain |
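
Several ANSI functions appear twice in the table because the generic-text names separate bytes from characters. For example, strlen corresponds to both _tcslen and _tcsclen. In an MBCS build, _tcslen maps to strlen and counts bytes, while _tcsclen maps to _mbslen and counts characters. In a _UNICODE build both map to wcslen.

The sketch below shows the same distinction portably, using UTF-8 as a stand-in for a multibyte code page. This is an assumption made for illustration: _mbslen works with the current multibyte code page, which is not necessarily UTF-8. `Lengths` and `MeasureUtf8` are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// The two notions of "length" that _tcslen and _tcsclen separate in MBCS builds.
struct Lengths {
    std::size_t bytes;  // storage units, as strlen (_tcslen) reports
    std::size_t chars;  // characters, as _mbslen (_tcsclen) reports; here UTF-8 code points
};

// Measures a NUL-terminated UTF-8 string. A character starts at every byte that
// is not a continuation byte (bit pattern 10xxxxxx). Assumes well-formed UTF-8.
Lengths MeasureUtf8(const char* s) {
    Lengths result{std::strlen(s), 0};
    for (const char* p = s; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80) {
            ++result.chars;
        }
    }
    return result;
}
```

For example, "h\xC3\xA9llo" ("héllo") occupies 6 bytes but holds 5 characters.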
Copyright © 2003 Tex Texin. All rights reserved.
Send comments to Tex Texin
This page last updated 2003-11-13.