Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will do my best to clear away the fog.
In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent ALL languages in the world.
The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters, respectively. There is a more precise definition of Unicode, but for the purpose of understanding, assume it is a two-byte character that Windows uses for multiple-language support. (Strictly speaking, Windows uses UTF-16, in which a few rarely used characters need a pair of 2-byte units.)
What if you want your C/C++ code to be independent of the character encoding/mode used?
Suggestion: Use generic data types and names to represent characters and strings.
For example, instead of replacing:
char cResponse;      // 'Y' or 'N'
char sUsername[64];
// str* functions
with
wchar_t cResponse;   // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions
in order to support multi-lingual (i.e. Unicode) text in your application, you can simply code it in a more generic manner:
#include <TCHAR.H>   // Implicit or explicit include
TCHAR cResponse;     // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions
The following project setting on the General page determines which character set is used for compilation: General -> Character Set.
This way, when your project is compiled as Unicode, TCHAR translates to wchar_t. When it is compiled as ANSI/MBCS, TCHAR translates to char. You are still free to use char and wchar_t directly; project settings do not affect any direct use of these keywords.
TCHAR is defined as:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
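The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR means wchar_t. When Character Set is set to "Use Multi-Byte Character Set", TCHAR means char. (The Unicode setting also defines UNICODE, without the underscore, which is the macro the Windows headers test.)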
Likewise, to support multiple character sets with a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use _tcscpy, _tcslen, _tcscat.
As you know, strlen is prototyped as:
size_t strlen(const char*);
And wcslen is prototyped as:
size_t wcslen(const wchar_t*);
You should use _tcslen instead, which is logically prototyped as:
size_t _tcslen(const TCHAR*);
WC stands for Wide Character. Therefore, wcs means wide-character string. In the same way, _tcs means _T character string, and, as you know, _T may logically be char or wchar_t.
But in reality, _tcslen (and the other _tcs functions) are not functions but macros. They are defined simply as:
#ifdef _UNICODE
#define _tcslen wcslen
#else
#define _tcslen strlen
#endif
You should refer to TCHAR.H to look up more macro definitions like this.
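Since TCHAR.H ships only with the Microsoft toolchain, here is a minimal portable sketch of the same idea that can be compiled anywhere. Everything prefixed demo_/Demo/DEMO_ is a hypothetical stand-in (DEMO_UNICODE plays the role of _UNICODE); the real names and the full mapping table live in TCHAR.H.

```cpp
#include <cassert>
#include <cstring>  // std::strlen, std::strcpy, std::strcat, std::strcmp
#include <cwchar>   // std::wcslen, std::wcscpy, std::wcscat

// Hypothetical stand-ins for TCHAR, _T and the _tcs* macros.
// Define DEMO_UNICODE before this point to get the wide-character build.
#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#define DEMO_T(s) L##s
#define demo_tcslen std::wcslen
#define demo_tcscpy std::wcscpy
#define demo_tcscat std::wcscat
#else
typedef char DemoTCHAR;
#define DEMO_T(s) s
#define demo_tcslen std::strlen
#define demo_tcscpy std::strcpy
#define demo_tcscat std::strcat
#endif

// Written once, compiled against whichever set of functions the switch picks.
// dest must have room for the greeting, the name and the terminating null.
std::size_t JoinGreeting(DemoTCHAR* dest, const DemoTCHAR* name) {
    demo_tcscpy(dest, DEMO_T("Hello, "));
    demo_tcscat(dest, name);
    return demo_tcslen(dest);  // a count of characters, not bytes
}

// Usage: "Hello, " (7) + "Ann" (3) gives 10 characters in either build.
void DemoUsage() {
    DemoTCHAR buffer[32];
    assert(JoinGreeting(buffer, DEMO_T("Ann")) == 10);
}
```

The function body never mentions char or wchar_t, so flipping the switch changes every call it makes without touching the code.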
You might ask why they are defined as macros instead of being implemented as functions. The reason is simple: a library or DLL can export only one function with a given name and prototype (ignore C++ overloading here). For instance, suppose you export a function as:
void _TPrintChar(char);
How is a Unicode client supposed to call it, as if it were declared:
void _TPrintChar(wchar_t);
_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:
void PrintCharA(char);     // A = ANSI
void PrintCharW(wchar_t);  // W = Wide character
And a simple macro, as defined below, hides the difference:
#ifdef _UNICODE
#define _TPrintChar PrintCharW
#else
#define _TPrintChar PrintCharA
#endif
The client simply calls it as:
TCHAR cChar;
_TPrintChar(cChar);
Note that TCHAR and _TPrintChar map to Unicode or ANSI together, and therefore cChar and the function's parameter always have the same type: either char or wchar_t.
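Here is a compilable sketch of the pattern just described. PrintCharA and PrintCharW are the article's illustrative names; DEMO_UNICODE and TPrintChar are hypothetical stand-ins for _UNICODE and _TPrintChar (a name beginning with an underscore and a capital letter is reserved for the implementation, so user code is better off avoiding it). To make the effect testable, each function also returns the size of the character it received.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cwchar>   // std::wint_t

// Two real functions, one per character width.
std::size_t PrintCharA(char c) {
    std::putchar(c);
    return sizeof c;  // 1 byte received
}

std::size_t PrintCharW(wchar_t c) {
    std::printf("%lc", static_cast<std::wint_t>(c));
    return sizeof c;  // sizeof(wchar_t) bytes received
}

// The generic name is only a macro; the compiler never sees "TPrintChar".
#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#define TPrintChar PrintCharW
#else
typedef char DemoTCHAR;
#define TPrintChar PrintCharA
#endif
```

Because the typedef and the macro flip together, TPrintChar(cChar) with a DemoTCHAR argument always lands on the function whose parameter type matches.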
Macros avoid these complications and allow us to use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, and for the programmer's convenience, only one name (a macro!) is needed. SetWindowText is one example:
// WinUser.H
#ifdef UNICODE
#define SetWindowText SetWindowTextW
#else
#define SetWindowText SetWindowTextA
#endif // !UNICODE
There are very few functions that do not have such macros and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.
"This is ANSI String. Each letter takes 1 byte."
The string given above is not Unicode, and would not qualify for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:
L"This is Unicode string. Each letter would take 2 bytes, including spaces."
Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) take two bytes, including all English letters, spaces, digits, and the null character. Therefore, the size of a Unicode string is always a multiple of 2 bytes. A Unicode string of 7 characters needs 14 bytes (16 with the terminating null), and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context.
In general, a string occupies a multiple of sizeof(TCHAR) bytes!
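The byte counts above can be checked with sizeof, which for a string literal includes the terminating null. One caveat for anyone trying this outside Windows: wchar_t is 2 bytes under Visual C++ but 4 bytes with GCC and Clang on Linux and macOS, so this sketch compares against sizeof(wchar_t) rather than a hard-coded 2.

```cpp
#include <cassert>
#include <cstddef>

// sizeof on a literal counts every element, including the terminating null.
constexpr std::size_t kAnsiBytes = sizeof("Hello");   // 6 elements of 1 byte
constexpr std::size_t kWideBytes = sizeof(L"Hello");  // 6 elements of sizeof(wchar_t)

// A wide string always occupies a whole number of wchar_t units,
// which is why a 15-byte Unicode string cannot exist on Windows.
static_assert(kAnsiBytes == 6, "5 letters + null");
static_assert(kWideBytes == 6 * sizeof(wchar_t), "5 letters + null, each a wchar_t");
static_assert(kWideBytes % sizeof(wchar_t) == 0, "whole units only");
```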
When you need to express a hard-coded string, you can use:
"ANSI String";                                  // ANSI
L"Unicode String";                              // Unicode
_T("Either string, depending on compilation");  // ANSI or Unicode
// or use the TEXT macro, if you need more readability
The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string specified in _T or TEXT is either, depending on compilation. Again, _T and TEXT are nothing but macros, and are defined as:
// SIMPLIFIED
#ifdef _UNICODE
#define _T(c) L##c
#define TEXT(c) L##c
#else
#define _T(c) c
#define TEXT(c) c
#endif
The ## symbol is the token-pasting operator, which turns _T("Unicode") into L"Unicode" (where the string passed is the macro's argument) if _UNICODE is defined. If _UNICODE is not defined, _T("Unicode") simply means "Unicode". The token-pasting operator exists in the C language as well, and is not specific to VC++ or to character encoding.
Note that these macros can be used for characters as well as strings. _T('R') turns into L'R' or simply 'R'; the former is a Unicode character, the latter an ANSI character.
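The token pasting can be reproduced outside Visual C++ with a hypothetical DEMO_T macro that mirrors the simplified _T above (DEMO_UNICODE again stands in for _UNICODE). The real TCHAR.H routes _T through a second helper macro; this sketch keeps the one-step form shown in the article.

```cpp
#include <cassert>

#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#define DEMO_T(x) L##x  // pastes L onto the literal: L"..." or L'...'
#else
typedef char DemoTCHAR;
#define DEMO_T(x) x     // leaves the literal untouched
#endif

// A string literal and a character literal both pick up the right type.
const DemoTCHAR* const kTitle = DEMO_T("Unicode");
const DemoTCHAR kLetter = DEMO_T('R');

// "Unicode" has 7 letters plus the null, each one DemoTCHAR wide.
static_assert(sizeof(DEMO_T("Unicode")) == 8 * sizeof(DemoTCHAR), "7 + null");
```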
No, you cannot use these macros to convert variables (strings or characters) into Unicode/non-Unicode text. The following is not valid:
char c = 'C';
char str[16] = "CodeProject";
_T(c);
_T(str);
The last two lines compile successfully in an ANSI (Multi-Byte) build, since _T(x) is simply x, and therefore _T(c) and _T(str) come out to be c and str, respectively. But when you build with the Unicode character set, the code fails to compile:
error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier
I would not like to insult your intelligence by describing why those errors occur.
There is a set of conversion routines for converting MBCS to Unicode and vice versa, which I will explain soon.
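As a portable illustration of converting a variable at run time (something _T cannot do, since it only works on literals), the standard C library's std::mbstowcs converts a multi-byte char string into a wchar_t string according to the current locale. This is a swapped-in standard function, not necessarily one of the routines explained later; ToWide is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>  // std::mbstowcs
#include <cwchar>   // std::wcscmp

// Converts src into dest, writing at most destChars wide characters.
// Returns the number of wide characters converted (excluding the null),
// or static_cast<std::size_t>(-1) if src holds an invalid multi-byte sequence.
// Note: if the result fills dest completely, no terminating null is written.
std::size_t ToWide(const char* src, wchar_t* dest, std::size_t destChars) {
    return std::mbstowcs(dest, src, destChars);
}
```

For plain ASCII text such as "CodeProject", the default "C" locale is enough; std::wcstombs performs the reverse direction.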
String classes, like MFC/ATL's CString, implement two versions using a macro. There are two classes: CStringA for ANSI and CStringW for Unicode. When you use CString (which is a typedef on top of templates and the Character Set setting), it translates to one of the two classes.
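CString itself is MFC/ATL-only, but the same two-classes-one-name idea can be sketched with the standard library: std::basic_string instantiated on a TCHAR-like type becomes either std::string or std::wstring. DemoTCHAR, DEMO_T, DemoTString and MakeTitle are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <string>

#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#define DEMO_T(x) L##x
#else
typedef char DemoTCHAR;
#define DEMO_T(x) x
#endif

// One generic name for two classes: std::string or std::wstring.
typedef std::basic_string<DemoTCHAR> DemoTString;

DemoTString MakeTitle(const DemoTString& user) {
    return DEMO_T("Welcome, ") + user;  // operator+(const CharT*, const basic_string&)
}
```

size() reports characters, so MakeTitle(DEMO_T("Ann")).size() is 12 in either build.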
TCHAR is for a single character, and you can certainly declare an array of TCHAR. But what if you would like to express a character pointer, or a const character pointer? Which one of the following?
// ANSI characters
void foo_ansi(char*);
void foo_ansi(const char*);
char* pString;

// Unicode/wide-string
void foo_uni(WCHAR*);
void foo_uni(const WCHAR*);
WCHAR* pString;

// Independent
void foo_char(TCHAR*);
void foo_char(const TCHAR*);
TCHAR* pString;
After reading about the TCHAR stuff, you'd definitely choose the last one. But there are better alternatives available. Before that, note that the TCHAR.H header file declares TCHAR (together with the _tcs* and _T mappings), but not the pointer types below. For the following types, you need to include Windows.h (they are defined in WinNT.h). WCHAR, used above, also comes from WinNT.h and is simply wchar_t.
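If you include Windows.h, you need not include TCHAR.H just for TCHAR, since WinNT.h defines it as well.
char* replacement: LPSTR
const char* replacement: LPCSTR
WCHAR* replacement: LPWSTR
const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
TCHAR* replacement: LPTSTR
const TCHAR* replacement: LPCTSTR
(LP stands for "long pointer", a leftover from 16-bit Windows; C stands for const, W for wide, and T for TCHAR.)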
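To see how the names decompose, here is a portable sketch that re-creates these typedefs with hypothetical Demo prefixes (DEMO_UNICODE again stands in for _UNICODE); the real definitions are in WinNT.h. The static assertions spell out what two of the names resolve to.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

typedef char DemoCHAR;
typedef wchar_t DemoWCHAR;
#ifdef DEMO_UNICODE
typedef DemoWCHAR DemoTCHAR;
#else
typedef DemoCHAR DemoTCHAR;
#endif

typedef DemoCHAR* DemoLPSTR;           // LP + STR
typedef const DemoCHAR* DemoLPCSTR;    // LP + C(onst) + STR
typedef DemoWCHAR* DemoLPWSTR;         // LP + W(ide) + STR
typedef const DemoWCHAR* DemoLPCWSTR;  // C before W
typedef DemoTCHAR* DemoLPTSTR;         // LP + T(CHAR) + STR
typedef const DemoTCHAR* DemoLPCTSTR;

static_assert(std::is_same<DemoLPCWSTR, const wchar_t*>::value, "const wchar_t*");
#ifndef DEMO_UNICODE
static_assert(std::is_same<DemoLPCTSTR, const char*>::value, "const char* in ANSI builds");
#endif

// Input is read-only (the C form); the output buffer is writable.
void CopyName(DemoLPCSTR src, DemoLPSTR dest) { std::strcpy(dest, src); }
```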
Now, I hope you understand the following signatures:
BOOL SetCurrentDirectory(LPCTSTR lpPathName);
DWORD GetCurrentDirectory(DWORD nBufferLength, LPTSTR lpBuffer);
Continuing. You must have seen some functions/methods asking you to pass the number of characters, or returning the number of characters. With GetCurrentDirectory, for example, you need to pass the number of characters, not the number of bytes:
TCHAR sCurrentDir[255];
// Pass 255 (characters), not 255*sizeof(TCHAR) (bytes)
GetCurrentDirectory(255, sCurrentDir);
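The characters-versus-bytes contract can be sketched portably. GetDemoDirectory is a hypothetical function that mimics the GetCurrentDirectory convention (buffer length and return value counted in characters; when the buffer is too small, the required size including the null is returned); the path it returns is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#define DEMO_T(x) L##x
#else
typedef char DemoTCHAR;
#define DEMO_T(x) x
#endif

std::size_t GetDemoDirectory(std::size_t nBufferLength, DemoTCHAR* lpBuffer) {
    static const DemoTCHAR dir[] = DEMO_T("C:\\Projects");
    const std::size_t needed = sizeof(dir) / sizeof(dir[0]);  // characters, incl. null
    if (lpBuffer == nullptr || nBufferLength < needed)
        return needed;                                        // too small: report size
    std::memcpy(lpBuffer, dir, sizeof(dir));                  // memcpy counts bytes
    return needed - 1;                                        // characters, excl. null
}
```

Computing the count as sizeof(buffer) / sizeof(buffer[0]) gives 255 for a 255-element buffer in every build, whereas passing sizeof(buffer) alone would claim 510 characters in a 2-byte Unicode build and invite a buffer overrun.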
On the other hand, when you need to allocate memory for a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new, which does the arithmetic for you:
LPTSTR pBuffer; // TCHAR*
pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.
But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!
pBuffer = (TCHAR*) malloc(128 * sizeof(TCHAR));
Typecasting the return value is required in C++, as you know (C converts void* implicitly). The expression in malloc's argument ensures that it allocates the desired number of bytes, making room for the desired number of characters.
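The same byte arithmetic applies to any allocator that takes a size in bytes. This portable sketch uses std::malloc with a hypothetical DemoTCHAR and AllocChars helper; static_cast is the C++-style spelling of the cast shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

#ifdef DEMO_UNICODE
typedef wchar_t DemoTCHAR;
#else
typedef char DemoTCHAR;
#endif

// Returns room for nChars characters (or nullptr); release it with std::free.
DemoTCHAR* AllocChars(std::size_t nChars) {
    // malloc counts bytes, so scale the character count by the character size.
    return static_cast<DemoTCHAR*>(std::malloc(nChars * sizeof(DemoTCHAR)));
}
```

Keep the pairs straight: memory from new TCHAR[n] is released with delete[], while memory from malloc goes back through free.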
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)