What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?

Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will try my best to clear the fog.

In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character: all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent ALL languages in the world.

The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters, respectively. Unicode has a more precise definition than this (Windows actually stores it as UTF-16, in which a few characters need two 16-bit units), but for the purposes of this article, think of it as the two-byte character that Windows uses for multiple-language support.

What if you want your C/C++ code to be independent of the character encoding/mode used?
Suggestion: Use generic data types and names to represent characters and strings.

For example, instead of replacing this:

char cResponse; // 'Y' or 'N'
char sUsername[64];
// str* functions

with

wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions

In order to support multiple languages (i.e. Unicode) in your application, you can simply code it in a more generic manner:

#include<TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions

The following project setting on the General page specifies which Character Set is used for compilation:
(General -> Character Set)

This way, when your project is compiled as Unicode, TCHAR translates to wchar_t. If it is compiled as ANSI/MBCS, it translates to char. You are still free to use char and wchar_t directly; the project settings do not affect any direct use of these keywords.

TCHAR is defined as:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

The macro _UNICODE is defined when you set Character Set to "Use Unicode Character Set", and therefore TCHAR means wchar_t. (The same setting also defines UNICODE, without the underscore, which the Windows headers check.) When Character Set is set to "Use Multi-Byte Character Set", TCHAR means char.
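To see this mapping outside Visual Studio, here is a minimal sketch that mimics it with standard C++ only. DEMO_UNICODE and DEMO_TCHAR are hypothetical stand-ins for _UNICODE and TCHAR (the real ones come from the Windows headers), and the two variables are the ones from the earlier example. Note that sizeof(wchar_t) is 2 on Windows but commonly 4 on Linux compilers, so the sketch avoids hard-coding the size.

```cpp
#include <cstddef>

// DEMO_UNICODE is a hypothetical stand-in for _UNICODE; delete this line
// to get the ANSI/MBCS mapping instead.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;  // Unicode build: one TCHAR is one wchar_t
#else
typedef char DEMO_TCHAR;     // ANSI/MBCS build: one TCHAR is one char
#endif

// The same source lines compile unchanged in both builds.
DEMO_TCHAR cResponse = 'Y';      // the ASCII value converts cleanly to either type
DEMO_TCHAR sUsername[64] = {};   // 64 characters: 64 or 64 * sizeof(wchar_t) bytes
```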

Likewise, to support multiple character sets with a single code base, and possibly multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.

As you know, strlen is prototyped as:

size_t strlen(const char*);

And wcslen is prototyped as:

size_t wcslen(const wchar_t* );

You should instead use _tcslen, which is logically prototyped as:

size_t _tcslen(const TCHAR* );

WC stands for Wide Character. Therefore, wcs means wide-character string. In the same way, _tcs means _T character string, and you know that a _T character may be char or wchar_t, logically.

But in reality, _tcslen (and the other _tcs functions) are not functions at all, but macros. They are defined simply as:

#ifdef _UNICODE
#define _tcslen wcslen 
#else
#define _tcslen strlen
#endif

You should refer to TCHAR.H to look up more macro definitions like this.
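The same #ifdef trick can be sketched for the string routines without Windows headers. The demo_tcslen and demo_tcscpy names below are hypothetical stand-ins for _tcslen and _tcscpy, mapped onto the real standard functions; the point is that copy_and_measure itself contains no #ifdef at all.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

#define DEMO_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define demo_tcslen std::wcslen   // plays the role of _tcslen -> wcslen
#define demo_tcscpy std::wcscpy   // plays the role of _tcscpy -> wcscpy
#else
typedef char DEMO_TCHAR;
#define demo_tcslen std::strlen   // plays the role of _tcslen -> strlen
#define demo_tcscpy std::strcpy   // plays the role of _tcscpy -> strcpy
#endif

// One code path for both builds: copies src into dst (which must be large
// enough) and returns the copied length in characters.
std::size_t copy_and_measure(DEMO_TCHAR* dst, const DEMO_TCHAR* src) {
    demo_tcscpy(dst, src);
    return demo_tcslen(dst);
}
```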

You might ask why they are defined as macros, and not implemented as functions instead. The reason is simple: a library or DLL can export only one function with a given name and prototype (ignore C++ overloading here). For instance, when you export a function as:

void _TPrintChar(char);

How is a Unicode client supposed to call it, as if it were declared like this?

void _TPrintChar(wchar_t);

_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:

void PrintCharA(char); // A = ANSI 
void PrintCharW(wchar_t); // W = Wide character

And a simple macro, as defined below, hides the difference:

#ifdef _UNICODE
#define _TPrintChar PrintCharW
#else
#define _TPrintChar PrintCharA
#endif

The client would simply call it as:

TCHAR cChar = 'X';
_TPrintChar(cChar);

Note that both TCHAR and _TPrintChar map to either Unicode or ANSI, and therefore cChar and the function's parameter are both either char or wchar_t.
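Here is the _TPrintChar idea as a compilable sketch. PrintCharA and PrintCharW are the hypothetical functions from the text (here they just return their own name so you can see which one ran), and DemoPrintChar and DEMO_UNICODE are hypothetical stand-ins for _TPrintChar and _UNICODE.

```cpp
#include <string>

// Hypothetical exported pair, one per character type. Each ignores the
// character and returns its own name, to show which variant was called.
std::string PrintCharA(char)    { return "PrintCharA"; }
std::string PrintCharW(wchar_t) { return "PrintCharW"; }

#define DEMO_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DemoPrintChar PrintCharW  // plays the role of _TPrintChar
#else
typedef char DEMO_TCHAR;
#define DemoPrintChar PrintCharA
#endif
```

A client writes DemoPrintChar(cChar) with a DEMO_TCHAR argument, and the preprocessor picks the matching exported function; no overloading is involved.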

Macros avoid these complications, and allow us to use either the ANSI or the Unicode function for characters and strings. Most Windows functions that take a string or a character are implemented this way, and for programmers' convenience, only one function name (a macro!) is needed. SetWindowText is one example:

// WinUser.H
#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE

There are very few functions that do not have such macros and are available only with a W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.

You all know that we use double quotation marks to represent strings. A string represented this way is an ANSI string, with each character taking 1 byte. Example:
"This is ANSI String. Each letter takes 1 byte."

The string given above is not Unicode, and would not qualify for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) take two bytes, including all English letters, spaces, digits, and the null character. Therefore, the length of a Unicode string is always a multiple of 2 bytes. A Unicode string of 7 characters needs 14 bytes (16 with the terminating null), and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context. (This holds on Windows, where wchar_t is 2 bytes; many non-Windows compilers make wchar_t 4 bytes.)

In general, the size of a string in bytes is a multiple of sizeof(TCHAR)!
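The byte arithmetic can be checked by the compiler itself. This sketch uses the 7-letter word "Unicode". Since sizeof on a string literal includes the terminating null, the ANSI literal is 8 bytes and the wide one is 8 units of wchar_t: 16 bytes on Windows, but 32 bytes where wchar_t is 4 bytes.

```cpp
#include <cstddef>

// sizeof a string literal counts every element, including the terminating null.
constexpr std::size_t kAnsiBytes = sizeof("Unicode");   // 7 letters + null = 8 bytes
constexpr std::size_t kWideBytes = sizeof(L"Unicode");  // 8 elements of wchar_t

static_assert(kAnsiBytes == 8, "one byte per ANSI character");
static_assert(kWideBytes == 8 * sizeof(wchar_t), "one wchar_t per wide character");
```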

When you need to express a hard-coded string, you can use:

"ANSI String"; // ANSI
L"Unicode String"; // Unicode

_T("Either string, depending on compilation"); // ANSI or Unicode
// or use TEXT macro, if you need more readability

The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string specified in _T or TEXT is either, depending on compilation. Again, _T and TEXT are nothing but macros, and are defined as:

// SIMPLIFIED
#ifdef _UNICODE 
 #define _T(c) L##c
 #define TEXT(c) L##c
#else 
 #define _T(c) c
 #define TEXT(c) c
#endif

The ## symbol is the token-pasting operator, which turns _T("Unicode") into L"Unicode" (the string passed is the macro's argument) if _UNICODE is defined. If _UNICODE is not defined, _T("Unicode") simply means "Unicode". The token-pasting operator existed even in the C language, and is not specific to VC++ or to character encoding.

Note that these macros can be used for characters as well as strings. _T('R') turns into L'R' or simply 'R'; the former is a Unicode character, the latter an ANSI character.
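A portable sketch of the _T mechanism follows; DEMO_T, DEMO_T_IMPL, DEMO_UNICODE and DEMO_GREETING are all hypothetical names. Routing through a second macro means an argument that is itself a macro (DEMO_GREETING here) is expanded to its literal before L is pasted on; as far as I know, that is also why TCHAR.H defines _T through a helper macro, __T.

```cpp
#define DEMO_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_T_IMPL(x) L##x   // paste L onto the literal token
#else
typedef char DEMO_TCHAR;
#define DEMO_T_IMPL(x) x      // leave the literal alone
#endif
// Extra level of indirection: x is macro-expanded before the paste happens.
#define DEMO_T(x) DEMO_T_IMPL(x)

#define DEMO_GREETING "Hello"

const DEMO_TCHAR* kSite = DEMO_T("CodeProject");     // becomes L"CodeProject"
const DEMO_TCHAR kLetter = DEMO_T('R');              // becomes L'R'
const DEMO_TCHAR* kGreeting = DEMO_T(DEMO_GREETING); // becomes L"Hello"
```

Without the indirection, DEMO_T(DEMO_GREETING) would paste into the undeclared identifier LDEMO_GREETING, the same kind of error shown below for variables.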

No, you cannot use these macros to convert variables (string or character) into Unicode/non-Unicode text. The following is not valid:

char c = 'C';
char str[16] = "CodeProject";

_T(c);
_T(str);

The last two lines would compile successfully in an ANSI (Multi-Byte) build, since _T(x) simply becomes x, and therefore _T(c) and _T(str) come out as c and str, respectively. But when you build it with the Unicode character set, it fails to compile:

error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier

I won't insult your intelligence by describing what those errors are and why they occur.

There is a set of conversion routines to convert MBCS to Unicode and vice versa, which I will explain soon.
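For a taste of what such a conversion does, here is a portable sketch using the standard C function std::mbstowcs, which is not one of the Windows-specific routines. It converts according to the current C locale, so it is only dependable for plain ASCII unless the program calls std::setlocale first; to_wide is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Converts a narrow (multibyte) string to a wide string with std::mbstowcs,
// using the current C locale. Returns an empty string on an invalid sequence.
std::wstring to_wide(const char* narrow) {
    // Every wide character consumes at least one byte, so strlen + 1
    // elements is always enough room, terminator included.
    std::wstring wide(std::strlen(narrow) + 1, L'\0');
    std::size_t n = std::mbstowcs(&wide[0], narrow, wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // not a valid multibyte sequence in this locale
    }
    wide.resize(n);  // drop the copied terminator and any unused tail
    return wide;
}
```

Unlike _T(str), this works on a variable at run time, which is exactly what the invalid example above was attempting.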

 

 

String classes, like MFC/ATL's CString, implement two versions using macros. There are two classes, named CStringA for ANSI and CStringW for Unicode. When you use CString (which is a typedef on top of templates and the Character Set setting), it translates to one of the two classes.
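The standard library allows the same trick. In this sketch, demo_tstring picks std::string or std::wstring the way CString picks CStringA or CStringW; demo_tstring, shout and DEMO_UNICODE are hypothetical names, not part of MFC/ATL or the standard.

```cpp
#include <string>

#define DEMO_UNICODE  // hypothetical stand-in for _UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#else
typedef char DEMO_TCHAR;
#endif

// One name, two concrete classes: std::wstring here, std::string otherwise.
typedef std::basic_string<DEMO_TCHAR> demo_tstring;

// Character-set-neutral code: appends an exclamation mark.
demo_tstring shout(demo_tstring text) {
    text.push_back('!');  // an ASCII char converts to either character type
    return text;
}
```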

TCHAR is for a single character, and you can certainly declare an array of TCHAR. But what if you would like to express a character pointer, or a const-character pointer? Which one of the following?

// ANSI characters
void foo_ansi(char*);
void foo_ansi(const char*);
char* pString;

// Unicode/wide-string
void foo_uni(WCHAR*);
void foo_uni(const WCHAR*);
WCHAR* pString;

// Independent
void foo_char(TCHAR*);
void foo_char(const TCHAR*);
TCHAR* pString;

After reading about the TCHAR stuff, you'd definitely choose the last one. But there are better alternatives available. Before that, note that among these types the TCHAR.H header file declares only the TCHAR data type. For the following types, you need to include Windows.h (they are defined in WinNT.h).

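NOTE: If your project implicitly or explicitly includes Windows.h, you need not include TCHAR.H just for the TCHAR type, since WinNT.h defines it as well. The _T macro and the _tcs* routines, however, still come from TCHAR.H.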

  • char* replacement: LPSTR
  • const char* replacement: LPCSTR
  • WCHAR* replacement: LPWSTR
  • const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
  • TCHAR* replacement: LPTSTR
  • const TCHAR* replacement: LPCTSTR
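Here is a sketch of how such pointer names can be built from the character types, using hypothetical DEMO_ copies so it compiles without Windows.h. WinNT.h builds the real names from its own CHAR/WCHAR/TCHAR typedefs; this illustrates the idea and is not its exact text.

```cpp
#include <cstddef>
#include <type_traits>

#define DEMO_UNICODE  // hypothetical stand-in for UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#else
typedef char DEMO_TCHAR;
#endif

// Hypothetical DEMO_ copies of the names in the list above.
typedef char*             DEMO_LPSTR;
typedef const char*       DEMO_LPCSTR;
typedef wchar_t*          DEMO_LPWSTR;
typedef const wchar_t*    DEMO_LPCWSTR;
typedef DEMO_TCHAR*       DEMO_LPTSTR;   // follows the build setting
typedef const DEMO_TCHAR* DEMO_LPCTSTR;  // follows the build setting

// A read-only string parameter, written once for both builds.
std::size_t demo_length(DEMO_LPCTSTR s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;  // count characters up to the null
    return n;
}
```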

Now, I hope you understand the following signatures:

BOOL SetCurrentDirectory( LPCTSTR lpPathName );
DWORD GetCurrentDirectory(DWORD nBufferLength,LPTSTR lpBuffer);

Continuing: you must have seen some functions/methods that ask you to pass a number of characters, or that return a number of characters. For functions like GetCurrentDirectory, you need to pass the number of characters, not the number of bytes. For example:

TCHAR sCurrentDir[255];

// Pass 255 and not 255*2; the buffer length is the first parameter
GetCurrentDirectory(255, sCurrentDir);
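The character-count convention can be sketched without Windows. demo_get_dir is a hypothetical function in the style of GetCurrentDirectory (unlike the real API, it simply returns 0 when the buffer is too small), and count_of mimics MSVC's _countof so the count is never typed twice. The sketch assumes a Unicode build.

```cpp
#include <cstddef>
#include <cwchar>

typedef wchar_t DEMO_TCHAR;  // assumption: Unicode build

// Hypothetical API: nBufferLength counts CHARACTERS (room for the null
// included), not bytes. Returns the characters copied, excluding the null,
// or 0 if the buffer is too small.
std::size_t demo_get_dir(std::size_t nBufferLength, DEMO_TCHAR* lpBuffer) {
    const DEMO_TCHAR* dir = L"C:\\Work";
    std::size_t need = std::wcslen(dir);
    if (nBufferLength < need + 1) return 0;
    std::wcscpy(lpBuffer, dir);
    return need;
}

// Element count of a fixed-size array, the same idea as MSVC's _countof.
template <typename T, std::size_t N>
constexpr std::size_t count_of(const T (&)[N]) { return N; }
```

Passing sizeof(buffer) instead of count_of(buffer) would claim twice (or four times) the real capacity, which is exactly the overflow the text warns about.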

On the other hand, if you need to allocate a buffer for a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new, which counts in elements rather than bytes:

 

LPTSTR pBuffer; // TCHAR* 

pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!

 

pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );

Typecasting the return value is required in C++ (in C, void* converts implicitly). The expression in malloc's argument ensures that it allocates the desired number of bytes, making room for the desired number of characters.
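A small sketch of the malloc pattern, assuming a Unicode build; DEMO_TCHAR and alloc_tchars are hypothetical names. The byte count is computed once, inside the helper, so callers keep thinking in characters.

```cpp
#include <cstdlib>
#include <cwchar>

typedef wchar_t DEMO_TCHAR;  // assumption: Unicode build

// Allocates room for `count` characters. malloc takes BYTES, so the
// character count is multiplied by the size of one character.
DEMO_TCHAR* alloc_tchars(std::size_t count) {
    // The cast from void* is mandatory in C++.
    return static_cast<DEMO_TCHAR*>(std::malloc(count * sizeof(DEMO_TCHAR)));
}
```

Memory from alloc_tchars must be released with free; the new DEMO_TCHAR[128] form is released with delete[] instead.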

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ajay Vijayvargiya
