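1. Introduction
C-style strings are error-prone and hard to manage, not to mention a favorite target of hackers looking for buffer overrun bugs, so many string wrapper classes now exist. Unfortunately, it is not always clear which class should be used in which situation, nor how to convert a C-style string to a wrapper class.
This article covers the string wrapper classes in the Win32 API, MFC, STL, WTL, and the Visual C++ runtime library. I will describe how each class is used, how to construct objects, and how to convert from one class to another.
To get the most out of this article, you need to understand the different character types and encodings, which I covered in Part I.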
Rule # 1 of string classes
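Casts are a bad idea unless the conversion is explicitly documented. Casting a string does nothing to the string itself, so do not write code like this:
void SomeFunc ( LPCWSTR widestr );

int main()
{
    SomeFunc ( (LPCWSTR) "C:\\foo.txt" ); // WRONG!
}
This will fail 100% of the time. It compiles, because the cast suppresses the compiler's type checking, but the fact that it compiles does not mean the code is correct.
In the examples that follow, I will point out when a cast is legitimate.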
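To make the rule concrete, here is a minimal, portable sketch of the difference between reinterpreting the bytes of a narrow string and actually converting it. The helper names `first_unit_after_cast` and `widen` are hypothetical, and the sketch assumes the input is plain ASCII that is valid in the current C locale; it uses the standard `mbstowcs()` function rather than any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: copies the first sizeof(wchar_t) bytes of a narrow
// string into a wchar_t. This is the value that code reading through a
// casted pointer would treat as the "first wide character".
wchar_t first_unit_after_cast(const char* narrow)
{
    // The string (including its terminator) must be at least one wchar_t long.
    assert(std::strlen(narrow) + 1 >= sizeof(wchar_t));
    wchar_t unit = 0;
    std::memcpy(&unit, narrow, sizeof(wchar_t));
    return unit;
}

// Hypothetical helper: performs a real conversion with the C runtime.
// Returns an empty string if the input is not valid in the current locale.
std::wstring widen(const char* narrow)
{
    // Every wide character consumes at least one byte, so strlen() is an upper bound.
    const std::size_t max_chars = std::strlen(narrow);
    if (max_chars == 0)
        return std::wstring();

    std::wstring wide(max_chars, L'\0');
    const std::size_t converted = std::mbstowcs(&wide[0], narrow, max_chars);
    if (converted == static_cast<std::size_t>(-1))
        return std::wstring();

    wide.resize(converted);
    return wide;
}
```

With these helpers, `first_unit_after_cast("C:\\foo.txt")` does not compare equal to `L'C'`, because several narrow characters are packed into one wide unit, while `widen("C:\\foo.txt")` yields a wide string with the expected contents.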
C-style strings and typedefs
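As I covered in Part I, Windows APIs are defined in terms of TCHARs, which at compile time become either MBCS or Unicode characters depending on whether you define the _MBCS or _UNICODE symbol. You can refer to Part I for a full description of TCHAR; here I will just list the character typedefs for reference.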
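Type | Meaning |
---|---|
WCHAR | Unicode character (wchar_t) |
TCHAR | MBCS or Unicode character, depending on preprocessor settings |
LPSTR | string of char (char*) |
LPCSTR | constant string of char (const char*) |
LPWSTR | string of WCHAR (WCHAR*) |
LPCWSTR | constant string of WCHAR (const WCHAR*) |
LPTSTR | string of TCHAR (TCHAR*) |
LPCTSTR | constant string of TCHAR (const TCHAR*) |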
Here are the OLECHAR-related typedefs you will see:
Type | Meaning |
---|---|
OLECHAR | Unicode character (wchar_t) |
LPOLESTR | string of OLECHAR (OLECHAR*) |
LPCOLESTR | constant string of OLECHAR (const OLECHAR*) |
There are also two macros used around string and character literals so that the same code can be used for both MBCS and Unicode builds:
Type | Meaning |
---|---|
_T(x) | Prepends L to the literal in Unicode builds. |
OLESTR(x) | Prepends L to the literal to make it an LPCOLESTR. |
There are also variants on _T that you might encounter in documentation or sample code. There are four equivalent macros -- TEXT, _TEXT, __TEXT, and __T -- that all do the same thing.
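The mechanism behind TCHAR and _T can be imitated in portable code. The following is only a sketch of the idea: `DEMO_UNICODE`, `demo_tchar`, `DEMO_T`, and `demo_tstring` are hypothetical stand-ins for `_UNICODE`, `TCHAR`, `_T()`, and a TCHAR-based string, and are not part of any Windows header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Pick the character type and the literal prefix at compile time.
// Define DEMO_UNICODE before this point to get the wide-character flavor.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_IMPL(x) L##x     // paste an L prefix onto the literal
#else
typedef char demo_tchar;
#define DEMO_T_IMPL(x) x        // leave the literal unchanged
#endif
#define DEMO_T(x) DEMO_T_IMPL(x)

// A string class whose character type follows the build flavor.
typedef std::basic_string<demo_tchar> demo_tstring;

// The same source line compiles for both build flavors.
demo_tstring greeting()
{
    return DEMO_T("Bob");
}

// Size in bytes of one character in the current build flavor.
std::size_t char_size()
{
    return sizeof(demo_tchar);
}
```

In either flavor `greeting()` holds three characters; only the character type, and therefore `char_size()`, differs between the two builds.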
Strings in COM - BSTR and VARIANT
Many Automation and other COM interfaces use BSTRs for strings, and BSTRs have a few pitfalls, so I will introduce them here.
A BSTR is a hybrid of a Pascal-style string (where the length is stored explicitly along with the data) and a C-style string (where the string length must be calculated by looking for a terminating zero character). A BSTR is a Unicode string that has its length prepended and is also terminated by a zero character. Here is an example of "Bob" as a BSTR:
06 00 00 00 | 42 00 | 6F 00 | 62 00 | 00 00 |
---|---|---|---|---|
length (in bytes) | B | o | b | EOS |
(A BSTR can hold any arbitrary block of data, not just characters, and can contain embedded zero characters. However, for the purposes of this article, I will not consider such cases.)
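As a rough, portable illustration of this layout, the sketch below builds the raw bytes for a given piece of UTF-16 text: a 4-byte length prefix holding the size of the character data in bytes, the characters themselves, and a terminating zero character. The function `bstr_layout` is a hypothetical helper written for this article, not a Windows API, and it writes the bytes in little-endian order as they appear on x86 Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns the bytes that a BSTR-style allocation
// would contain for the given UTF-16 text.
std::vector<std::uint8_t> bstr_layout(const std::u16string& text)
{
    // The length prefix is a 32-bit value, so the data must fit in it.
    assert(text.size() <= 0x7FFFFFFFu);

    std::vector<std::uint8_t> bytes;
    const std::uint32_t byte_len = static_cast<std::uint32_t>(text.size() * 2);

    // Length prefix: size of the character data in bytes, terminator excluded.
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<std::uint8_t>((byte_len >> shift) & 0xFF));

    // Character data, low byte first.
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));
    }

    // Terminating zero character.
    bytes.push_back(0);
    bytes.push_back(0);
    return bytes;
}
```

For the text "Bob", the result is twelve bytes: the length prefix 06 00 00 00, then 42 00, 6F 00, 62 00, and the terminator 00 00.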
A BSTR variable in C++ is actually a pointer to the first character of the string. In fact, the type BSTR is defined this way:
typedef OLECHAR* BSTR;
This is very unfortunate, because in reality a BSTR is not the same as a Unicode string. That typedef defeats type-checking and allows you to freely mix LPOLESTRs and BSTRs. Passing a BSTR to a function expecting a LPCOLESTR (or LPCWSTR) is safe, however the reverse is not. Therefore, it's important to be aware of the exact type of string that a function expects, and pass the correct type of string.
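There are many APIs that manipulate BSTRs, but the two most important ones are the functions that allocate and free a BSTR: SysAllocString() and SysFreeString(). SysAllocString() copies a Unicode string into a BSTR, and SysFreeString() frees the memory a BSTR occupies.
BSTR bstr = NULL;

bstr = SysAllocString ( L"Hi Bob!" );

if ( NULL == bstr )
{
    // out of memory error
}

// Use bstr here...

SysFreeString ( bstr );
Naturally, the BSTR wrapper classes take over this memory management for you.
The other type used in Automation interfaces is VARIANT. It is used to pass data to and from typeless languages such as JScript and VBScript. A VARIANT can hold many different types, such as long and IDispatch*. When a VARIANT holds a string, the string is stored as a BSTR.
I will say more about VARIANT later.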
String wrapper classes
Now that I've covered the various types of strings, I'll demonstrate the wrapper classes. For each one, I'll show how to construct an object and how to convert it to a C-style string pointer. The C-style pointer is often necessary for an API call, or to construct an object of a different string class. I will not cover other operators the classes provide, such as sorting or comparison.
Once again, do not blindly cast objects unless you understand exactly what the resulting code will do.
Classes provided by the CRT
_bstr_t
_bstr_t is a complete wrapper around a BSTR, and in fact it hides the underlying BSTR. It provides various constructors, as well as operators to access the underlying C-style string. However, there is no operator to access the BSTR itself, so a _bstr_t cannot be passed as an [out] parameter to COM methods. If you need a BSTR* to use as a parameter, it is easier to use the ATL class CComBSTR.
A _bstr_t can be passed to a function that takes a BSTR, but only because of three coincidences. First, _bstr_t has a conversion function to wchar_t*; second, wchar_t* and BSTR appear the same to the compiler because of the definition of BSTR; and third, the wchar_t* that a _bstr_t keeps internally points to a block of memory that follows the BSTR format. So even though there is no documented conversion to BSTR, it happens to work.
// Constructing
_bstr_t bs1 = "char string";        // construct from a LPCSTR
_bstr_t bs2 = L"wide char string";  // construct from a LPCWSTR
_bstr_t bs3 = bs1;                  // copy from another _bstr_t
_variant_t v = "Bob";
_bstr_t bs4 = v;                    // construct from a _variant_t that has a string

// Extracting data
LPCSTR psz1 = bs1;                  // automatically converts to MBCS string
LPCSTR psz2 = (LPCSTR) bs1;         // cast OK, same as previous line
LPCWSTR pwsz1 = bs1;                // returns the internal Unicode string
LPCWSTR pwsz2 = (LPCWSTR) bs1;      // cast OK, same as previous line
BSTR bstr = bs1.copy();             // copies bs1, returns it as a BSTR

// ...

SysFreeString ( bstr );
Note that _bstr_t also has conversion operators for char* and wchar_t*. This is a questionable design, because even though those are non-constant string pointers, you must not use those pointers to modify the buffer, because that could break the internal BSTR structure.
_variant_t
_variant_t is a complete wrapper around a VARIANT, and provides many constructors and conversion functions to operate on the multitude of types that a VARIANT can contain. I will only cover the string-related operations here.
// Constructing
_variant_t v1 = "char string";        // construct from a LPCSTR
_variant_t v2 = L"wide char string";  // construct from a LPCWSTR
_bstr_t bs1 = "Bob";
_variant_t v3 = bs1;                  // copy from a _bstr_t object

// Extracting data
_bstr_t bs2 = v1;                     // extract BSTR from the VARIANT
_bstr_t bs3 = (_bstr_t) v1;           // cast OK, same as previous line
Note that the _variant_t methods can throw exceptions if the type conversion cannot be made, so be prepared to catch _com_error exceptions.
Also note that there is no direct conversion from _variant_t to an MBCS string. You will need to make an interim _bstr_t variable, use another string class that provides the Unicode to MBCS conversion, or use an ATL conversion macro.
Unlike _bstr_t, a _variant_t can be passed directly as a parameter to a COM method. _variant_t derives from the VARIANT type, so passing a _variant_t in place of a VARIANT is allowed by C++ language rules.
STL classes
STL just has one string class, basic_string. A basic_string manages a zero-terminated array of characters. The character type is given in the basic_string template parameter. In general, a basic_string should be treated as an opaque object. You can get a read-only pointer to the internal buffer, but any write operations must use basic_string operators and methods.
There are two predefined specializations for basic_string: string, which contains chars, and wstring, which contains wchar_ts. There is no built-in TCHAR specialization, but you can use the one listed below.
// Specializations
typedef basic_string<TCHAR> tstring;  // string of TCHARs

// Constructing
string str = "char string";           // construct from a LPCSTR
wstring wstr = L"wide char string";   // construct from a LPCWSTR
tstring tstr = _T("TCHAR string");    // construct from a LPCTSTR

// Extracting data
LPCSTR psz = str.c_str();             // read-only pointer to str's buffer
LPCWSTR pwsz = wstr.c_str();          // read-only pointer to wstr's buffer
LPCTSTR ptsz = tstr.c_str();          // read-only pointer to tstr's buffer
Unlike _bstr_t, a basic_string cannot directly convert between character sets. However, you can pass the pointer returned by c_str() to another class's constructor if the constructor accepts the character type, for example:
// Example, construct _bstr_t from basic_string
_bstr_t bs1 = str.c_str();   // construct a _bstr_t from a LPCSTR
_bstr_t bs2 = wstr.c_str();  // construct a _bstr_t from a LPCWSTR
ATL classes
CComBSTR
CComBSTR is ATL's BSTR wrapper, and is more useful in some situations than _bstr_t. Most notably, CComBSTR allows access to the underlying BSTR, which means you can pass a CComBSTR object to COM methods, and the CComBSTR object will automatically manage the BSTR memory for you. For example, say you wanted to call methods of this interface:
// Sample interface:
struct IStuff : public IUnknown
{
    // Boilerplate COM stuff omitted...
    STDMETHOD(SetText)(BSTR bsText);
    STDMETHOD(GetText)(BSTR* pbsText);
};
CComBSTR has an operator BSTR method, so it can be passed directly to SetText(). There is also an operator & that returns a BSTR*, so you can use the & operator on a CComBSTR object to pass it to a function that takes a BSTR*.
CComBSTR bs1;
CComBSTR bs2 = "new text";

pStuff->GetText ( &bs1 );        // ok, takes address of internal BSTR
pStuff->SetText ( bs2 );         // ok, calls BSTR converter
pStuff->SetText ( (BSTR) bs2 );  // cast ok, same as previous line
CComBSTR has similar constructors to _bstr_t, however there is no built-in converter to an MBCS string. For that, you can use an ATL conversion macro.
// Constructing
CComBSTR bs1 = "char string";        // construct from a LPCSTR
CComBSTR bs2 = L"wide char string";  // construct from a LPCWSTR
CComBSTR bs3 = bs1;                  // copy from another CComBSTR
CComBSTR bs4;

bs4.LoadString ( IDS_SOME_STR );     // load string from string table

// Extracting data
BSTR bstr1 = bs1;                    // returns internal BSTR, but don't modify it!
BSTR bstr2 = (BSTR) bs1;             // cast ok, same as previous line
BSTR bstr3 = bs1.Copy();             // copies bs1, returns it as a BSTR
BSTR bstr4;

bstr4 = bs1.Detach();                // bs1 no longer manages its BSTR

// ...

SysFreeString ( bstr3 );
SysFreeString ( bstr4 );
Note that in the last example, the Detach() method is used. After calling that method, the CComBSTR object no longer manages its BSTR or the associated memory. That's why the SysFreeString() call is necessary on bstr4.
As a footnote, the operator & override means you can't use CComBSTR directly in some STL collections, such as list. The collections require that the & operator return a pointer to the contained class, but applying & to a CComBSTR returns a BSTR*, not a CComBSTR*. However, there is an ATL class to overcome this, CAdapt. For example, to make a list of CComBSTR, declare it like this:
std::list< CAdapt<CComBSTR> > bstr_list;
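The idea behind such an adapter can be sketched in portable C++. Everything in this example is hypothetical: `Handle` merely imitates a class that overloads unary `&` to return a pointer to its contents, and `Adapt` is a simplified wrapper in the spirit of CAdapt, not ATL's actual implementation.

```cpp
#include <cassert>
#include <list>
#include <type_traits>
#include <utility>

// Hypothetical class that overloads unary & so that it returns a pointer
// to the wrapped value instead of a pointer to the object itself.
class Handle
{
public:
    explicit Handle(int v = 0) : m_value(v) {}
    int* operator&() { return &m_value; }
    int value() const { return m_value; }
private:
    int m_value;
};

// Hypothetical adapter: stores a T but does not overload unary &, so
// taking the adapter's address yields an ordinary pointer to the adapter.
template <typename T>
class Adapt
{
public:
    Adapt(const T& t) : m_T(t) {}
    operator T&() { return m_T; }
    operator const T&() const { return m_T; }
    T m_T;
};

// Compile-time checks of what & produces in each case.
static_assert(std::is_same<decltype(&std::declval<Handle&>()), int*>::value,
              "& on Handle yields int*");
static_assert(std::is_same<decltype(&std::declval<Adapt<Handle>&>()),
                           Adapt<Handle>*>::value,
              "& on the adapter yields a pointer to the adapter");

// Sums the values stored in a list of adapted handles.
int sum(const std::list< Adapt<Handle> >& items)
{
    int total = 0;
    for (const Adapt<Handle>& item : items)
        total += item.m_T.value();
    return total;
}
```

A `std::list< Adapt<Handle> >` can be filled with `push_back(Handle(2))` thanks to the converting constructor, and the container only ever sees the well-behaved adapter type.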
CAdapt provides the operators required by the collection, but it is invisible to your code; you can use bstr_list just as if it were a list of CComBSTR.
CComVariant
CComVariant is a wrapper around a VARIANT. However, unlike _variant_t, the VARIANT is not hidden, and in fact you need to access the members of the VARIANT directly. CComVariant provides many constructors to operate on the multitude of types that a VARIANT can contain. I will only cover the string-related operations here.
// Constructing
CComVariant v1 = "char string";        // construct from a LPCSTR
CComVariant v2 = L"wide char string";  // construct from a LPCWSTR
CComBSTR bs1 = "BSTR bob";
CComVariant v3 = (BSTR) bs1;           // copy from a BSTR

// Extracting data
CComBSTR bs2 = v1.bstrVal;             // extract BSTR from the VARIANT
Unlike _variant_t, there are no conversion operators to the various VARIANT types. As shown above, you must access the VARIANT members directly and ensure that the VARIANT holds data of the type you expect. You can call the ChangeType() method if you need to convert a CComVariant's data to a BSTR.
CComVariant v4 = ... // Init v4 from somewhere
CComBSTR bs3;

if ( SUCCEEDED( v4.ChangeType ( VT_BSTR ) ))
    bs3 = v4.bstrVal;
As with _variant_t, there is no direct conversion to an MBCS string. You will need to make an interim _bstr_t variable, use another string class that provides the Unicode to MBCS conversion, or use an ATL conversion macro.
ATL conversion macros
ATL's string conversion macros are a very convenient way to convert between character encodings, and are especially useful in function calls. They are named according to the scheme [source type]2[new type] or [source type]2C[new type]. Macros named with the second form convert to a constant pointer (thus the "C" in the name). The type abbreviations are:
A: MBCS string, char* (A for ANSI)
W: Unicode string, wchar_t* (W for wide)
T: TCHAR string, TCHAR*
OLE: OLECHAR string, OLECHAR* (in practice, equivalent to W)
BSTR: BSTR (used as the destination type only)
So, for example, W2A() converts a Unicode string to an MBCS string, and T2CW() converts a TCHAR string to a constant Unicode string.
To use the macros, first include the atlconv.h header file. You can do this even in non-ATL projects, since that header file has no dependencies on other parts of ATL, and doesn't require a _Module global variable. Then, when you use a conversion macro in a function, put the USES_CONVERSION macro at the beginning of the function. This defines some local variables used by the macros.
When the destination type is anything other than BSTR, the converted string is stored on the stack, so if you want to keep the string around for longer than the current function, you'll need to copy the string into another string class. When the destination type is BSTR, the memory is not automatically freed, so you must assign the return value to a BSTR variable or a BSTR wrapper class to avoid memory leaks.
Here are some examples showing various conversion macros:
// Functions taking various strings:
void Foo ( LPCWSTR wstr );
void Bar ( BSTR bstr );
// Functions returning strings:
void Baz ( BSTR* pbstr );

#include <atlconv.h>

int main()
{
    using std::string;
    USES_CONVERSION;             // declare locals used by the ATL macros

    // Example 1: Send an MBCS string to Foo()
    LPCSTR psz1 = "Bob";
    string str1 = "Bob";

    Foo ( A2CW(psz1) );
    Foo ( A2CW(str1.c_str()) );

    // Example 2: Send a MBCS and Unicode string to Bar()
    LPCSTR psz2 = "Bob";
    LPCWSTR wsz = L"Bob";
    BSTR bs1;
    CComBSTR bs2;

    bs1 = A2BSTR(psz2);          // create a BSTR
    bs2.Attach ( W2BSTR(wsz) );  // ditto, assign to a CComBSTR

    Bar ( bs1 );
    Bar ( bs2 );

    SysFreeString ( bs1 );       // free bs1 memory
    // No need to free bs2 since CComBSTR will do it for us.

    // Example 3: Convert the BSTR returned by Baz()
    BSTR bs3 = NULL;
    string str2;

    Baz ( &bs3 );                // Baz() fills in bs3
    str2 = W2CA(bs3);            // convert to an MBCS string
    SysFreeString ( bs3 );       // free bs3 memory
}
As you can see, the macros are very handy when passing parameters to a function if you have a string in one format and the function takes a different format.
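The advice about copying a converted string into another string class if it must outlive the current function can be illustrated without ATL. The sketch below is an assumption-laden stand-in, not an ATL macro: `to_narrow` is a hypothetical helper that uses the standard `wcstombs()` function, converts according to the current C locale, and returns an owning std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide string to a multibyte string and
// returns an owning copy, so the result can safely outlive the caller's
// stack frame. Returns an empty string if a character cannot be converted.
std::string to_narrow(const std::wstring& wide)
{
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1, '\0');

    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::string();

    // The buffer was sized so that the terminator always fits.
    assert(written < buffer.size());
    return std::string(buffer.data(), written);
}
```

Because the function returns a std::string by value, the caller owns the converted text and does not depend on any temporary storage used during the conversion.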
MFC classes
CString
An MFC CString holds TCHARs, so the exact character type depends on the preprocessor symbols you have defined. In general, a CString is like an STL string, in that you should treat it as an opaque object and modify it only with CString methods. One nice advantage CString has over the STL string is that it has constructors that accept both MBCS and Unicode strings, and it has a converter to LPCTSTR, so you can pass a CString object directly to a function that accepts an LPCTSTR; there is no c_str() method you have to call.
// Constructing
CString s1 = "char string";        // construct from a LPCSTR
CString s2 = L"wide char string";  // construct from a LPCWSTR
CString s3 ( ' ', 100 );           // pre-allocate a 100-byte buffer, fill with spaces
CString s4 = "New window text";

// You can pass a CString in place of an LPCTSTR:
SetWindowText ( hwndSomeWindow, s4 );

// Or, equivalently, explicitly cast the CString:
SetWindowText ( hwndSomeWindow, (LPCTSTR) s4 );
You can also load a string from your string table. There is a CString constructor that will do it, along with LoadString(). The Format() method can optionally read a format string from the string table as well.
// Constructing/loading from string table
CString s5 ( (LPCTSTR) IDS_SOME_STR );  // load from string table
CString s6, s7;

// Load from string table.
s6.LoadString ( IDS_SOME_STR );

// Load printf-style format string from the string table:
s7.Format ( IDS_SOME_FORMAT, "bob", nSomeStuff, ... );
That first constructor looks odd, but that is actually the documented way to load a string.
Note that the only legal cast you can apply to a CString is a cast to LPCTSTR. Casting to an LPTSTR (that is, a non-const pointer) is wrong. Getting in the habit of casting a CString to an LPTSTR will only hurt yourself, as when the code does break later on, you might not see why, because you used the same code elsewhere and it happened to work. The correct way to get a non-const pointer to the buffer is the GetBuffer() method.
As an example of the correct usage, consider the case of setting the text of an item in a list control:
CString str = _T("new text");
LVITEM item = {0};

item.mask = LVIF_TEXT;
item.iItem = 1;
item.pszText = (LPTSTR)(LPCTSTR) str;  // WRONG!
item.pszText = str.GetBuffer(0);       // correct

ListView_SetItem ( &item );
str.ReleaseBuffer();                   // return control of the buffer to str
The pszText member is an LPTSTR, a non-const pointer, therefore you call GetBuffer() on str. The parameter to GetBuffer() is the minimum length you want CString to allocate for the buffer. If for some reason you wanted a modifiable buffer large enough to hold 1K TCHARs, you would call GetBuffer(1024). Passing 0 as the length just returns a pointer to the current contents of the string.
The line marked WRONG above will compile, and it will even work, in this case. But that doesn't mean the code is correct. By using the non-const cast, you're breaking object-oriented encapsulation and assuming something about the internal implementation of CString. If you make a habit of casting like that, you will eventually run into a case where the code breaks, and you'll wonder why it isn't working, because you use the same code everywhere else and it (apparently) works.
You know how people are always complaining about how buggy software is these days? Bugs are caused by the programmers writing incorrect code. Do you really want to write code you know is wrong, and thus contribute to the perception that all software is buggy? Take the time to learn the correct way of using a CString and have your code work 100% of the time.
CString also has two functions that create a BSTR from the CString contents, converting to Unicode if necessary. They are AllocSysString() and SetSysString(). Aside from the BSTR* parameter that SetSysString() takes, they work identically.
// Converting to BSTR
CString s5 = "Bob!";
BSTR bs1 = NULL, bs2 = NULL;

bs1 = s5.AllocSysString();
s5.SetSysString ( &bs2 );

// ...

SysFreeString ( bs1 );
SysFreeString ( bs2 );
COleVariant
COleVariant is pretty similar to CComVariant. COleVariant derives from VARIANT, so it can be passed to a function that takes a VARIANT. However, unlike CComVariant, COleVariant only has an LPCTSTR constructor. There are not separate constructors for LPCSTR and LPCWSTR. In most cases this is not a problem, since your strings will likely be LPCTSTRs anyway, but it is a point to be aware of. COleVariant also has a constructor that accepts a CString.
// Constructing
CString s1 = _T("tchar string");
COleVariant v1 = _T("Bob");  // construct from an LPCTSTR
COleVariant v2 = s1;         // copy from a CString
As with CComVariant, you must access the VARIANT members directly, using the ChangeType() method if necessary to convert the VARIANT to a string. However, COleVariant::ChangeType() throws an exception if it fails, instead of returning a failure HRESULT code.
// Extracting data
COleVariant v3 = ...;        // fill in v3 from somewhere
BSTR bs = NULL;

try
{
    v3.ChangeType ( VT_BSTR );
    bs = v3.bstrVal;
}
catch ( COleException* e )
{
    // error, couldn't convert
    e->Delete();             // MFC exceptions caught by pointer must be deleted
}

SysFreeString ( bs );
WTL classes
CString
WTL's CString behaves exactly like MFC's CString, so refer to the description of the MFC CString above.
CLR and VC 7 classes
System::String is the .NET class for handling strings. Internally, a String object holds an immutable sequence of characters. Any String method that supposedly manipulates the String object actually returns a new String object, because the original String is immutable. A peculiarity of Strings is that if you have more than one String containing the same series of characters, all of them actually refer to the same object. The Managed Extensions to C++ have a new string literal prefix S, which is used to represent a managed string literal.
// Constructing
String* ms = S"This is a nice managed string";
You can construct a String object by passing an unmanaged string, but this is slightly less efficient than when you construct a String object by passing a managed string. This is because all instances of identical S-prefixed strings represent the same object, but this is not true for unmanaged strings. The following code will make this clear:
String* ms1 = S"this is nice";
String* ms2 = S"this is nice";
String* ms3 = L"this is nice";

Console::WriteLine ( ms1 == ms2 );  // prints true
Console::WriteLine ( ms1 == ms3 );  // prints false
The right way to compare strings that may not have been created using S-prefixed strings is to use the String::CompareTo() method as shown below:
Console::WriteLine ( ms1->CompareTo(ms2) );
Console::WriteLine ( ms1->CompareTo(ms3) );
Both the above lines will print 0, which means the strings are equal.
Converting between a String and the MFC 7 CString is easy. CString has a converter to LPCTSTR and String has two constructors that take a char* and wchar_t*, therefore you can pass a CString straight to a String constructor.
CString s1 ( "hello world" );
String* s2 ( s1 );  // copy from a CString
Converting the other way works similarly:
String* s1 = S"Three cats";
CString s2 ( s1 );
This might puzzle you a bit, but it works because starting with VS.NET, CString has a constructor that accepts a String object:
CStringT ( System::String* pString );
For some speedy manipulations, you might sometimes want to access the underlying string:
String* s1 = S"Three cats";

Console::WriteLine ( s1 );

const __wchar_t __pin* pstr = PtrToStringChars(s1);

for ( int i = 0; i < wcslen(pstr); i++ )
    (*const_cast<__wchar_t*>(pstr+i))++;

Console::WriteLine ( s1 );
PtrToStringChars() returns a const __wchar_t* to the underlying string, which we need to pin down, as otherwise the garbage collector might move the string in memory while we are manipulating its contents.
Using string classes with printf-style formatting functions
You must pay careful attention when using string wrapper classes with printf() or any function that works the way printf() does. This includes sprintf() and its variants, as well as the TRACE and ATLTRACE macros. Because there is no type-checking done on the additional parameters to the functions, you must be careful to only pass a C-style string pointer, not a complete string object.
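The same rule applies to standard C++ strings. As a small portable sketch (the helper name `describe` and the buffer size are my own choices for illustration), the string object is handed to `snprintf()` through `c_str()`, never as the object itself:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a message with snprintf(). The std::string
// argument is passed as a C-style pointer via c_str(); the variadic
// arguments are not type-checked, so the object itself must not be passed.
std::string describe(const std::string& name, int line)
{
    char buffer[128];
    const int written = std::snprintf(buffer, sizeof(buffer),
                                      "The string is: %s in line %d\n",
                                      name.c_str(), line);
    // snprintf() reports an encoding error with a negative return value.
    assert(written >= 0);
    (void) written;
    return std::string(buffer);
}
```

Calling `describe("Bob!", 42)` produces the text "The string is: Bob! in line 42" followed by a newline.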
So for example, to pass a string in a _bstr_t to ATLTRACE(), you must explicitly write the (LPCSTR) or (LPCWSTR) cast:
_bstr_t bs = L"Bob!";

ATLTRACE("The string is: %s in line %d\n", (LPCSTR) bs, nLine);
If you forget the cast and pass the entire _bstr_t object, the trace message will display meaningless output, since what will be pushed on the stack is whatever internal data the _bstr_t variable keeps.
Summary of all the classes
The usual way of converting between two string classes is to take the source string, convert it to a C-style string pointer, and then pass the pointer to a constructor in the destination type. So here is a chart showing how to convert a string to a C-style pointer, and which classes can be constructed from C-style pointers.
Class | string type | convert to char*? | convert to const char*? | convert to wchar_t*? | convert to const wchar_t*? | convert to BSTR? | construct from char*? | construct from wchar_t*? |
---|---|---|---|---|---|---|---|---|
_bstr_t | BSTR | yes, cast¹ | yes, cast | yes, cast¹ | yes, cast | yes² | yes | yes |
_variant_t | BSTR | no | no | no | cast to _bstr_t | cast to _bstr_t | yes | yes |
string | MBCS | no | yes, c_str() | no | no | no | yes | no |
wstring | Unicode | no | no | no | yes, c_str() | no | no | yes |
CComBSTR | BSTR | no | no | no | yes, cast | yes, cast | yes | yes |
CComVariant | BSTR | no | no | no | yes⁴ | yes⁴ | yes | yes |
CString | TCHAR | no⁶ | in MBCS builds, cast | no⁶ | in Unicode builds, cast | no⁵ | yes | yes |
COleVariant | BSTR | no | no | no | yes⁴ | yes⁴ | in MBCS builds | in Unicode builds |
1 Even though |