Saving a File in UTF-8 Encoding with C++

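Preface

       This is my first translation, so it may not be perfect; the original English wording is kept so that readers can follow along easily.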

 

      I came across the original article, Writing UTF-8 files in C++, through the programming blog of 天堂大鸟.

      Original English article: http://mariusbancila.ro/blog/2008/10/20/writing-utf-8-files-in-c/

 

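      I have recently been using tinyxml to save XML and found that it does not convert the file to UTF-8, so things break as soon as Chinese text has to be displayed. I searched on Baidu and found some related material. "Writing UTF-8 files in C++" struck me as straightforward and easy to read, so I translated it.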

 

Article

 

    Let's say you need to write an XML file with this content:

 

<?xml version="1.0" encoding="UTF-8"?>
<root description="this is a naïve example">
</root>

    How do we write that in C++?

 

    At first glance, you might be tempted to write it like this:

#include <fstream>
#include <string>

int main()
{
        std::ofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        // Narrow literal: the bytes written for "ï" depend on the
        // compiler's source and execution character sets.
        std::string text =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                "<root description=\"this is a naïve example\">\n</root>";

        testFile << text;

        testFile.close();

        return 0;
}

    When you open the file in IE, for instance, surprise! It is not rendered correctly:

[Figure 1: screenshot of demo.xml opened in IE]

    So you might be tempted to say, "let's switch to wstring and wofstream".

 

#include <fstream>
#include <string>

int main()
{
        std::wofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        std::wstring text =
                L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                L"<root description=\"this is a naïve example\">\n</root>";

        testFile << text;

        testFile.close();

        return 0;
}

 

    And when you run it and open the file again, nothing has changed. So where is the problem? The problem is that, by default, neither ofstream nor wofstream writes the text in UTF-8 format. If you want the file to really be UTF-8, you have to encode the output buffer in UTF-8 yourself. To do that we can use WideCharToMultiByte(). This Windows API maps a wide-character string to a new character string (which is not necessarily from a multibyte character set). Its first argument indicates the code page; for UTF-8 we need to specify CP_UTF8.
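To make "encode the output buffer in UTF-8" concrete, here is a minimal, portable sketch of how a single Unicode code point maps to UTF-8 bytes. It is not part of the original article, and the names append_utf8 and utf8_from_code_point are hypothetical, chosen only for illustration; WideCharToMultiByte() with CP_UTF8 performs this kind of mapping for a whole string.

```cpp
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Invalid input (surrogates, values above U+10FFFF) is replaced by U+FFFD.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;

    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical convenience wrapper: encode one code point as a UTF-8 string.
std::string utf8_from_code_point(std::uint32_t cp)
{
    std::string out;
    append_utf8(out, cp);
    return out;
}
```

For the "ï" in the example (U+00EF), utf8_from_code_point(0xEF) yields the two bytes 0xC3 0xAF, which is what a UTF-8 reader expects to find in the file.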

 

    The following helper functions encode a std::wstring as UTF-8 and return the result in a std::string.

#include <windows.h>
#include <string>

std::string to_utf8(const wchar_t* buffer, int len)
{
        // First call: ask how many bytes the UTF-8 output needs.
        int nChars = ::WideCharToMultiByte(
                CP_UTF8,
                0,
                buffer,
                len,
                NULL,
                0,
                NULL,
                NULL);
        if (nChars == 0) return "";

        std::string newbuffer;
        newbuffer.resize(nChars);
        // Second call: convert into the string's own writable storage.
        ::WideCharToMultiByte(
                CP_UTF8,
                0,
                buffer,
                len,
                &newbuffer[0],
                nChars,
                NULL,
                NULL);

        return newbuffer;
}

std::string to_utf8(const std::wstring& str)
{
        return to_utf8(str.c_str(), (int)str.size());
}

    With that in hand, all you have to do is make the following changes:

#include <fstream>
#include <string>

// Uses to_utf8() as defined above.
int main()
{
        std::ofstream testFile;

        testFile.open("demo.xml", std::ios::out | std::ios::binary);

        std::wstring text =
                L"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                L"<root description=\"this is a naïve example\">\n</root>";

        // Convert to UTF-8 bytes before writing through the narrow stream.
        std::string outtext = to_utf8(text);

        testFile << outtext;

        testFile.close();

        return 0;
}

And now when you open the file, you get what you wanted in the first place.
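If you would rather check the result without relying on a browser, one option is to look at the raw bytes of the file. The sketch below is an addition to the original article; read_bytes and to_hex are hypothetical helper names, and the code uses only the standard library.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file as raw bytes (binary mode, no newline translation).
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Format bytes as space-separated uppercase hex, e.g. "C3 AF".
std::string to_hex(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling to_hex(read_bytes("demo.xml")) on a correctly encoded file should show the pair C3 AF where the "ï" is. A lone EF at that position would suggest the text was written in a single-byte encoding such as Latin-1 instead.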

And that is all!
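As a side note, on platforms where WideCharToMultiByte() is not available, a different technique can give a similar result: imbuing the wide stream with a UTF-8 conversion facet. This is not the approach the original article takes. The sketch below uses std::codecvt_utf8 from the standard header codecvt, which is deprecated since C++17 and may produce compiler warnings; write_utf8_file is a hypothetical name. Where wchar_t is 16 bits wide (as on Windows), std::codecvt_utf8_utf16 would probably be the better fit for characters outside the Basic Multilingual Plane.

```cpp
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <locale>
#include <string>

// Write `text` to `path` as UTF-8 by giving the wide stream a UTF-8 facet.
// Returns true if the stream is still in a good state after writing.
bool write_utf8_file(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening so the facet applies from the first character.
    // The locale takes ownership of the facet allocated here.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;

    out << text;
    out.flush();
    return out.good();
}
```

With this in place, write_utf8_file("demo.xml", text) could stand in for the ofstream plus to_utf8() combination, at the cost of depending on a deprecated facility.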

 
