Character sets
Contributed by Ken Fowles, Personal Systems Division, Microsoft.
This page starts with a summary, then digs into ASCII, OEM, ANSI, DBCS, and Unicode character sets, and how character sets affect technology at Microsoft.
Summary
Character sets affect two fundamental parts of your code: how characters are stored and encoded in your data, and how every algorithm that touches text must process strings.
Character sets do not solve the rest of international engineering: sorting and user-interface translation, for example, still have to be handled separately.
In the dark ages, developers generally ignored character sets. Since one ANSI character set can handle Western European languages like English, French, German, Italian and Spanish, other languages were considered special cases or not handled at all.
Many, but not all, of the world's major writing systems can be represented within 256 characters, using individual 8-bit character sets. It's important to note there isn't an 8-bit character set which can represent all of these languages at once, or even just the languages required by the European Union.
Languages which require more than 256 characters include Chinese (Traditional and Simplified), Japanese, and Korean (Hangeul). It is a requirement, not an option: any application which touches text in these languages must correctly handle DBCS or Unicode string processing and data. Unless you enjoy throwing away a lot of code and algorithms, it's best to implement this from day one in all your text-handling code.
ASCII
ASCII is a 7-bit character set: 2 to the 7th power, or 128, characters. There's room in ASCII for upper- and lowercase English, American English punctuation, the base-10 digits, a few control characters, and not much else. Although very primitive, ASCII is the one common denominator contained in all the other common character sets, so the only means of interchanging data across all major languages (without risk of character-mapping loss) is to use ASCII (or have all sides understand Unicode). For example, the safest way to store filenames on a typical network today is to use the ASCII subset of characters. If you manually log into CompuServe, they require a 7-bit instead of 8-bit modem protocol, since their servers were originally ASCII-based.
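As a quick illustration, here is a minimal C sketch that tests whether a buffer stays within the 7-bit ASCII range and is therefore safe for this kind of interchange (the function name is_ascii is mine, not from any standard library):

    #include <stddef.h>

    /* Returns 1 if every byte is in the 7-bit ASCII range (0x00-0x7F),
       making the buffer safe to interchange across the character sets
       described here; returns 0 otherwise. */
    int is_ascii(const unsigned char *buf, size_t len)
    {
        size_t i;
        for (i = 0; i < len; i++) {
            if (buf[i] > 0x7F)
                return 0;
        }
        return 1;
    }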
OEM 8-bit characters
Back in the DOS days, separate Original Equipment Manufacturer (OEM) code pages were created so that text-mode PCs could display and print line-draw characters. They're still used today for direct FAT access, and for accessing data files created by MS-DOS based applications. OEM code pages typically have a 3-digit label, such as CP 437 for American English.
The emphasis with OEM code pages was line-draw characters. That was a good idea at the time, since the standard video for the original IBM PC was a monochrome text card with 2K of RAM, connected to an attractive green monitor. However, the line-draw characters took up a lot of space in the 256-character map, leaving very little room for international characters. And since each hardware OEM was free to set its own character standards, some situations continue today where characters can be scrambled or lost even within the same language, if two OEM code pages assign different code points. For example, a few characters were mapped differently between Russian MS-DOS and Russian IBM PC-DOS, so data movement between them is unreliable, or software has to be written to map each special case.
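For illustration, a minimal Win32 sketch of querying the system's code pages and mapping a string from the ANSI code page to the OEM code page (GetACP, GetOEMCP, and CharToOemA are the standard Win32 calls for this; the sample string is my own):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char ansi[] = "Text in the ANSI code page";
        char oem[sizeof(ansi)];

        /* Report which code pages this system is using. */
        printf("ANSI code page: %u, OEM code page: %u\n",
               GetACP(), GetOEMCP());

        /* Map an ANSI string to the OEM code page, e.g. before writing
           a FAT filename or data for an MS-DOS based application.
           Characters with no OEM equivalent map to a best-fit default. */
        CharToOemA(ansi, oem);
        printf("OEM form: %s\n", oem);
        return 0;
    }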
Users aren't going to suddenly erase all their old data and reformat all their disks. The raw data and FAT filenames created with OEM code pages will be around for a long time.
Windows ANSI
Since Windows GDI eliminates the need for text-based line-draw characters, the old OEM line-draw code points could be freed up for something more useful, like international characters and publishing symbols. An assortment of 256-character Windows ANSI character sets covers all the 8-bit languages targeted by Windows.
You can think of Windows ANSI as a lower 128 and an upper 128. The lower 128 is identical to ASCII; the upper 128 is different for each ANSI character set, and is where the various international characters are parked.
code page | 1250           | 1251     | 1252           | 1253  | 1254    | etc.
upper 128 | Eastern Europe | Cyrillic | West Euro ANSI | Greek | Turkish | etc.
lower 128 | ASCII          | ASCII    | ASCII          | ASCII | ASCII   | etc.
The European Union includes more languages than Code Page 1252 can cover - specifically, Greek is missing, and there's no way to fit it all into 256 characters. Switching entirely to Unicode would allow coverage of all EU languages (and a lot more) in one character set, but that conversion is not automatic, and requires that every algorithm which touches text be inspected or rewritten. So an interim solution is available which allows spanning multiple ANSI code pages within one document: Multilingual Content I/O. Remember this is for multilingual document content, not user interface - two separate issues.
DBCS
DBCS stands for Double-Byte Character Set, but DBCS encodings are actually multi-byte: a mix of 8-bit and 16-bit characters. Modern writing systems used in the Far East region typically require thousands of characters - roughly 3,000 to 15,000 at a minimum.
There are several DBCS character sets supported by Far East editions of Microsoft Windows. Leadbytes signal that the following byte is the trailbyte of a 16-bit character unit, instead of the start of the next character. Each DBCS code page has a different leadbyte and trailbyte range. No leadbytes fall within the lower 128 (ASCII) range, but some trailbytes do.
The main rules for DBCS-enabling (a summary of the usual guidance; see the sketch after this list):
- Never assume a character is one byte; a string's length in bytes is not its length in characters.
- Walk strings with calls like CharNext, CharPrev, or IsDBCSLeadByte instead of plain pointer increments, so a trailbyte is never misread as a character of its own.
- When truncating or splitting a buffer, never separate a leadbyte from its trailbyte.
- Size buffers in bytes, but count and report lengths in characters.
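Here is a minimal sketch of these rules in C, assuming the Win32 IsDBCSLeadByte call and the active system code page (the function name dbcs_char_count is illustrative):

    #include <windows.h>
    #include <stddef.h>

    /* Counts characters (not bytes) in a DBCS string by honoring
       leadbytes, so a trailbyte is never treated as its own character. */
    size_t dbcs_char_count(const char *s)
    {
        size_t count = 0;
        while (*s != '\0') {
            if (IsDBCSLeadByte((BYTE)*s) && *(s + 1) != '\0')
                s += 2;   /* leadbyte + trailbyte = one character */
            else
                s += 1;   /* single-byte character (includes ASCII) */
            count++;
        }
        return count;
    }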
What is Unicode / ISO 10646 ?
Unicode is a 16-bit character set which contains all of the characters commonly used in information processing. Approximately one third of the 64K possible code points are still unassigned, to leave room for additional characters in the future.
Unicode is not a technology in itself. Sometimes people misunderstand Unicode and expect it to 'solve' international engineering, which it doesn't. Unicode is an agreed-upon way to store characters, a standard supported by the members of the Unicode Consortium.
The fundamental idea behind Unicode is to be language-independent, which helps conserve space in the character map: no single character is assumed to identify a language by itself. Just as the character "a" can be a French, German, or English "a" even when it carries different meanings, a particular Han ideograph might map to a character used in Chinese, Japanese, and Korean. Native speakers sometimes misunderstand this and complain that Unicode doesn't "look" correct in, say, Japanese, but that's intentional: appearance should reside in the font as an artistic issue, not in the code point as an engineering issue. Although it's technically possible to ship one font which covers all Unicode characters, it would have very limited commercial use, since end users in Asia will expect fonts dedicated to, and designed to look correct in, their language.
This language independence also means Unicode does not imply any sort order. The older 8-bit and DBCS character sets usually embed a sort order, but that meant a new character set had to be created whenever a different sort order was needed, which makes a mess of data interchange between languages. Instead, Unicode expects the host operating system to handle sorting, as the Win32 NLS APIs do.
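As a sketch of what that looks like on Win32, CompareString applies the locale's sorting rules that the character set itself doesn't carry (the sample strings and the NORM_IGNORECASE flag are illustrative choices):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Let the operating system sort, rather than the character set:
           CompareString applies the user locale's rules to Unicode text. */
        int r = CompareStringW(LOCALE_USER_DEFAULT, NORM_IGNORECASE,
                               L"Apple", -1, L"apple", -1);

        if (r == CSTR_LESS_THAN)
            printf("first sorts before second\n");
        else if (r == CSTR_EQUAL)
            printf("strings sort as equal\n");
        else if (r == CSTR_GREATER_THAN)
            printf("first sorts after second\n");
        return 0;
    }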
Data interchange between languages
This is where Unicode has the clearest advantage over code pages. Unicode is essentially a superset of every Windows ANSI, Windows DBCS, and DOS OEM character set. So, for example, a Unicode-based Internet browser could let its user simultaneously view Web pages containing text in practically any language, as long as the appropriate fonts are on the machine.
Unicode is even useful for products which don't rely on Unicode for string processing, since it makes a good common denominator for mapping characters between code pages. Instead of manually creating a separate mapping table for every possible pair of code pages, it's easier to map from one code page to Unicode, and then back out to the other code page. The Win32 SDK sample UCONVERT shows how to use the system's *.nls tables to accomplish part of this task.
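A minimal sketch of that round trip using the Win32 conversion APIs (the helper name remap and the fixed intermediate buffer size are my own choices, not from the SDK sample):

    #include <windows.h>

    /* Maps a string from one code page to another by pivoting through
       Unicode, as described above. Pass code page numbers such as 1251
       (Cyrillic ANSI) and 866 (Russian OEM); returns the byte count
       written to dst, or 0 on failure. */
    int remap(const char *src, UINT from_cp, UINT to_cp,
              char *dst, int dst_bytes)
    {
        WCHAR wide[512];
        int n;

        /* Step 1: source code page -> Unicode. */
        n = MultiByteToWideChar(from_cp, 0, src, -1, wide, 512);
        if (n == 0)
            return 0;

        /* Step 2: Unicode -> destination code page. */
        return WideCharToMultiByte(to_cp, 0, wide, -1,
                                   dst, dst_bytes, NULL, NULL);
    }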
Impact on your project
Unicode-enabling is not an automatic process. Since Unicode requires 16-bit characters, many of the same coding assumptions that break ANSI code on DBCS will also break it on Unicode: your pointer math can't assume 8-bit characters, and you will need to test for correct string handling in every place your code directly touches text. Fortunately there are some shortcuts.
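One common shortcut on Win32 is the generic-text approach from TCHAR.H, which lets a single source tree compile as either ANSI or Unicode; a minimal sketch:

    #include <tchar.h>
    #include <stdio.h>

    int main(void)
    {
        /* TCHAR compiles to a wide character when UNICODE/_UNICODE is
           defined, and to char otherwise; _T() does the same for literals. */
        TCHAR greeting[] = _T("Hello, world");

        /* _tcslen counts characters, not bytes, in either build. */
        _tprintf(_T("%u characters\n"), (unsigned)_tcslen(greeting));
        return 0;
    }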
Twelve steps to Unicode-enabling
from Developing International Applications, pages 109-111, Microsoft Press. One detail from those pages worth calling out is how the printf family interprets string format specifiers:
Specifier | printf expects | wprintf expects
%s        | SBCS or MBCS   | Unicode
%S        | Unicode        | SBCS or MBCS
%hs       | SBCS or MBCS   | SBCS or MBCS
%ls       | Unicode        | Unicode
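A short illustration of the table, assuming Microsoft's C runtime (where these specifiers behave as shown, and mixing printf and wprintf on one stream is tolerated):

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        char    narrow[] = "narrow";
        wchar_t wide[]   = L"wide";

        /* printf: %s takes narrow strings, %S takes wide ones. */
        printf("%s %S\n", narrow, wide);

        /* wprintf: the meanings flip, as the table shows. */
        wprintf(L"%S %s\n", narrow, wide);

        /* %hs and %ls are unambiguous in both. */
        printf("%hs %ls\n", narrow, wide);
        return 0;
    }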
For samples of Unicode-enabled programming, UCONVERT shows character conversion using the system's *.nls tables, and GRIDFONT shows how font enumeration and display needs to keep track of character encoding.