Unicode Version 5.1 Released
Mountain View, CA, April 4, 2008 – The Unicode Consortium is pleased to announce the release of Unicode 5.1. This release contains over 100,000 characters, and provides significant additions and improvements that extend text processing for software worldwide. Some of the key features are: increased security in data exchange, significant character additions for Indic and South East Asian scripts, expanded identifier specifications for Indic and Arabic scripts, improvements in the processing of Tamil and other Indic scripts, linebreaking conformance relaxation for HTML and other protocols, strengthened normalization stability, new case pair stability, plus others given below.
The Version 5.1.0 data files and documentation are final and posted on the Unicode site. In addition to updated existing files, implementers will find new test data files (for example, for linebreaking) and new XML data files that encapsulate all of the Unicode character properties. For details, see the page for Unicode 5.1.0 at http://www.unicode.org/versions/Unicode5.1.0/.
A major feature of Unicode 5.1.0 is the enabling of ideographic variation sequences. These sequences allow standardized representation of glyphic variants needed for Japanese, Chinese, and Korean text. The first registered collection, from Adobe Systems, is now available at http://www.unicode.org/ivd/.
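As a rough illustration of what such a sequence looks like in text, the sketch below builds a base ideograph followed by a variation selector and checks its shape. It assumes the selectors come from the range U+E0100..U+E01EF (VARIATION SELECTOR-17 through 256). The helper name `is_ivs_shaped` and the sample pair are illustrative only. The check does not consult the registry at http://www.unicode.org/ivd/, so it cannot tell whether a sequence is actually registered.

```python
import unicodedata

# Assumed selector range for ideographic variation sequences:
# VARIATION SELECTOR-17 (U+E0100) through VARIATION SELECTOR-256 (U+E01EF).
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def is_ivs_shaped(seq: str) -> bool:
    """Return True if seq is one CJK unified ideograph plus one selector.

    Hypothetical helper: it checks the shape only, not registration.
    """
    if len(seq) != 2:
        return False
    base, selector = seq
    is_ideograph = unicodedata.name(base, "").startswith("CJK UNIFIED IDEOGRAPH")
    return is_ideograph and IVS_FIRST <= ord(selector) <= IVS_LAST


# Sample pair chosen for illustration; check the IVD for registered sequences.
candidate = "\u845B\U000E0100"

print(is_ivs_shaped(candidate))
print([f"U+{ord(ch):04X}" for ch in candidate])
# The selector is a separate code point, so normalization keeps the pair intact.
print(unicodedata.normalize("NFC", candidate) == candidate)
```

A renderer that does not know the sequence can ignore the selector and show the base ideograph, which is why the variant survives plain-text interchange.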
Unicode 5.1 contains significant changes to properties and behavioral specifications. Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced to improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering. Standardized named sequences are added for Lithuanian, and provisional named sequences for Tamil.
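The buffering guidelines can be pictured with a small sketch. It assumes the limit described in Unicode Standard Annex #15 (no more than 30 consecutive non-starters once the text is in NFKD); treat that constant as something to verify against the annex. The function names are hypothetical, and this is a simplified check rather than the full algorithm from the specification.

```python
import unicodedata

# Assumed limit on consecutive non-starters, taken from the description
# of stream-safe text in UAX #15; verify against the specification.
MAX_NONSTARTERS = 30


def longest_nonstarter_run(text: str) -> int:
    """Length of the longest run of characters with a nonzero combining class."""
    longest = current = 0
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch) != 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def is_stream_safe(text: str, limit: int = MAX_NONSTARTERS) -> bool:
    """Hypothetical helper: True if no run of non-starters exceeds the limit."""
    return longest_nonstarter_run(text) <= limit


print(is_stream_safe("e" + "\u0301" * 5))
print(is_stream_safe("e" + "\u0301" * 40))
```

Bounding the run length lets a normalizer work with a fixed-size buffer instead of holding an arbitrarily long sequence of combining marks in memory.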
Unicode 5.1.0 adds 1,624 newly encoded characters. These additions include characters required for Malayalam and Myanmar and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. This latest version of the Unicode Standard has exactly the same character assignments as ISO/IEC 10646:2003 plus Amendments 1 through 4.
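A quick way to see some of these additions is to query a character database. The sketch below uses Python's standard `unicodedata` module, which tracks a later Unicode version than 5.1; the sample code points were picked for illustration, one from each of several blocks mentioned above.

```python
import unicodedata

CAPITAL_SHARP_S = "\u1E9E"

# The database version depends on the Python build, not on this release.
print("Unicode database version:", unicodedata.unidata_version)
print(unicodedata.name(CAPITAL_SHARP_S))

# The capital form lowercases to U+00DF, while U+00DF still uppercases
# to "SS", so existing case mappings are left unchanged.
print(CAPITAL_SHARP_S.lower() == "\u00DF")
print("\u00DF".upper())

# Illustrative code points, one per block; labels are informal.
samples = {
    "Vai": 0xA500,
    "Ol Chiki": 0x1C50,
    "Phaistos disc": 0x101D0,
    "Mahjong tile": 0x1F000,
    "Domino tile": 0x1F030,
}
for label, cp in samples.items():
    name = unicodedata.name(chr(cp), "<not in this database>")
    print(f"{label}: U+{cp:04X} {name}")
```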
The Unicode Collation Algorithm (UCA), the core standard for sorting all text, is also being updated at the same time (see http://www.unicode.org/reports/tr10/). The major changes in UCA include coverage of all Unicode 5.1 characters, tightened conformance for canonical equivalence, clearer definitions of internationalized search and matching, specifications of parameters for customizing collation, and definitions of collation folding. There are also important clarifications on the use of contractions (such as "ch" in Slovak) in collation.
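To show what a contraction means for sorting, here is a toy sketch, not the UCA itself. It assumes a simplified alphabet of ASCII letters with "ch" treated as a single unit ordered after "h", as in Slovak; real tailorings also cover accented letters and other digraphs. The names `LETTERS`, `WEIGHT`, and `sort_key` are hypothetical.

```python
import string

# Simplified, hypothetical alphabet: ASCII letters with the contraction
# "ch" inserted as one unit immediately after "h".
LETTERS = list(string.ascii_lowercase)
LETTERS.insert(LETTERS.index("h") + 1, "ch")
WEIGHT = {letter: rank for rank, letter in enumerate(LETTERS)}


def sort_key(word: str) -> tuple:
    """Build a key that maps each unit, including "ch", to one weight."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            # Contraction: two characters produce a single weight.
            key.append(WEIGHT["ch"])
            i += 2
        else:
            # Characters outside the toy alphabet sort after all letters.
            key.append(WEIGHT.get(word[i], len(LETTERS)))
            i += 1
    return tuple(key)


words = ["chata", "cesta", "hora", "dom", "ihla"]
print(sorted(words))                # plain code point order
print(sorted(words, key=sort_key))  # "ch" ordered after "h"
```

In plain code point order "chata" falls between "cesta" and "dom"; with the contraction it moves after "hora", which is the behavior a Slovak tailoring is meant to produce.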
The next version of the Unicode locale project (CLDR) is also being prepared on the basis of Unicode 5.1, and is now open for public data submission (see http://www.unicode.org/cldr/).
About The Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe, Apple, Basis Technology, Denic e.G., Google, Government of India, Government of Pakistan, Government of Tamil Nadu, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun Microsystems, Sybase, The University of California at Berkeley, Yahoo! plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium.
Translator: sunboxy
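Mountain View, CA, April 4, 2008: The Unicode Consortium is pleased to announce the release of Unicode 5.1. This version contains over 100,000 characters and provides significant additions and improvements for text processing worldwide. Key features include: increased security in data exchange, significant character additions for Indic and South East Asian scripts, expanded identifier specifications for Indic and Arabic scripts, improved processing of Tamil and other Indic scripts, relaxed linebreaking conformance for HTML and other protocols, strengthened normalization stability, and the other items given below.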
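The Version 5.1.0 data files and documentation are final and posted on the Unicode site. In addition to updated existing files, implementers will find new test data files (for example, for linebreaking) and new XML data files that cover all of the Unicode character properties. For details, see the page for Unicode 5.1.0 at http://www.unicode.org/versions/Unicode5.1.0/.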
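A major feature of Unicode 5.1.0 is the enabling of ideographic variation sequences. These sequences allow a standardized representation of the glyph variants needed for Japanese, Chinese, and Korean text. The first registered collection, from Adobe Systems, is now available at http://www.unicode.org/ivd/.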
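Unicode 5.1 contains significant changes to properties and behavioral specifications. Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced to improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering. Standardized named sequences are added for Lithuanian, and provisional named sequences for Tamil.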
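Unicode 5.1.0 adds 1,624 newly encoded characters. These additions include characters required for Malayalam and Myanmar, and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. This latest version of the Unicode Standard has exactly the same character assignments as ISO/IEC 10646:2003 plus Amendments 1 through 4.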
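The Unicode Collation Algorithm (UCA), the core standard for sorting all text, is being updated at the same time (see http://www.unicode.org/reports/tr10/). The major changes in UCA include coverage of all Unicode 5.1 characters, tightened conformance for canonical equivalence, clearer definitions of internationalized search and matching, specifications of parameters for customizing collation, and definitions of collation folding. There are also important clarifications on the use of contractions (such as "ch" in Slovak) in collation.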
The next version of the Unicode locale project (CLDR) is also being prepared on the basis of Unicode 5.1, and is now open for public data submission (see http://www.unicode.org/cldr/).
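The Unicode Consortium is a non-profit organization founded to develop, extend, and promote use of the Unicode Standard and related globalization standards.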
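The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe, Apple, Basis Technology, Denic e.G., Google, Government of India, Government of Pakistan, Government of Tamil Nadu, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun Microsystems, Sybase, The University of California at Berkeley, Yahoo!, plus over a hundred Associate, Liaison, and Individual members.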