Uniscribe
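Uniscribe is a set of APIs that allow fine-grained control over the processing of complex scripts. Because the characters (glyphs) of a complex script are not laid out in a simple way, complex scripts require special handling for display and editing. The rules for shaping and positioning glyphs are specified in The Unicode Standard: Worldwide Character Encoding, Version 2.0, Addison-Wesley Publishing Company.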
This overview discusses the aspects of complex script processing listed below.
- About Uniscribe
- Using Uniscribe
- Uniscribe Reference
About Uniscribe
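Uniscribe is one of several ways to process complex scripts. To put it in context, we begin with a brief description of complex scripts and the special problems they pose, and then discuss other standard ways of processing complex scripts.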
- About Complex Scripts
- Processing Complex Scripts
- OpenType Font Format
About Complex Scripts
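A complex script has at least one of the following characteristics:
- Allows bidirectional rendering
- Has contextual shaping
- Has combining characters
- Has specialized word-breaking and justification rules
- Filters out illegal character combinations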
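Bidirectional rendering refers to a script's ability to handle text that reads both left to right and right to left. For example, in bidirectional rendering of Arabic, the default reading direction of the text is right to left, but some numbers are read left to right. Processing a complex script must resolve the difference between the logical order and the visual order of glyphs. In addition, caret movement and hit testing must be handled correctly by mapping between screen positions and character indices, so text selection and character display require knowledge of the layout algorithm.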
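Contextual shaping occurs when the characters of a script change shape depending on the characters around them. This happens in English cursive writing, for example, where a lowercase "l" changes shape depending on whether it follows an "a" (connecting low to the "l") or an "o" (connecting high). Arabic is a script that exhibits contextual shaping.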
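Combining characters, or ligatures, are characters that join into one character when placed together. One example is the English ligature "ae", which is sometimes represented by a single character. Arabic is a script with many combining characters.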
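Specialized word breaking and justification refers to scripts that have complex rules for dividing words between lines and for justifying text on a line.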
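Filtering out illegal character combinations is required when a language does not allow certain combinations of characters. Thai is such a script.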
Built on Tuesday, May 09, 2000
Uniscribe
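Uniscribe provides a high degree of control over complex script processing. It supports the complex rules found in scripts such as Arabic, Indic, and Thai. It also handles scripts written from right to left, such as Arabic and Hebrew, and supports mixed-script text.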
OpenType Font Format
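The Unicode-based Microsoft® OpenType® font format is an extension of the TrueType font file format. OpenType fonts allow mapping between characters and glyphs, with support for ligatures, positional forms, alternates, and other substitutions. OpenType fonts can also include information that supports two-dimensional glyph positioning and glyph attachment, and can contain either TrueType or PostScript outlines.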
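Layout features inside an OpenType font are organized by script and language. This allows a single font to support multiple writing systems, even within the same text, keeps text layout operations consistent across applications, and avoids unnecessary system overhead. Most of the script and language semantic algorithms are contained in Uniscribe, which relieves developers of the burden of defining generic script rules in a font.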
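Applications can apply their own knowledge or parameter choices to text layout. OpenType layout fonts can even contain layout rules that duplicate or replace those applied by operating system services. The layered structure of the operating system services that support text layout lets a client choose which layout information to use and how to apply it.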
Using Uniscribe
The following sections show code examples for working with Uniscribe.
- Determining If a Script Requires Glyph Shaping
- Translating Mouse Hit 'x' Offset to Caret Position
- Displaying the Caret in Bidirectional Strings
Determining If a Script Requires Glyph Shaping
The following code sample calls ScriptGetProperties to check whether a script requires glyph shaping.
const SCRIPT_PROPERTIES **g_ppScriptProperties;  // array of pointers, indexed by script
int g_iMaxScript;                                  // number of entries in that array
ScriptGetProperties(&g_ppScriptProperties, &g_iMaxScript);
hResult = ScriptItemize( … , pItems, &cItems);
for (i = 0; i < cItems; i++) {
    // eScript (from the item's SCRIPT_ANALYSIS) indexes the properties table
    if (g_ppScriptProperties[pItems[i].a.eScript]->fComplex) {
        // Item [i] is complex script text
        // requiring glyph shaping
    }
}
ScriptGetProperties
The ScriptGetProperties function retrieves information about the current scripts.
HRESULT WINAPI ScriptGetProperties(
const SCRIPT_PROPERTIES ***ppSp,
int *piNumScripts
);
Parameters
ppSp
[out] Receives a pointer to an array of pointers to SCRIPT_PROPERTIES structures, indexed by script.
piNumScripts
[out] Receives the number of scripts (NumScripts). Valid script indices into the ppSp array range from 0 through NumScripts-1.
Return Values
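If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. If any unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.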
Translating Mouse Hit 'x' Offset to Caret Position
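By convention, a hit on the leading half of a character places the caret (character position, cp) before that character, and a hit on the trailing half places it after that character. This can be implemented as follows: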
int iCharPos;
int iCaretPos;
int fTrailing;
ScriptXtoCP(iMouseX, ..., &iCharPos, &fTrailing);
iCaretPos = iCharPos + fTrailing;
For scripts that snap the caret to cluster boundaries, ScriptXtoCP returns fTrailing set to either 0 or the width of the cluster in code points.
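The following is a minimal, portable C sketch of the hit-test convention above for a single left-to-right run. It does not call Uniscribe. `model_xtocp` is a hypothetical stand-in for ScriptXtoCP. It assumes each cluster is described by its length in code points and its width in pixels, and it reports fTrailing as 0 or as the cluster length, as described for cluster-snapping scripts.

```c
/* Hypothetical model of ScriptXtoCP for one left-to-right run. Cluster i
 * covers clusterLen[i] code points and is clusterWidth[i] pixels wide. */
void model_xtocp(const int *clusterLen, const int *clusterWidth,
                 int nClusters, int x, int *piCharPos, int *piTrailing)
{
    int cp = 0;    /* first code point of the current cluster */
    int left = 0;  /* left edge of the current cluster, in pixels */
    *piCharPos = 0;
    *piTrailing = 0;
    for (int i = 0; i < nClusters; i++) {
        int right = left + clusterWidth[i];
        /* Hits past the end of the run fall into the last cluster. */
        if (x < right || i == nClusters - 1) {
            *piCharPos = cp;
            /* Leading half: 0. Trailing half: the cluster length in code points. */
            *piTrailing = (2 * (x - left) < clusterWidth[i]) ? 0 : clusterLen[i];
            return;
        }
        cp += clusterLen[i];
        left = right;
    }
}

/* Caret position as in the text: iCharPos + fTrailing. */
int model_caret_from_x(const int *clusterLen, const int *clusterWidth,
                       int nClusters, int x)
{
    int iCharPos, fTrailing;
    model_xtocp(clusterLen, clusterWidth, nClusters, x, &iCharPos, &fTrailing);
    return iCharPos + fTrailing;
}
```

Because fTrailing is the whole cluster length, a hit anywhere in a multi-code-point cluster can only produce a caret at the start or end of that cluster, never inside it.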
Displaying the Caret in Bidirectional Strings
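In unidirectional text, the caret position is unambiguous, because the leading edge of a character is in the same place as the trailing edge of the previous character. In bidirectional text, however, the caret position between runs of opposite direction is ambiguous. For example, in the left-to-right paragraph "helloMAALAS", the last character of "hello" immediately precedes the first character of "salaam". The best place to display the caret in this string depends on whether it is considered to follow the "o" of "hello" or to precede the "s" of "salaam".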
Uniscribe uses the following caret conventions:
Situation | Visual caret placement
Typing | Trailing edge of the last character typed
Pasting | Trailing edge of the last character pasted
Caret advancing | Trailing edge of the last character passed over
Caret retiring | Leading edge of the last character passed over
Home | Leading edge of the line
End | Trailing edge of the line
The caret can be positioned as follows:
if (advancing) {
ScriptCPtoX(iCharPos-1, TRUE, ..., &iCaretX);
} else {
ScriptCPtoX(iCharPos, FALSE, ..., &iCaretX);
}
Or, more simply, given an fAdvancing BOOL restricted to TRUE or FALSE:
ScriptCPtoX(iCharPos-fAdvancing, fAdvancing, ..., &iCaretX);
ScriptCPtoX handles out-of-range positions logically: it returns the leading edge of the run for iCharPos < 0, and the trailing edge of the run for iCharPos = length.
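A portable sketch of the positioning rule and the out-of-range handling, assuming a single run in which every code point has its own advance width (no multi-character clusters). `model_cptox` is a hypothetical stand-in for ScriptCPtoX, not the Uniscribe implementation. The `fRTL` flag mirrors positions for a right-to-left run.

```c
/* Hypothetical model of ScriptCPtoX for one run in which code point i has
 * advance widths[i]. Returns an x offset from the run's left end. */
int model_cptox(int iCP, int fTrailing, const int *widths, int n, int fRTL)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += widths[i];

    int edge;  /* logical distance from the run's leading edge */
    if (iCP < 0) {
        edge = 0;          /* leading edge of the run */
    } else if (iCP >= n) {
        edge = total;      /* trailing edge of the run */
    } else {
        edge = 0;
        for (int i = 0; i < iCP; i++)
            edge += widths[i];
        if (fTrailing)
            edge += widths[iCP];
    }
    /* A right-to-left run starts at its right end and grows leftward. */
    return fRTL ? total - edge : edge;
}

/* The one-line form from the text; fAdvancing must be exactly 0 or 1. */
int model_caret_x(int iCharPos, int fAdvancing, const int *widths, int n, int fRTL)
{
    return model_cptox(iCharPos - fAdvancing, fAdvancing, widths, n, fRTL);
}
```

Within one run both branches land on the same x coordinate. The ambiguity only appears at a boundary between runs of opposite direction.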
Processing Complex Scripts with Uniscribe
Uniscribe provides APIs to support the display and editing of international text, including the complex rules of Middle Eastern and Asian scripts. Uniscribe provides low level routines for handling fully formatted text, and an easier ScriptString API set for unformatted text.
Using Uniscribe, applications need only manage a backing store of Unicode character codes. Text layout applications do not need to maintain any other buffer or mapping table to track character order. An application only needs to store and manage the order in which the characters were entered by the user, which is the same logical order as defined by Unicode. The application's backing store never changes as a result of layout operations. Uniscribe maintains an index from the reordered clusters to the original character boundaries passed by the application. The following topics are covered in this section.
- Shaping Engines
- Caching
- Displaying Text with Uniscribe
- The ScriptString Functions
- Related Processing for Complex Scripts
- Caret Placement and Hit Testing
- Word Break Points
- Character Clusters
- Relationship Between Caret Positions, Justification Points, and Clusters
- Notes on ScriptXtoCP and ScriptCPtoX
Shaping Engines
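Uniscribe uses multiple shaping engines that contain the layout knowledge for particular scripts. It also uses the Microsoft® OpenType® layout shaping engine to handle font-specific script features such as glyph generation, extent measurement, and word-breaking support. Uniscribe uses the Unicode bidirectional algorithm to manage bidirectional character reordering, and it understands non-OpenType layout font formats for Arabic, Hebrew, and Thai.
The exact code points assigned to each shaping engine can vary, so script numbers are not published, except for SCRIPT_UNDEFINED. However, you can test the properties of a script by calling the ScriptGetProperties function, which gives access to the table of all script properties. Applications can use the global script properties to help compose their own layout rules, partitioned by shaping engine.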
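All complex script shaping engines, the digit shaping engine, and the ASCII shaping engine validate the font in the hdc before shaping. They return USP_E_SCRIPT_NOT_IN_FONT if the font does not contain enough glyphs or shaping tables.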
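Only scripts with the fComplex property are shaped using the script returned by the ScriptItemize function. All other runs may be merged and shaped with SCRIPT_UNDEFINED specified in the SCRIPT_ANALYSIS structure. Note that SCRIPT_UNDEFINED does not fail with USP_E_SCRIPT_NOT_IN_FONT if the font does not support the characters.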
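Missing glyphs are usually displayed as an empty rectangle. An application can determine whether a font supports a code point by calling the ScriptGetFontProperties function to get the default glyph index, and then calling the ScriptGetCMap function to look up the font glyphs for the Unicode code points. However, some code points can be displayed by a combination of glyphs, for example 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if a font supports the capital E glyph and the acute glyph but not a single glyph for 00C9, ScriptGetCMap will mark 00C9 as unsupported.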
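To reliably determine font support for a string that contains these kinds of code points, call ScriptShape. If it returns S_OK, check the output for missing glyphs.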
Caching
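Uniscribe caches the Unicode-to-glyph mapping (CMAP), glyph widths, and OpenType script shaping tables. A handle to the tables for a particular font at a particular size is called a script cache. Many Uniscribe functions take both an HDC and a SCRIPT_CACHE parameter. These functions look up information in the script cache first, and use the device context only when the required tables are not already cached. When you call the ScriptShape, ScriptPlace, or ScriptTextOut functions, you must supply a pointer to a SCRIPT_CACHE variable, which must be initialized to NULL before its first use.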
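An application can free a script cache at any time. Uniscribe maintains reference counts in its font and shaper caches, and frees font data only when all sizes of the font have been freed. When you are finished with a format, that is, a particular combination of attributes such as font, size, and color, call the ScriptFreeCache function to free the script cache used for it.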
For ScriptShape and ScriptPlace, it is valid to pass a NULL device context. Most often the call will be successful as required tables will already be cached. If the shaping or placement requires access to a device context, ScriptShape or ScriptPlace will return immediately with the E_PENDING error code. Then the application must select the font into the device context. This eliminates most calls to the SelectObject function.
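The E_PENDING protocol can be modeled without Windows headers. In this sketch, `ModelCache`, `ModelDC`, `model_shape`, and the `MODEL_*` codes are hypothetical placeholders for SCRIPT_CACHE, an HDC, ScriptShape, and the real HRESULT values. The point is the control flow: the font is selected into the device context only when the cache cannot satisfy the call.

```c
#include <stddef.h>

/* Placeholder status codes; the real values come from the Windows headers. */
typedef long MODEL_HRESULT;
#define MODEL_S_OK       0L
#define MODEL_E_PENDING  1L  /* not the real E_PENDING value */

typedef struct { int tablesCached; } ModelCache;              /* hypothetical cache */
typedef struct { int fontSelected; int selectCalls; } ModelDC; /* hypothetical DC */

/* Simulated shaping call: works without a DC once the tables are cached. */
static MODEL_HRESULT model_shape(ModelDC *hdc, ModelCache *sc)
{
    if (sc->tablesCached)
        return MODEL_S_OK;
    if (hdc == NULL)
        return MODEL_E_PENDING;   /* needs the DC but none was supplied */
    sc->tablesCached = 1;         /* tables read through the DC, then cached */
    return MODEL_S_OK;
}

/* Try with a NULL DC first; select the font only if E_PENDING comes back. */
MODEL_HRESULT shape_with_lazy_select(ModelDC *dc, ModelCache *sc)
{
    MODEL_HRESULT hr = model_shape(NULL, sc);
    if (hr == MODEL_E_PENDING) {
        dc->fontSelected = 1;     /* stands in for SelectObject(hdc, hfont) */
        dc->selectCalls++;
        hr = model_shape(dc, sc);
    }
    return hr;
}
```

After the first call fills the cache, later calls succeed without touching the device context. This is how the protocol avoids most SelectObject calls.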
© 2002 Microsoft Corporation. All rights reserved.
Displaying Text with Uniscribe
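An application that works with complex scripts faces several formatting and display problems.
First, the width of complex script text depends on its context, so widths cannot be stored in a simple table.
Second, breaking between words in scripts such as Thai requires dictionary support, because Thai has no separator characters between words.
Third, Arabic, Hebrew, Persian, Urdu, and other bidirectional scripts need reordering before display.
Finally, some form of font association is often needed to make complex scripts easy to use.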
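To handle these issues adequately, Uniscribe uses the paragraph as the unit of display. This means that Uniscribe must be used for the whole paragraph, even if parts of the paragraph are not complex script text.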
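Before using Uniscribe, an application divides the paragraph into runs, that is, strings of characters with the same style. What the style contains depends on the application's implementation, but it typically includes attributes such as font, size, and color. Uniscribe divides the paragraph into items, that is, strings of characters with the same script and direction. The application merges the item information with its own run information to produce runs that each have a single script and direction.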
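Uniscribe identifies the clusters in each run and determines the size of each cluster. A cluster is a script-defined, indivisible group of characters. For European languages, a cluster is a single character, but in languages such as Thai it is a group of glyphs. Uniscribe sums the cluster sizes to determine the size of a run. The application then sums the run lengths until they overflow a line (reach the margin), and divides the overflowing run between the current line and the next line.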
For each line, a map from visual position to run is built. For each run, the code points are shaped into glyphs, which are then positioned and drawn.
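With this overview, we can look at the detailed process and at how Uniscribe fits into it. An application either lays out (formats) the text once and saves the glyphs and positions for display, or it regenerates them each time it displays the text. Typically, an application regenerates the glyphs and positions each time it displays the text, so the process is presented as a layout procedure and a display procedure.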
To Lay Out Text Using Uniscribe
This procedure assumes that the application has already divided the paragraph into runs.
1. Call ScriptRecordDigitSubstitution only when the application starts, or when receiving a WM_SETTINGCHANGE message.
2. (Optional) Call ScriptIsComplex to determine whether the paragraph requires complex processing.
3. For automatic digit substitution, call ScriptApplyDigitSubstitution to prepare the SCRIPT_CONTROL and SCRIPT_STATE structures for ScriptItemize. If the application does its own reordering and layout, it must substitute the proper digits for Unicode U+0030 through U+0039 (the Western digits).
4. Call ScriptItemize to divide the paragraph into items. If an application already knows the bidirectional order, for example from the keyboard layout used to enter the characters, it can call ScriptItemize with NULL for the SCRIPT_CONTROL and SCRIPT_STATE parameters. This generates items only by shaping engine. The application can then reorder the items using its own information.
5. Merge the item information with the run information to produce runs with a single style, script, and direction.
6. Call ScriptGetCMap to assign a font to a run and get glyphs. If some glyphs are not supported by the font, either substitute another font or set the eScript member to SCRIPT_UNDEFINED. Note that if a font renders a code point by a combination of glyphs instead of a single glyph, this method may indicate that the code point is unsupported. In this case, call ScriptShape, check for an S_OK return code, and then check the output for missing glyphs.
7. Call ScriptShape to identify clusters and generate glyphs.
8. Call ScriptPlace to generate advance widths and x and y offsets for the glyphs, along with the overall width of the run.
9. Sum the run widths until the line overflows.
10. Break the run on a word boundary by using the fSoftBreak and fWhiteSpace members in the logical attributes. To break a single character cluster off the run, use the information returned by calling ScriptBreak.
This completes layout of the line. Repeat steps 6 through 10 for each line in the paragraph. However, if the application needed to break the last run on the line, call ScriptShape to reshape the remaining part of the run as the first run on the next line.
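Steps 9 and 10 can be sketched as follows, under simplifying assumptions: one character per cluster, and a hypothetical `ModelLogAttr` that holds just the fSoftBreak and fWhiteSpace flags provided by SCRIPT_LOGATTR. Trailing white space is allowed to hang past the margin. That is a common convention, but it is an assumption here.

```c
/* Hypothetical subset of SCRIPT_LOGATTR. */
typedef struct {
    unsigned fSoftBreak : 1;   /* a line may break before this character */
    unsigned fWhiteSpace : 1;  /* this character is white space */
} ModelLogAttr;

/* Returns the index of the first character of the next line, or n if the
 * whole text fits in maxWidth. Assumes one character per cluster. */
int model_find_break(const int *widths, const ModelLogAttr *attr,
                     int n, int maxWidth)
{
    int x = 0, k;
    for (k = 0; k < n; k++) {
        /* White space may hang past the margin; other characters may not. */
        if (!attr[k].fWhiteSpace && x + widths[k] > maxWidth)
            break;
        x += widths[k];
    }
    if (k == n)
        return n;                       /* everything fits */
    for (int j = k; j > 0; j--)         /* last soft break at or before overflow */
        if (attr[j].fSoftBreak)
            return j;
    return k > 0 ? k : 1;               /* no word boundary: break at the overflow,
                                           keeping at least one character */
}
```

With a real layout, the fallback branch would instead break off whole clusters, using the cluster information that ScriptBreak returns.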
To Display Text Using Uniscribe
This procedure is done for each line. It assumes that the text has already been laid out using Uniscribe, and that the glyphs and positions from the layout process were not saved. If speed is a concern, an application can save the glyphs and positions from the layout procedure and start at step 2.
1. For each run, in logical order:
   a. If the style has changed since the last run, update the hdc.
   b. Call ScriptShape to generate glyphs for the run.
   c. Call ScriptPlace to generate an advance width and an x,y offset for each glyph.
2. Establish the correct visual order for the runs in this line:
   a. Extract an array of bidi embedding levels, one per run, from the merged item and run information. The embedding level is given by si.a.s.uBidiLevel, where si is the run's SCRIPT_ITEM, a is its SCRIPT_ANALYSIS member, and s is the SCRIPT_STATE member of that analysis.
   b. Pass this array to ScriptLayout to generate a map of visual to logical positions.
3. (Optional) To justify the text, either call ScriptJustify or use specialized knowledge of the text. For more information, see Related Processing for Complex Scripts.
4. Use the visual-to-logical map to display the runs in visual order. Starting at the left end of the line, call ScriptTextOut to display the run given by the first entry in the visual-to-logical map. For each subsequent entry in the map, call ScriptTextOut to display the indicated run to the right of the previously displayed run.
Note that step 2 may be omitted if the text contains no characters from right-to-left scripts, contains no bidi control characters, and the base embedding level is left-to-right. In this case, step 4 becomes: start at the left end of the line and call ScriptTextOut to display the first logical run and then to display each logical run to the right of the previous run.
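ScriptLayout computes the visual-to-logical map from the per-run embedding levels. As an illustration only (this is not Uniscribe's code), the following sketch applies the reordering rule (L2) of the Unicode Bidirectional Algorithm. Working from the highest level down to the lowest odd level, it reverses every maximal sequence of runs at that level or higher.

```c
/* Illustrative visual-to-logical reordering of runs (UBA rule L2).
 * levels[i] is the embedding level of logical run i; on return,
 * piVisualToLogical[v] is the logical run shown at visual position v. */
void model_visual_to_logical(const unsigned char *levels, int nRuns,
                             int *piVisualToLogical)
{
    int maxLevel = 0, minLevel = 255;
    for (int i = 0; i < nRuns; i++) {
        piVisualToLogical[i] = i;
        if (levels[i] > maxLevel) maxLevel = levels[i];
        if (levels[i] < minLevel) minLevel = levels[i];
    }
    int lowestOdd = minLevel | 1;  /* lowest odd level at or above minLevel */

    for (int lvl = maxLevel; lvl >= lowestOdd; lvl--) {
        int i = 0;
        while (i < nRuns) {
            if (levels[piVisualToLogical[i]] < lvl) {
                i++;
                continue;
            }
            /* Find the end of this sequence at level >= lvl ... */
            int j = i;
            while (j < nRuns && levels[piVisualToLogical[j]] >= lvl)
                j++;
            /* ... and reverse it in place. */
            for (int a = i, b = j - 1; a < b; a++, b--) {
                int t = piVisualToLogical[a];
                piVisualToLogical[a] = piVisualToLogical[b];
                piVisualToLogical[b] = t;
            }
            i = j;
        }
    }
}
```

For example, levels {0, 1, 2, 2, 1, 0} give the map {0, 4, 2, 3, 1, 5}. The two left-to-right runs embedded at level 2 keep their order inside the reversed right-to-left span.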
ScriptGetCMap
The ScriptGetCMap function takes a string and returns the glyph indices of the Unicode characters according to the TrueType cmap table, or to the standard cmap table implemented for old-style fonts.
HRESULT WINAPI ScriptGetCMap(
HDC hdc,
SCRIPT_CACHE *psc,
const WCHAR *pwcInChars,
int cChars,
DWORD dwFlags,
WORD *pwOutGlyphs
);
Parameters
hdc
[in] Handle to the device context. This parameter is optional.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
pwcInChars
[in] Pointer to a string of Unicode characters.
cChars
[in] Number of Unicode characters in pwcInChars.
dwFlags
[in] This parameter can be the following value.
Value | Meaning
SGCM_RTL | Indicates that the glyph array pwOutGlyphs should contain mirrored glyphs for those glyphs that have a mirrored equivalent.
pwOutGlyphs
[out] Pointer to an array that receives the glyph indices.
Return Values
If all Unicode code points are present in the font, the return value is S_OK.
If the function fails, it may return one of the following nonzero values.
Return value | Meaning
E_HANDLE | The font or the system does not support glyph indices.
S_FALSE | Some of the Unicode code points were mapped to the default glyph.
If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
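ScriptGetCMap can be used to determine which characters in a run are supported by the selected font. The caller can scan the returned glyph buffer for the default glyph to determine which characters are not available. The default glyph index for the selected font should be determined by calling ScriptGetFontProperties.
The return value indicates the presence of any missing glyphs.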
Note that some code points can be rendered by a combination of glyphs as well as by a single glyph -- for example, 00C9; LATIN CAPITAL LETTER E WITH ACUTE. In this case, if the font supports the capital E glyph and the acute glyph but not a single glyph for 00C9,
ScriptGetCMap will show 00C9 is unsupported. To determine the font support for a string that contains these kinds of code points, call
ScriptShape. If it returns S_OK, check the output for missing glyphs.
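A sketch of the scan described in these remarks, written in portable C. It assumes that `pwOutGlyphs` holds one glyph index per input character, as ScriptGetCMap produces, and that `wgDefault` is the default glyph index taken from the wgDefault member of SCRIPT_FONTPROPERTIES. `MODEL_WORD` stands in for the Win32 WORD type. As noted above, a character reported missing here may still render through a glyph combination, so ScriptShape remains the reliable check.

```c
#include <stddef.h>

typedef unsigned short MODEL_WORD;  /* stands in for the Win32 WORD type */

/* Counts the characters whose glyph is the font's default glyph and, if
 * piMissing is not NULL, records their indices there (in input order). */
int find_missing_glyphs(const MODEL_WORD *pwOutGlyphs, int cChars,
                        MODEL_WORD wgDefault, int *piMissing)
{
    int count = 0;
    for (int i = 0; i < cChars; i++) {
        if (pwOutGlyphs[i] == wgDefault) {
            if (piMissing != NULL)
                piMissing[count] = i;
            count++;
        }
    }
    return count;
}
```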
ScriptGetGlyphABCWidth
The ScriptGetGlyphABCWidth function returns the ABC width of a given glyph.
HRESULT WINAPI ScriptGetGlyphABCWidth(
HDC hdc,
SCRIPT_CACHE *psc,
WORD wGlyph,
ABC *pABC
);
Parameters
hdc
[in] Handle to the device context. Whether it is needed depends on psc.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
wGlyph
[in] Glyph to be analyzed.
pABC
[out] Receives the ABC width of wGlyph.
Return Values
If the ABC width of the glyph is returned, the function returns S_OK.
If the font or system does not support glyph indices, the function returns E_HANDLE. And if any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
The
ScriptGetGlyphABCWidth function may be useful for drawing glyph charts. It should not be used for ordinary complex script text formatting.
ABC
The
ABC structure contains the width of a character in a TrueType font.
typedef struct _ABC {
int abcA;
UINT abcB;
int abcC;
} ABC, *PABC;
Members
abcA
Specifies the A spacing of the character. The A spacing is the distance to add to the current position before drawing the character glyph.
abcB
Specifies the B spacing of the character. The B spacing is the width of the drawn portion of the character glyph.
abcC
Specifies the C spacing of the character. The C spacing is the distance to add to the current position to provide white space to the right of the character glyph.
Remarks
The total width of a character is the summation of the A, B, and C spaces. Either the A or the C space can be negative to indicate underhangs or overhangs.
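A small illustration of the ABC arithmetic. It uses a local copy of the structure so that it compiles without <windows.h>. The helper names are hypothetical.

```c
/* Local mirror of the Win32 ABC structure, so the sketch builds anywhere. */
typedef struct {
    int abcA;           /* space added before drawing the glyph */
    unsigned int abcB;  /* width of the drawn portion of the glyph */
    int abcC;           /* space added after the glyph */
} ModelABC;

/* Total advance width: the sum of the A, B, and C spaces. */
int model_abc_total(const ModelABC *abc)
{
    return abc->abcA + (int)abc->abcB + abc->abcC;
}

/* How far the ink extends left of the starting position (negative A). */
int model_abc_ink_before_origin(const ModelABC *abc)
{
    return abc->abcA < 0 ? -abc->abcA : 0;
}

/* How far the ink extends past the advance width (negative C). */
int model_abc_ink_past_advance(const ModelABC *abc)
{
    return abc->abcC < 0 ? -abc->abcC : 0;
}
```

A slanted glyph with A = -1, B = 8, C = -2 advances the pen by only 5 units even though its ink is 8 units wide.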
SCRIPT_CACHE
SCRIPT_CACHE is an opaque pointer to a Uniscribe font metric cache structure.
typedef void *SCRIPT_CACHE;
Remarks
The client must allocate and retain one SCRIPT_CACHE variable for each font in use, and must initialize it to NULL.
Many script functions take a combination of HDC and
SCRIPT_CACHE. Uniscribe will first attempt to access font data by using the
SCRIPT_CACHE and will only inspect the HDC if the required data is not already cached.
The HDC may be passed as NULL. If data required by Uniscribe is already cached, the HDC won't be accessed, and the operation continues normally.
If the HDC is passed as NULL, and Uniscribe needs to access it for any reason, Uniscribe will return E_PENDING.
E_PENDING is returned quickly, allowing the client to avoid time-consuming
SelectObject calls. The following example applies to all functions that take a SCRIPT_CACHE and an optional HDC.
hr = ScriptShape(NULL, &sc, ...);
if (hr == E_PENDING) {
    // ... select font into hdc ...
    hr = ScriptShape(hdc, &sc, ...);
}
ScriptStringAnalyse
The ScriptStringAnalyse function analyzes a plain text string.
HRESULT WINAPI ScriptStringAnalyse(
HDC hdc,
const void *pString,
int cString,
int cGlyphs,
int iCharset,
DWORD dwFlags,
int iReqWidth,
SCRIPT_CONTROL *psControl,
SCRIPT_STATE *psState,
const int *piDx,
SCRIPT_TABDEF *pTabdef,
const BYTE *pbInClass,
SCRIPT_STRING_ANALYSIS *pssa
);
Parameters
hdc
[in] Handle to the device context. If dwFlags includes SSA_GLYPHS, hdc is required. If dwFlags is SSA_BREAK, hdc is optional.
If the hdc is present, the current font in the hdc is inspected. If the current font is a symbolic font, the character string is treated as a single neutral SCRIPT_UNDEFINED item.
pString
[in] Pointer to the string to be analyzed. It must contain at least one character.
cString
[in] Length of the string. It must be at least 1.
cGlyphs
[in] Size of the glyph buffer. This parameter is required. The default size is cString * 3/2 + 1.
iCharset
[in] Character set descriptor. If the input string is an ANSI string, this is its character set. If it is a Unicode string, this value is -1.
dwFlags
[in] Flags indicating the analysis that is required. This parameter can be a combination of the following values.
Value | Meaning
SSA_BREAK | Returns break flags, that is, character and word stops.
SSA_CLIP | Clips the string at iReqWidth.
SSA_DZWG | Provides representation glyphs for control characters.
SSA_FALLBACK | Uses fallback fonts.
SSA_FIT | Justifies the string to iReqWidth.
SSA_GLYPHS | Generates glyphs, positions, and attributes.
SSA_GCP | Returns missing glyphs and pwLogClust with GetCharacterPlacement conventions.
SSA_HIDEHOTKEY | Removes the first '&' from the displayed string.
SSA_HOTKEY | Replaces '&' with an underline on the subsequent code point.
SSA_HOTKEYONLY | Displays the underline only.
SSA_LINK | Applies East Asian font linking and association to noncomplex text.
SSA_METAFILE | Writes items with ExtTextOutW calls, not with glyphs.
SSA_PASSWORD | The input string contains a single character to be duplicated iLength times.
SSA_RTL | Uses base embedding level 1.
SSA_TAB | Expands tabs.
iReqWidth
[in] Required width for fitting or clipping.
psControl
[in] Pointer to a SCRIPT_CONTROL structure.
psState
[in] Pointer to a SCRIPT_STATE structure. The uBidiLevel member of SCRIPT_STATE is ignored. The value used is derived from the SSA_RTL flag in combination with the layout of the hdc.
piDx
[in] Pointer to the requested logical dx array.
pTabdef
[in] Pointer to a SCRIPT_TABDEF structure.
pbInClass
[in] Pointer to a BYTE that indicates GetCharacterPlacement character classifications.
pssa
[out] Pointer to a SCRIPT_STRING_ANALYSIS structure.
Return Values
If the function succeeds, it returns S_OK.
If a parameter is invalid, it returns E_INVALIDARG.
If SSA_FALLBACK was not specified, or if a standard fallback font is missing, it returns USP_E_SCRIPT_NOT_IN_FONT. It also returns any Win32 error (converted to an HRESULT by the HRESULT_FROM_WIN32 macro), such as errors caused by lack of memory or by GDI calls using the hdc.
If any other unrecoverable error is encountered, it is also returned as an HRESULT.
Remarks
The ScriptStringAnalyse function is the first step in handling plain text strings. Plain text is a string that has only one font, one style, one size, one color, and so forth. ScriptStringAnalyse allocates temporary buffers for item analyses, glyphs, advance widths, and so forth. It then automatically runs ScriptItemize, ScriptShape, ScriptPlace, and ScriptBreak. The results are then available through all the other ScriptString functions.
Although the functionality of ScriptStringAnalyse can be implemented by direct calls to other functions, ScriptStringAnalyse drastically reduces the amount of code required in the application for plain text handling.
Note that the uBidiLevel member in the initial SCRIPT_STATE value is ignored. The uBidiLevel that is used is derived from the SSA_RTL flag in combination with the layout of the hdc.
Windows NT/2000: Requires Windows 2000.
Header: Declared in Usp10.h.
Library: Use Usp10.lib.
See Also
Uniscribe Overview, Uniscribe Functions,
ExtTextOut,
GetCharacterPlacement,
ScriptBreak,
ScriptItemize,
ScriptPlace,
ScriptShape,
SCRIPT_CONTROL,
SCRIPT_STATE,
SCRIPT_STRING_ANALYSIS,
SCRIPT_TABDEF
ScriptStringOut
The
ScriptStringOut function displays a string generated by a prior call to
ScriptStringAnalyse and optionally adds highlighting.
HRESULT WINAPI ScriptStringOut(
SCRIPT_STRING_ANALYSIS ssa,
int iX,
int iY,
UINT uOptions,
const RECT *prc,
int iMinSel,
int iMaxSel,
BOOL fDisabled
);
Parameters
ssa
[in] A
SCRIPT_STRING_ANALYSIS structure for the string.
iX
[in] Specifies the x-coordinate of the reference point used to position the string.
iY
[in] Specifies the y-coordinate of the reference point used to position the string.
uOptions
[in] Specifies how to use the application-defined rectangle. This parameter can be a combination of the following values.
Value | Meaning
ETO_CLIPPED | The text is clipped to the rectangle.
ETO_OPAQUE | The current background color is used to fill the rectangle.
prc
[in] Pointer to a RECT structure that defines a clipping rectangle. The uOptions parameter must be set to ETO_CLIPPED.
iMinSel
[in] Starting position of the selection in the string. For no selection, set iMinSel >= iMaxSel.
iMaxSel
[in] Ending position of the selection in the string.
fDisabled
[in] If this parameter is TRUE, the system applies disabled-text highlighting by setting the background color to COLOR_HIGHLIGHT behind all selected characters.
If this parameter is FALSE, the system applies enabled-text highlighting by setting the background color to COLOR_HIGHLIGHT and the text color to COLOR_HIGHLIGHTTEXT for each selected character.
Return Values
If the function is successful, it returns S_OK.
If the function fails, it returns another HRESULT value. And if any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
The return value can be tested with the SUCCEEDED and FAILED macros.
Remarks
This function requires that SSA_GLYPHS was requested in the original
ScriptStringAnalyse call.
EnumFontFamiliesEx
The
EnumFontFamiliesEx function enumerates all fonts in the system that match the font characteristics specified by the
LOGFONT structure.
EnumFontFamiliesEx enumerates fonts based on typeface name, character set, or both.
int EnumFontFamiliesEx(
HDC hdc,                           // handle to DC
LPLOGFONT lpLogfont,               // font information
FONTENUMPROC lpEnumFontFamExProc,  // callback function
LPARAM lParam,                     // additional data
DWORD dwFlags                      // not used; must be 0
);
Parameters
hdc
[in] Handle to the device context.
lpLogfont
[in] Pointer to a LOGFONT structure that contains information about the fonts to enumerate. The function examines the following members.
Member | Description
lfCharSet | If set to DEFAULT_CHARSET, the function enumerates all fonts in all character sets. If set to a valid character set value, the function enumerates only fonts in the specified character set.
lfFaceName | If set to an empty string, the function enumerates one font in each available typeface name. If set to a valid typeface name, the function enumerates all fonts with the specified name.
lfPitchAndFamily | Must be set to zero for all language versions of the operating system.
lpEnumFontFamExProc
[in] Pointer to the application-defined callback function. For more information, see the EnumFontFamExProc function.
lParam
[in] Specifies an application-defined value. The function passes this value to the callback function along with font information.
dwFlags
This parameter is not used and must be zero.
Return Values
The return value is the last value returned by the callback function. This value depends on which font families are available for the specified device.
Remarks
The EnumFontFamiliesEx function does not use tagged typeface names to identify character sets. Instead, it always passes the correct typeface name and a separate character set value to the callback function. The function enumerates fonts based on the values of the lfCharSet and lfFaceName members in the LOGFONT structure.
As with
EnumFontFamilies,
EnumFontFamiliesEx enumerates all font styles. Not all styles of a font cover the same character sets. For example, Fontorama Bold might contain ANSI, Greek, and Cyrillic characters, but Fontorama Italic might contain only ANSI characters. For this reason, it's best not to assume that a specified font covers a specific character set, even if it is the ANSI character set. The following table shows the results of various combinations of values for
lfCharSet and
lfFaceName.
Values | Meaning
lfCharSet = DEFAULT_CHARSET, lfFaceName = '\0' | Enumerates all fonts in all character sets.
lfCharSet = DEFAULT_CHARSET, lfFaceName = a specific font | Enumerates all character sets and styles in a specific font.
lfCharSet = a specific character set, lfFaceName = '\0' | Enumerates all styles of all fonts in the specific character set.
lfCharSet = a specific character set, lfFaceName = a specific font | Enumerates all styles of a font in a specific character set.
The following code sample shows how these values are used:
// to enumerate all styles and charsets of all fonts:
lf.lfFaceName[0] = '\0';
lf.lfCharSet = DEFAULT_CHARSET;
// to enumerate all styles and character sets of the Arial font:
lstrcpy(lf.lfFaceName, TEXT("Arial"));
lf.lfCharSet = DEFAULT_CHARSET;
// to enumerate all styles of all fonts for the ANSI character set:
lf.lfFaceName[0] = '\0';
lf.lfCharSet = ANSI_CHARSET;
// to enumerate all styles of the Arial font that cover the ANSI charset:
lstrcpy(lf.lfFaceName, TEXT("Arial"));
lf.lfCharSet = ANSI_CHARSET;
The callback functions for
EnumFontFamilies and
EnumFontFamiliesEx are very similar. The main difference is that the
ENUMLOGFONTEX structure includes a script field.
Note, based on the values of
lfCharSet and
lfFaceName,
EnumFontFamiliesEx will enumerate the same font as many times as there are distinct character sets in the font. This can create an extensive list of fonts which can be burdensome to a user. For example, the Century Schoolbook font can appear for the Baltic, Western, Greek, Turkish, and Cyrillic character sets. To avoid this, an application should filter the list of fonts.
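One way to filter the list, sketched in portable C. It assumes the application has already copied each face name into an array of C strings, for example from the elfLogFont.lfFaceName member of the ENUMLOGFONTEX passed to the callback. The function keeps the first entry for each typeface and drops later duplicates. That discards the per-charset entries, which suits a simple font picker but not a charset-aware one.

```c
#include <string.h>

/* Compacts names[0..n) in place, keeping the first occurrence of each face
 * name in its original order; returns the number of names kept. */
int dedupe_face_names(const char **names, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        int seen = 0;
        for (int j = 0; j < out; j++) {
            if (strcmp(names[j], names[i]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            names[out++] = names[i];
    }
    return out;
}
```

The quadratic scan is fine for the few hundred fonts a typical system reports. A larger list would call for sorting or hashing instead.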
The fonts for many East Asian languages have two typeface names: an English name and a localized name.
EnumFonts,
EnumFontFamilies, and
EnumFontFamiliesEx return the English typeface name if the system locale does not match the language of the font.
ScriptItemize
The
ScriptItemize function breaks a Unicode string into individually shapeable items.
HRESULT WINAPI ScriptItemize(
const WCHAR *pwcInChars,
int cInChars,
int cMaxItems,
const SCRIPT_CONTROL *psControl,
const SCRIPT_STATE *psState,
SCRIPT_ITEM *pItems,
int *pcItems
);
Parameters
pwcInChars
[in] Pointer to a Unicode string to be itemized.
cInChars
[in] Number of characters in pwcInChars to be itemized.
cMaxItems
[in] Maximum number of SCRIPT_ITEM structures to process.
psControl
[in] Pointer to a SCRIPT_CONTROL structure containing flags that indicate the type of itemization to perform. Use NULL if this is not needed.
psState
[in] Pointer to a SCRIPT_STATE structure indicating the initial bidirectional algorithm state. Use NULL if this is not needed.
pItems
[out] Pointer to a buffer to receive the SCRIPT_ITEM structures processed. The buffer pointed to by pItems should be cMaxItems * sizeof(SCRIPT_ITEM) bytes in length.
pcItems
[out] Pointer to a variable to receive the number of SCRIPT_ITEM structures processed.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. The function returns E_INVALIDARG if pwcInChars is NULL, cInChars is 0, pItems is NULL, or cMaxItems < 2.
The function returns E_OUTOFMEMORY if the output buffer length (cMaxItems) is insufficient. Note that in this case, as in all error cases, no items have been fully processed, so no part of the output array contains defined values.
If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
Items are delimited by either a change of shaping engine or a change of direction.
The client may create multiple runs from each SCRIPT_ITEM returned by ScriptItemize, but should not combine multiple items into a single run. The reason is that the client will later call ScriptShape for each run (when measuring or rendering), and must pass the SCRIPT_ANALYSIS structure that ScriptItemize returned. Each SCRIPT_ITEM contains a SCRIPT_ANALYSIS structure.
If psControl and psState are NULL on entry, ScriptItemize breaks the Unicode string purely by character code. If the parameters are all non-NULL, ScriptItemize performs a full Unicode bidirectional analysis.
The ScriptItemize function always adds a terminal item to the item analysis array (pItems), so that the length of an item at pItem is always available as (in the case of one item):
pItem[1].iCharPos - pItem[0].iCharPos
For this reason, it is invalid to call ScriptItemize with a buffer of fewer than two SCRIPT_ITEM structures.
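Because of the terminal item, the length of every item can be computed with a single subtraction. The sketch below uses a hypothetical stand-in type, ItemPos, that carries only the iCharPos member of SCRIPT_ITEM so that it compiles without Usp10.h; cItems is the count written to *pcItems, so the array holds cItems + 1 entries.

```c
/* Hypothetical stand-in for SCRIPT_ITEM: only iCharPos is needed here.
   The real structure in Usp10.h also carries a SCRIPT_ANALYSIS member. */
typedef struct {
    int iCharPos;
} ItemPos;

/* Returns the length, in WCHARs, of item i, or -1 if i is out of range.
   items[cItems] is the terminal item appended by ScriptItemize, so
   items[i + 1] is always valid for 0 <= i < cItems. */
int item_length(const ItemPos *items, int cItems, int i)
{
    if (i < 0 || i >= cItems)
        return -1;
    return items[i + 1].iCharPos - items[i].iCharPos;
}
```

With the real API the same expression is pItems[i + 1].iCharPos - pItems[i].iCharPos.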
To perform a correct Unicode bidirectional analysis, the SCRIPT_STATE structure should be initialized according to the reading order at paragraph start, and ScriptItemize should be passed the whole paragraph.
The bidirectional stack is not large, just 16 bytes. It should be shared between calls.
The fRTL member of SCRIPT_ANALYSIS (referenced in SCRIPT_ITEM) and the fNumeric member of SCRIPT_PROPERTIES (which is returned by ScriptGetProperties) together provide the same classification as the lpClass member of GCP_RESULTS, which is referenced by lpResults in GetCharacterPlacement.
If shaping is disabled (fDisableGlyphShape in SCRIPT_STATE), complex scripts are substituted by SCRIPT_UNDEFINED, causing shaping to be performed with contextual substitution following the one-to-one code point to glyph mapping provided by the font's cmap table. The rendering direction is still set appropriately.
European digits U+0030 through U+0039 may be rendered as national digits as shown in the following table.
fDigitSubstitute | fContextDigits | Digit shapes displayed for Unicode U+0030 through U+0039
FALSE | Any | Western (European / American) digits
TRUE | FALSE | As specified in SCRIPT_CONTROL.uDefaultLanguage
TRUE | TRUE | As prior strong text, defaulting to SCRIPT_CONTROL.uDefaultLanguage
Note that in context digit mode, any digits encountered before the first letters are rendered in SCRIPT_CONTROL.uDefaultLanguage if that script has the same direction as the output, and in Western digits if the direction is opposite. For example, if SCRIPT_CONTROL.uDefaultLanguage is LANG_ARABIC, initial digits will be Arabic-Indic in an RTL embedding, but Western (also known as Arabic) digits in an LTR embedding.
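The table and the note can be condensed into a single decision function. This is a sketch of the documented rules only, not Uniscribe's implementation; DigitShapes, digit_shapes, and its parameters are hypothetical names, and the directions are passed in as plain flags rather than derived from a language identifier.

```c
/* Which digit shapes are used for U+0030..U+0039. */
typedef enum {
    DIGITS_WESTERN,           /* European / American digits */
    DIGITS_DEFAULT_LANGUAGE,  /* digits of SCRIPT_CONTROL.uDefaultLanguage */
    DIGITS_PRIOR_STRONG_TEXT  /* digits of the preceding strong text */
} DigitShapes;

/* fDigitSubstitute, fContextDigits: the flags from the table.
   seen_strong_text: nonzero once a letter has preceded the digits.
   default_lang_rtl, output_rtl: direction of the default language's
   script and of the current embedding (nonzero = RTL). */
DigitShapes digit_shapes(int fDigitSubstitute, int fContextDigits,
                         int seen_strong_text,
                         int default_lang_rtl, int output_rtl)
{
    if (!fDigitSubstitute)
        return DIGITS_WESTERN;
    if (!fContextDigits)
        return DIGITS_DEFAULT_LANGUAGE;
    if (seen_strong_text)
        return DIGITS_PRIOR_STRONG_TEXT;
    /* Leading digits in context mode: use the default language only
       if its script runs in the same direction as the output. */
    return (!default_lang_rtl == !output_rtl) ? DIGITS_DEFAULT_LANGUAGE
                                              : DIGITS_WESTERN;
}
```

For the LANG_ARABIC example, digit_shapes(1, 1, 0, 1, 1) returns DIGITS_DEFAULT_LANGUAGE (Arabic-Indic digits) in an RTL embedding, while digit_shapes(1, 1, 0, 1, 0) returns DIGITS_WESTERN in an LTR embedding.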
Effect of Unicode control characters on SCRIPT_STATE:
SCRIPT_STATE flag | Set by | Cleared by
fDigitSubstitute | NADS | NODS
fInhibitSymSwap | ISS | ASS
fCharShape | AAFS | IAFS
Note: The Unicode control characters are defined in the following table. For more information, see the Unicode Standard.
Unicode control character | Meaning
NADS | Overrides Western digits (NODS) with national digit shapes.
NODS | Nominal digit shapes, otherwise known as Western digits. See NADS.
ASS | Activates swapping of symmetric pairs (for example, parentheses). For these characters, left and right are interpreted as opening and closing. This is the default. See ISS.
ISS | Inhibits swapping of symmetric pairs. See ASS.
AAFS | Activates Arabic form shaping, that is, ligatures or cursive connections, for Arabic presentation forms. See IAFS.
IAFS | Inhibits Arabic form shaping, that is, ligatures and cursive connections, for Arabic presentation forms. Nominal Arabic characters are not affected. This is the default. See AAFS.
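The set/clear behaviour described in these tables can be modelled as a scan over the text. The code points below are the Unicode values of these format characters (U+206A to U+206F, later deprecated by Unicode); the StateFlags type and apply_control_chars function are hypothetical and only mimic the three SCRIPT_STATE flags involved. The sketch illustrates the rules; it is not how ScriptItemize is implemented.

```c
#include <stddef.h>

/* Unicode format characters that toggle SCRIPT_STATE flags. */
enum {
    CH_ISS  = 0x206A, /* inhibit symmetric swapping   */
    CH_ASS  = 0x206B, /* activate symmetric swapping  */
    CH_IAFS = 0x206C, /* inhibit Arabic form shaping  */
    CH_AAFS = 0x206D, /* activate Arabic form shaping */
    CH_NADS = 0x206E, /* national digit shapes        */
    CH_NODS = 0x206F  /* nominal digit shapes         */
};

/* Hypothetical stand-in for the three affected SCRIPT_STATE members. */
typedef struct {
    unsigned fDigitSubstitute : 1;
    unsigned fInhibitSymSwap  : 1;
    unsigned fCharShape       : 1; /* documented as not implemented */
} StateFlags;

/* Applies every control character in text[0..len) to *s in order,
   so the last occurrence of each set/clear pair wins. */
void apply_control_chars(StateFlags *s, const unsigned short *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        switch (text[i]) {
        case CH_NADS: s->fDigitSubstitute = 1; break;
        case CH_NODS: s->fDigitSubstitute = 0; break;
        case CH_ISS:  s->fInhibitSymSwap  = 1; break;
        case CH_ASS:  s->fInhibitSymSwap  = 0; break;
        case CH_AAFS: s->fCharShape       = 1; break;
        case CH_IAFS: s->fCharShape       = 0; break;
        default:      break;
        }
    }
}
```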
SCRIPT_STATE.fArabicNumContext controls the Unicode EN-AN rule. At the beginning of a paragraph it should normally be initialized to TRUE for an Arabic locale and to FALSE for any other. The ScriptItemize function updates it as it processes strong text.
SCRIPT_ANALYSIS
The SCRIPT_ANALYSIS structure describes an item, that is, a portion of a Unicode string. This structure is filled by ScriptItemize, which breaks a Unicode string into individually shapeable items. The structure also includes a copy of the Unicode algorithm state (SCRIPT_STATE).
typedef struct tag_SCRIPT_ANALYSIS {
WORD eScript :10;
WORD fRTL :1;
WORD fLayoutRTL :1;
WORD fLinkBefore :1;
WORD fLinkAfter :1;
WORD fLogicalOrder :1;
WORD fNoGlyphIndex :1;
SCRIPT_STATE s ;
} SCRIPT_ANALYSIS;
Members
eScript
Opaque value identifying which engine Uniscribe will use when calling the ScriptShape, ScriptPlace, and ScriptTextOut functions for this item. The value of eScript is undefined and will change in future releases, but attributes of eScript may be obtained by calling ScriptGetProperties.
To disable shaping, set this member to SCRIPT_UNDEFINED.
fRTL
Rendering direction. Normally identical to the parity of the Unicode embedding level, but it may differ if overridden by GetCharacterPlacement legacy support.
fLayoutRTL
Logical direction, whether left-to-right or right-to-left. Although this is usually the same as fRTL, for a number in an RTL run fRTL is FALSE (because digits are always displayed LTR), but fLayoutRTL is TRUE (because the number is read as part of the RTL sequence).
fLinkBefore
If set, the shaping engine will shape the first character of this item as if it were joining with a previous character. Set by ScriptItemize, it may be overridden before calling ScriptShape.
fLinkAfter
If set, the shaping engine will shape the last character of this item as if it were joining with a subsequent character. Set by ScriptItemize, it may be overridden before calling ScriptShape.
fLogicalOrder
If set, the shaping engine will generate all glyph-related arrays in logical order. By default, glyph-related arrays are in visual order, with the first array entry corresponding to the leftmost glyph. Set to FALSE by ScriptItemize, it may be overridden before calling ScriptShape.
fNoGlyphIndex
Typically FALSE. Set to TRUE for bitmap, vector, and device fonts, and for Asian scripts. It may be set to TRUE on input to ScriptShape to disable use of glyphs for this item. Additionally, ScriptShape will set it to TRUE for an hdc containing symbolic, unrecognized, and device fonts.
Disabling glyphing disables complex script shaping. When set, shaping and placing for this item are implemented directly by calls to GetTextExtentExPoint and ExtTextOut.
s
A SCRIPT_STATE structure.
Requirements
Windows NT/2000: Requires Windows 2000.
Header: Declared in Usp10.h.
See Also
Uniscribe Overview, Uniscribe Structures, ExtTextOut, GetCharacterPlacement, GetTextExtentExPoint, ScriptGetProperties, ScriptItemize, ScriptPlace, ScriptShape, ScriptTextOut, SCRIPT_STATE
SCRIPT_STATE
The SCRIPT_STATE structure is used both to initialize the Unicode algorithm state as an input parameter to ScriptItemize, and as a component of the analysis returned by ScriptItemize.
typedef struct tag_SCRIPT_STATE {
WORD uBidiLevel :5;
WORD fOverrideDirection :1;
WORD fInhibitSymSwap :1;
WORD fCharShape :1;
WORD fDigitSubstitute :1;
WORD fInhibitLigate :1;
WORD fDisplayZWG :1;
WORD fArabicNumContext :1;
WORD fGcpClusters :1;
WORD fReserved :1;
WORD fEngineReserved :2;
} SCRIPT_STATE;
Members
uBidiLevel
The embedding level associated with all characters in this run according to the Unicode bidirectional algorithm. When passed to ScriptItemize, it should be initialized to 0 for an LTR base embedding level, or 1 for RTL.
fOverrideDirection
TRUE if this level is an override level (LRO/RLO). In an override level, characters are laid out in one direction only, either left-to-right or right-to-left. No reordering of digits or strong characters of opposing direction takes place. Note that this initial value is reset by LRE, RLE, LRO or RLO codes in the string.
fInhibitSymSwap
TRUE if the shaping engine is to bypass mirroring of Unicode mirrored glyphs such as brackets. Set by Unicode character ISS, cleared by ASS.
fCharShape
TRUE if character codes in the Arabic Presentation Forms areas of Unicode should be shaped. (Not implemented).
fDigitSubstitute
TRUE if character codes U+0030 through U+0039 (European digits) are to be substituted by national digits. Set by Unicode NADS, cleared by NODS.
fInhibitLigate
TRUE if ligatures are not to be used in the shaping of Arabic or Hebrew characters.
fDisplayZWG
TRUE if control characters are to be shaped as representational glyphs. (Typically, control characters are shaped to the blank glyph and given a width of zero).
fArabicNumContext
TRUE indicates that prior strong characters were Arabic for the purposes of rule P0 as discussed in The Unicode Standard, version 2.0. This should normally be set to TRUE before itemizing an RTL paragraph in an Arabic language, and FALSE otherwise.
fGcpClusters
For GetCharacterPlacement legacy support only. Initialize to TRUE to request ScriptShape to generate the pwLogClust array the same way as GetCharacterPlacement does in Arabic and Hebrew Windows 95. Affects only Arabic and Hebrew items.
fReserved
Reserved. Always initialize to zero.
fEngineReserved
Reserved. Always initialize to zero.
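Initializing the state for one paragraph can be sketched as follows, following the uBidiLevel and fArabicNumContext guidance above. ScriptStateSketch is a hypothetical stand-in with the same member names as SCRIPT_STATE so that the example compiles without Usp10.h (on Windows you would declare a SCRIPT_STATE directly), and paragraph_state is a hypothetical helper.

```c
/* Hypothetical stand-in with the same member names as SCRIPT_STATE. */
typedef struct {
    unsigned uBidiLevel         : 5;
    unsigned fOverrideDirection : 1;
    unsigned fInhibitSymSwap    : 1;
    unsigned fCharShape         : 1;
    unsigned fDigitSubstitute   : 1;
    unsigned fInhibitLigate     : 1;
    unsigned fDisplayZWG        : 1;
    unsigned fArabicNumContext  : 1;
    unsigned fGcpClusters       : 1;
    unsigned fReserved          : 1;
    unsigned fEngineReserved    : 2;
} ScriptStateSketch;

/* Builds the initial state for itemizing one whole paragraph.
   rtl_paragraph: nonzero if the base reading order is right-to-left.
   arabic_language: nonzero if the paragraph is in an Arabic language. */
ScriptStateSketch paragraph_state(int rtl_paragraph, int arabic_language)
{
    ScriptStateSketch s = {0};   /* reserved members must be zero */
    s.uBidiLevel = rtl_paragraph ? 1 : 0;
    s.fArabicNumContext = (rtl_paragraph && arabic_language) ? 1 : 0;
    return s;
}
```

The resulting state would then be passed as psState, together with a non-NULL psControl, when calling ScriptItemize on the whole paragraph.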
SCRIPT_ITEM
The SCRIPT_ITEM structure includes a SCRIPT_ANALYSIS with the string offset of the first character of the item.
typedef struct tag_SCRIPT_ITEM {
int iCharPos;
SCRIPT_ANALYSIS a;
} SCRIPT_ITEM;
Members
iCharPos
Specifies the offset from the beginning of the itemized string to the first character of this item, counted in 16-bit Unicode code units (WCHARs).
a
Specifies a SCRIPT_ANALYSIS structure containing analysis specific to this item.
SCRIPT_VISATTR
The SCRIPT_VISATTR structure contains the visual (glyph) attribute buffer generated by ScriptShape that identifies clusters and justification points.
typedef struct tag_SCRIPT_VISATTR {
WORD uJustification :4;
WORD fClusterStart :1;
WORD fDiacritic :1;
WORD fZeroWidth :1;
WORD fReserved :1;
WORD fShapeReserved :8;
} SCRIPT_VISATTR;
Members
uJustification
Justification class for this glyph. See SCRIPT_JUSTIFY.
fClusterStart
Set for the logical first glyph in every cluster, even for clusters containing just one glyph.
fDiacritic
Set for glyphs that combine with base characters.
fZeroWidth
Set by the shaping engine for some, but not all, zero-width characters, such as ZWJ and ZWNJ.
fReserved
Reserved. Always initialize to zero.
fShapeReserved
Reserved for use by the shaping engines.
GOFFSET
The GOFFSET structure contains the x and y offsets of the combining glyph.
typedef struct tagGOFFSET
{
LONG du;
LONG dv;
} GOFFSET;
Members
du
x offset, in logical units, for the combining glyph.
dv
y offset, in logical units, for the combining glyph.
ScriptShape
The ScriptShape function takes a Unicode run and generates glyphs and visual attributes.
HRESULT WINAPI ScriptShape(
HDC hdc,
SCRIPT_CACHE *psc,
const WCHAR *pwcChars,
int cChars,
int cMaxGlyphs,
SCRIPT_ANALYSIS *psa,
WORD *pwOutGlyphs,
WORD *pwLogClust,
SCRIPT_VISATTR *psva,
int *pcGlyphs
);
Parameters
hdc
[in] Handle to the device context.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
pwcChars
[in] Pointer to a Unicode string containing the run.
cChars
[in] Number of characters in the Unicode run.
cMaxGlyphs
[in] Maximum number of glyphs to generate.
psa
[in/out] Pointer to the SCRIPT_ANALYSIS structure for the run, containing the results from an earlier call to ScriptItemize.
pwOutGlyphs
[out] Pointer to an array that receives the glyphs.
pwLogClust
[out] Pointer to an array that receives the logical cluster information. Each element in *pwLogClust corresponds to a character in *pwcChars; the value of each element is the offset from the first glyph in the run to the first glyph in the cluster containing the corresponding character. For an example, see Remarks. Note that when psa->fRTL is TRUE, the elements in *pwLogClust decrease when the array is read from start to end.
psva
[out] Pointer to an array of SCRIPT_VISATTR structures that receives visual attribute information. There is one visual attribute per glyph, so *psva has cMaxGlyphs elements.
pcGlyphs
[out] Pointer to an integer that receives the number of glyphs written to pwOutGlyphs.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. The function returns E_OUTOFMEMORY if the output buffer length, cMaxGlyphs, is insufficient. Note that in all error cases the contents of all output parameters are undefined.
If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
If ScriptShape returns E_OUTOFMEMORY, pcGlyphs is undefined; you may need to call ScriptShape again, possibly more than once, until a large enough buffer is found. The number of glyphs generated by a code point varies according to the script and the font. For a simple script, a Unicode code point may generate a single glyph. However, a complex script font might construct characters from components, and thus generate several times as many glyphs as characters. There are also special cases, such as invalid character representations, where extra glyphs are added to represent the invalid sequence. Therefore, a reasonable guess for the size of the buffer pointed to by pwOutGlyphs might be 1.5 times the length of the character buffer, plus an additional 16 glyphs for rare cases like invalid sequence representation.
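The sizing advice and the retry rule can be combined into one loop. The sketch below starts from the suggested 1.5 x characters + 16 glyphs and doubles the buffer whenever the out-of-memory code comes back; the doubling is an assumption of this sketch, not something the documentation prescribes. Because ScriptShape itself needs Windows, the loop calls a hypothetical callback type, ShapeFn, and the example includes a hypothetical mock_shape stand-in; SHAPE_OUTOFMEMORY stands in for E_OUTOFMEMORY. A real loop would also reallocate the pwLogClust-independent psva array, which must have cMaxGlyphs elements too.

```c
#include <limits.h>
#include <stdlib.h>

#define SHAPE_OK          0
#define SHAPE_OUTOFMEMORY 1  /* stands in for E_OUTOFMEMORY */

/* Hypothetical signature modelled on the glyph-buffer part of ScriptShape. */
typedef int (*ShapeFn)(void *ctx, int cChars, unsigned short *glyphs,
                       int cMaxGlyphs, int *pcGlyphs);

/* Calls shape() with the suggested 1.5x + 16 buffer, doubling it on
   SHAPE_OUTOFMEMORY. Returns a malloc'd glyph array (caller frees), or
   NULL on any other error or if the size would overflow. */
unsigned short *shape_with_retry(ShapeFn shape, void *ctx, int cChars,
                                 int *pcGlyphs)
{
    int cap = cChars * 3 / 2 + 16;
    for (;;) {
        unsigned short *buf = malloc((size_t)cap * sizeof *buf);
        if (buf == NULL)
            return NULL;
        int hr = shape(ctx, cChars, buf, cap, pcGlyphs);
        if (hr == SHAPE_OK)
            return buf;
        free(buf);               /* *pcGlyphs is undefined at this point */
        if (hr != SHAPE_OUTOFMEMORY || cap > INT_MAX / 2)
            return NULL;
        cap *= 2;
    }
}

/* Demo stand-in for ScriptShape: a run that needs a fixed glyph count. */
typedef struct { int glyphs_needed; int calls; } MockRun;

int mock_shape(void *ctx, int cChars, unsigned short *glyphs,
               int cMaxGlyphs, int *pcGlyphs)
{
    MockRun *run = ctx;
    (void)cChars;
    run->calls++;
    if (cMaxGlyphs < run->glyphs_needed)
        return SHAPE_OUTOFMEMORY;
    for (int i = 0; i < run->glyphs_needed; i++)
        glyphs[i] = (unsigned short)i;
    *pcGlyphs = run->glyphs_needed;
    return SHAPE_OK;
}
```

For a 10-character run that needs 100 glyphs, the loop tries buffers of 31, 62, and 124 glyphs, succeeding on the third call.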
Clusters are sequenced uniformly within the run, as are glyphs within the cluster. The fRTL item flag (from ScriptItemize) identifies whether sequencing is left-to-right or right-to-left.
ScriptShape may set the fNoGlyphIndex flag in psa if the font or operating system cannot support glyph indices.
To determine if a font supports the characters in a given string, call ScriptShape. If it returns S_OK, check the output for missing glyphs.
If fLogicalOrder is requested in psa, glyphs will always be generated in the same order as the original Unicode characters. If fLogicalOrder is not set, RTL items are generated in reverse order, so ScriptTextOut does not need to reverse them before calling ExtTextOut.
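The missing-glyph check can be sketched as a scan of the glyph buffer for the font's default (missing) glyph. This assumes the caller already knows that glyph index; on Windows it can be obtained, for example, from the wgDefault member of SCRIPT_FONTPROPERTIES filled in by ScriptGetFontProperties. The check also assumes pwOutGlyphs holds real glyph indices (fNoGlyphIndex not set). find_missing_glyph is a hypothetical helper, and the glyph values in the test are made up.

```c
/* Returns the index of the first glyph equal to default_glyph (the
   font's "missing" glyph), or -1 if every character was covered. */
int find_missing_glyph(const unsigned short *glyphs, int cGlyphs,
                       unsigned short default_glyph)
{
    for (int i = 0; i < cGlyphs; i++) {
        if (glyphs[i] == default_glyph)
            return i;
    }
    return -1;
}
```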
If SCRIPT_ANALYSIS.eScript is set to SCRIPT_UNDEFINED, shaping is disabled and ScriptShape displays whatever glyph is in the font's cmap table; if there is none, it displays the missing glyph.
The following example shows how ScriptShape would generate the logical cluster array (*pwLogClust) from a character array (*pwcChars) and a glyph array (*pwOutGlyphs). The run has four clusters:
- 1st cluster: one character represented by one glyph
- 2nd cluster: one character represented by three glyphs
- 3rd cluster: three characters represented by one glyph
- 4th cluster: two characters represented by three glyphs
Character array (where c<n>u<m> means cluster n, Unicode codepoint m):
- | c1u1 | c2u1 | c3u1 c3u2 c3u3 | c4u1 c4u2 |
Glyph array (where c<n>g<m> means cluster n, glyph m):
- | c1g1 | c2g1 c2g2 c2g3 | c3g1 | c4g1 c4g2 c4g3 |
Cluster array (the offset, in glyphs, to the cluster containing the character):
- | 0 | 1 | 4 4 4 | 5 5 |
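The cluster array for this example can be derived mechanically: every character in a cluster receives the offset of that cluster's first glyph, and the offset then advances by the cluster's glyph count. The sketch below assumes a left-to-right run in visual order and takes the per-cluster counts as input; build_log_clust is a hypothetical helper that illustrates the layout of pwLogClust, not how ScriptShape computes it.

```c
/* Builds a logical cluster array for a left-to-right run.
   charsPerCluster[c] / glyphsPerCluster[c]: size of cluster c.
   logClust must have room for the total number of characters. */
void build_log_clust(const int *charsPerCluster, const int *glyphsPerCluster,
                     int nClusters, unsigned short *logClust)
{
    int ch = 0;     /* next character position to fill */
    int glyph = 0;  /* offset of the current cluster's first glyph */
    for (int c = 0; c < nClusters; c++) {
        for (int k = 0; k < charsPerCluster[c]; k++)
            logClust[ch++] = (unsigned short)glyph;
        glyph += glyphsPerCluster[c];
    }
}
```

Called with the four clusters above ({1, 1, 3, 2} characters and {1, 3, 1, 3} glyphs), it fills the array with 0, 1, 4, 4, 4, 5, 5.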
Design and Implementation of a Win32 Text Editor
Part 14 - Drawing styled text with Uniscribe
Download the UspLib Demo (40Kb)
Introduction
The last tutorial saw the completion of the UspAnalyze function, one of the main APIs of the new UspLib text-rendering engine. We will now switch our attention to the implementation of the UspTextOut function. Our goal is to divide up the glyph-lists we created in the last tutorial, and apply colour information prior to display with ScriptTextOut. The method we will use to identify which colour belongs to each glyph is the central theme of this tutorial.
Now, there is a lot of very specific information in this tutorial related to Uniscribe, and you will only find it interesting if you have also been trying to understand how to draw styled text. So feel free to skip to the next tutorial if you want to see UspLib in action.
The image above shows another small utility I wrote whilst working with Uniscribe. The purpose of this app is to demonstrate (and test) the UspLib library. You can download the demo, and also the UspLib source code, at the top of this article.
6. Drawing styled text
At this point we could quite simply call ScriptTextOut with a whole run of glyphs and be done with it. It would display correctly and we would have succeeded in our goal to display Unicode text. However, the text would only be drawn in a single font and colour, and it would have been much simpler to use the ScriptString API instead! Remember, the entire reason we are looking at Uniscribe is because we need to apply font and colour information in a very fine-grained manner.
Back in part #10 of this series I proposed a new method for rendering text in Neatpad, using three separate passes. I have implemented this rendering scheme with the UspTextOut function:
void UspTextOut(USPDATA * uspData,
HDC hdc,
int xpos,
int ypos,
RECT * bounds
)
{
//
// 1. Draw all background colours, including selection-highlights;
// selected areas are added to the HDC clipping region which prevents
// step#2 (below) from drawing over them
//
PaintBackground(uspData, hdc, xpos, ypos, bounds);
//
// 2. Draw the text normally. Selected areas are left untouched
// because of the clipping-region created in step#1
//
SetBkMode(hdc, TRANSPARENT);
PaintForeground(uspData, hdc, xpos, ypos, bounds, FALSE);
//
// 3. Redraw the text using a single text-selection-colour (i.e. white)
// in the same position, directly over the top of the text drawn in step#2
// Before we do this, the HDC clipping-region is inverted,
// so only selection areas are modified this time
//
PaintForeground(uspData, hdc, xpos, ypos, bounds, TRUE);
}
UspTextOut is quite similar to ScriptStringOut, in that it requires a string to be analyzed prior to display. It takes as input the USPDATA object that contains the information generated by UspAnalyze. Although there are three passes involved, only two functions need to be implemented, PaintBackground and PaintForeground, the latter being used to draw both regular (styled) and 'selected' text. We will take a look at the implementation of these functions a little further down.
Characters vs Glyphs vs Clusters
The major problem with Uniscribe is understanding how to decipher the results of the ScriptShape and ScriptPlace calls. There is just so much information returned about each run of text that it takes a fair amount of time and effort to understand it all. Hopefully by the end of this tutorial you will have a little more insight into how all of the Uniscribe functions hang together.
The key detail to understand about Uniscribe (and computer typography in general) is the difference between characters and glyphs. Up until this point the main focus with Neatpad has been logical Unicode character sequences. However, once Uniscribe is involved the focus is very much on glyphs. The thing to understand here is that there is no direct relationship between characters and glyphs.
For simple scripts such as English, a font usually contains one glyph per Unicode character. However, for more complex scripts this relationship can change. Sometimes a single Unicode character can result in more than one glyph. The opposite is also true: multiple Unicode characters can result in just a single glyph. This behaviour varies depending on which font is being used. This separation between characters and glyphs presents a problem because our attribute style-runs are all character-based, and we somehow need to translate this styling information onto specific glyphs.
To make things even more complicated, the concept of glyph clusters must be understood. A cluster is basically a grouping of glyphs which must be treated as a single selectable unit. Whilst this is not a problem in itself, it does make rendering glyph sequences a little more complicated because cluster boundaries must be respected.
Understanding the Logical Cluster List
The logical-cluster list is key to establishing a relationship between characters and glyphs. This list is returned by ScriptShape in the pwLogClust[] array. It provides the mapping between logical character-positions and glyph-cluster positions. UspLib stores each run's logical-cluster information inside the clusterList[] field of the USPDATA object.
To support this idea of character-to-glyph mapping, the logical-cluster list must represent two important concepts:
· Firstly, it identifies the cluster boundaries in the original Unicode string, that is, the offsets in logical character units (WCHARs) of each cluster. Each entry in the clusterList corresponds exactly to a single character in the original string, so clusterList is always the same length as the Unicode string we are processing.
· Secondly, this same array also identifies the offset of each glyph-cluster within the glyph-buffers generated by ScriptShape and ScriptPlace.
In other words, the individual element values (the content) of the clusterList define the glyph-clusters, whilst the positions of the array elements represent the clusters in logical character terms.
As an example we will use the same Arabic string "يُساوِي" we were looking at previously. This string of seven Unicode characters results in the following logical cluster information being generated by ScriptShape. Note that the logical array-index positions are listed across the top of the table.
Array | [0] | [1] | [2] | [3] | [4] | [5] | [6]
WCHAR wszText[] | U+064A | U+064F | U+0633 | U+0627 | U+0648 | U+0650 | U+064A
WORD clusterList[] | 6 | 6 | 4 | 3 | 2 | 2 | 0
Whole clusters are identified by grouping together any identical numbers in the logical-cluster list. As you can see from the cluster-list in the table above, there are two 6's and two 2's (in addition to the other singular numbers), resulting in a total of five whole clusters altogether. The image below illustrates this grouping concept.
Notice that cluster-list is always stored in logical order, whilst the glyph-list is always in visual order. This means that for right-to-left scripts (such as the Arabic string above), the cluster-list elements will decrease when reading the array. As a result of this, the first glyph that must be drawn will be at the very end of the glyph-list. Bearing this in mind, the breakdown of the clusters is as follows:
· 1st cluster: two characters represented by two glyphs.
· 2nd cluster: one character represented by one glyph.
· 3rd cluster: one character represented by one glyph.
· 4th cluster: two characters represented by two glyphs.
· 5th cluster: one character represented by one glyph.
Hopefully it is fairly obvious how the logical clusters were identified: the number of WCHARs in each cluster is simply the number of characters in each grouping. Calculating the number of glyphs in each cluster is less obvious. The key here is looking at the difference between the cluster values. This is how each cluster was identified:
1. The first two 6's identify cluster #1, comprising two WCHARs (in character-positions 0 and 1). This value of 6 points to the end of the glyphList, which contains the glyphs for this cluster. We know that this cluster is represented by two glyphs (#5 and #6) because:
2. The next value in the cluster-list (4) tells us two things. Obviously this cluster starts at glyph #4 in the glyph-list. However, this also means that there were two glyphs in the previous cluster (6 - 4 = 2).
3. The third cluster consists of the single glyph #3 and a single WCHAR.
4. The fourth cluster again consists of two WCHARs, which are represented by glyphs #1 and #2.
5. The fifth and final cluster is again a single WCHAR, represented by glyph #0 in the glyph-list.
As you can see, the key detail here is looking at the difference between glyph-indices in order to count the number of glyphs in each cluster. Special consideration must also be given to right-to-left scripts, because their glyphs are stored in reverse order. The way I handled this was to advance the x-coordinate to the end of the run, call SetTextAlign(TA_RIGHT), and then move the output-location to the left each time, resulting in the glyphs being output in logical (right-to-left) order.
The important thing to understand is that we always follow the cluster-list in logical order, even for right-to-left scripts. We rely on the ordering of the element values to locate each glyph-cluster as it should be drawn.
你能够领会这一块的关键细节是观察在
glyph-indices
之间的不同以便计算在每个
cluster
中的字形数量。必须特殊考虑的是从右到左的文字,因为字形是以反序存储。我处理的方法是
x-
坐标向前直到
run
的末尾,调用
SetTextAlign(TA_RIGHT),
然后每次向左移动
output-location(
输出位置
)
,致使字形以逻辑顺序
(
从右到左
)
输出。
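The right-to-left drawing strategy just described can be sketched in portable C. This is a minimal illustration, not UspLib code: the function name rtl_cluster_right_edges is hypothetical, and instead of calling SetTextAlign and ScriptTextOut it only records the right-hand x coordinate at which each logical cluster would be drawn. The cluster list used in the test, {6, 6, 4, 3, 2, 2, 0}, is an assumed reconstruction that fits the walkthrough (each value is the highest glyph index of its cluster, and the second cluster is assumed to hold a single WCHAR); the glyph widths are invented.

```c
#include <assert.h>

/* Hypothetical helper: for a right-to-left run, compute the x coordinate of the
   right-hand edge of each logical cluster, i.e. where it would be drawn after
   SetTextAlign(hdc, TA_RIGHT). Assumes the RTL convention above: each cluster
   value is the highest glyph index of its cluster, decreasing in logical order. */
int rtl_cluster_right_edges(const unsigned short *clusterList, int charCount,
                            const int *glyphWidths, int glyphCount,
                            int xpos, int *rightEdges)
{
    int runWidth = 0, x, lasti, i, g, n = 0;

    assert(charCount >= 0 && glyphCount >= 0);
    for (g = 0; g < glyphCount; g++)
        runWidth += glyphWidths[g];

    x = xpos + runWidth;                 /* start at the right-hand end of the run */

    /* walk the clusters in logical order; [lasti, i) is the current cluster */
    for (lasti = 0, i = 1; i <= charCount; i++) {
        if (i == charCount || clusterList[i] != clusterList[lasti]) {
            int first = (i < charCount) ? clusterList[i] + 1 : 0; /* lowest glyph  */
            int last  = clusterList[lasti];                       /* highest glyph */
            int width = 0;

            for (g = first; g <= last; g++)
                width += glyphWidths[g];

            rightEdges[n++] = x;         /* cluster drawn right-aligned at x */
            x -= width;                  /* then step left by its width      */
            lasti = i;
        }
    }
    return n;                            /* number of clusters found */
}
```

Because each cluster is drawn right-aligned and x only ever moves left, the loop can follow the cluster list in logical order, as the text recommends.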
Another example
The previous example was of course a right-to-left script, and highlighted the unusual way in which these scripts are represented by Uniscribe. The next example is based on the one in the MSDN documentation for ScriptShape, and highlights how complex left-to-right text is represented by Uniscribe.
U+920, U+911, U+915, U+94D, U+937, U+91D, U+949
This time the string is from the Devanagari script. I have no idea what it means, because I just strung together a sequence of code points that happened to have the right "look". If anyone can supply me with a Unicode phrase of 7 characters that results in the glyph and cluster properties shown below, then please get in touch!
Array | [0] | [1] | [2] | [3] | [4] | [5] | [6]
Unicode string | U+0920 | U+0911 | U+0915 | U+094D | U+0937 | U+091D | U+0949
clusterList[] | 0 | 1 | 4 | 4 | 4 | 5 | 5
The key difference here is how the cluster-list elements increase when reading the array. For left-to-right runs, the glyphs are stored in the same order as the original Unicode characters, which is the ordering that many Western readers will find most natural.
The diagram again illustrates the relationship between logical characters and glyph clusters, this time for a left-to-right run of text. This example is purely fictitious, as I couldn't find any combination of phrase, font and script that produced the required glyph and cluster properties. Again, the number of glyphs per cluster is calculated from the difference between successive cluster-list elements.
· 1st cluster: one character represented by one glyph (1 - 0 = 1)
· 2nd cluster: one character represented by three glyphs (4 - 1 = 3)
· 3rd cluster: three characters represented by one glyph (5 - 4 = 1)
· 4th cluster: two characters represented by three glyphs (8 - 5 = 3)
The number of glyphs in the last cluster can only be calculated because we know how many glyphs (8 in total) ScriptShape generated for this run.
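The same difference rule works in both directions, which is what the GetGlyphClusterIndices function later in this tutorial encodes. The following sketch restates that logic in portable C and splits a run into logical clusters, reporting the character range and glyph range of each. The ClusterInfo type and the split_clusters and glyph_range functions are illustrative names, not part of UspLib. The total glyph count is needed only for the last cluster of a left-to-right run, as noted above; the right-to-left test data is the same assumed reconstruction used for the first example.

```c
#include <assert.h>

/* One logical cluster: a range of WCHARs and the range of glyphs that draw them. */
typedef struct {
    int firstChar, charCount;
    int firstGlyph, glyphCount;
} ClusterInfo;

/* Glyph range [*g1, *g2] for the cluster [lasti, i), mirroring the
   RTL/LTR cases of GetGlyphClusterIndices. */
static void glyph_range(const unsigned short *clusterList, int len, int glyphCount,
                        int rtl, int i, int lasti, int *g1, int *g2)
{
    if (rtl) {   /* cluster values decrease in logical order */
        *g1 = (i < len) ? clusterList[i] + 1 : 0;
        *g2 = clusterList[lasti];
    } else {     /* cluster values increase in logical order */
        *g1 = clusterList[lasti];
        *g2 = (i < len) ? clusterList[i] - 1 : glyphCount - 1;
    }
}

/* Split a run into logical clusters; returns the number of clusters written. */
int split_clusters(const unsigned short *clusterList, int len, int glyphCount,
                   int rtl, ClusterInfo *out)
{
    int n = 0, lasti = 0, i;

    assert(len >= 0);
    for (i = 1; i <= len; i++) {
        /* a cluster ends where the cluster value changes (or at the end) */
        if (i == len || clusterList[i] != clusterList[lasti]) {
            int g1, g2;
            glyph_range(clusterList, len, glyphCount, rtl, i, lasti, &g1, &g2);
            out[n].firstChar  = lasti;
            out[n].charCount  = i - lasti;
            out[n].firstGlyph = g1;
            out[n].glyphCount = g2 - g1 + 1;
            n++;
            lasti = i;
        }
    }
    return n;
}
```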
Interpolation is the key
The one thing to understand about Uniscribe is the separation between characters and glyphs. So, looking now at another example, what happens when three characters are represented by two glyphs? The problem we face is how to distribute the colour information for each character across the glyphs: which glyph takes which colour?
U+0635 U+0651 U+0650
For some scripts, where the glyphs are typically ordered horizontally, you can almost infer the colour relationships. However, when glyphs stack vertically on top of each other within a cluster, or when the numbers of characters and glyphs differ, there is no easy way to associate a colour with a particular glyph.
With UspLib I solved this problem in two ways: one method for drawing the background, and another for drawing the glyphs themselves (the foreground). Drawing the foreground was easy: I decided simply to paint all glyphs in a cluster in a single colour. Should there be multiple colour attributes for the cluster, only the first is used and the rest are ignored. This is by far the easiest method, and in reality you wouldn't expect individual glyphs within a cluster to have their own colours.
Painting the background is rather different, because the inversion-highlighting scheme must be taken into account. The strategy I have used here is to interpolate the colours across the width of each cluster when drawing the background. Microsoft hints at this method in the following quote from MSDN, in the section "Notes on ScriptXtoCP and ScriptCPtoX":
"Cluster information in the logical cluster array is used to share the width of a cluster of glyphs equally among the logical characters they represent."
It took me a while to make the logical leap that this strategy could also be used for text rendering; however, after implementing it I realised it is exactly the same method used by the ScriptString API.
The process is very simple. We know how many characters make up each cluster, and also how many glyphs make up each cluster. We therefore sum the widths of these glyphs to calculate the total width of the cluster. We then divide the cluster width by the number of characters in the cluster: the result tells us how wide each colour band should be.
advanceWidth = clusterWidth / charCount;
For some scripts, dividing the clusters this way makes a lot of sense, especially for Arabic, because the caret is conventionally positioned at character boundaries rather than glyph-cluster boundaries. However, for most scripts this would be viewed as incorrect. Rather than having a special case just for Arabic, I have instead written UspTextOut so that it always interpolates over glyph clusters, should any colour attributes happen to be this fine-grained. We rely on the fact that ScriptCPtoX will only allow the caret (and therefore selection highlights) to be placed in the middle of clusters when appropriate.
Lastly, using integer math for the cluster division will result in potential rounding errors. While this is not a massive problem, we need exactly the same results that ScriptCPtoX produces when it performs its own (presumed) division operations; otherwise we could occasionally be out by a pixel. Presumably ScriptCPtoX uses MulDiv in its calculations, because this appears to give the correct results, and it is what I have used for UspLib.
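One way to stop the bands of a cluster from drifting through rounding is to round the cumulative band edges rather than each band width, so the bands always add up to the cluster width exactly. The sketch below is an assumption about how such a correction could be done, not the UspLib code. The helper mul_div_round is a stand-in for Win32 MulDiv (64-bit intermediate, rounded to nearest) that only accepts non-negative inputs, and cluster_band_widths is a hypothetical name.

```c
#include <assert.h>

/* Stand-in for Win32 MulDiv restricted to non-negative inputs:
   computes a * b / c rounded to the nearest integer, using a 64-bit
   intermediate so the product cannot overflow. */
static int mul_div_round(int a, int b, int c)
{
    long long product;

    assert(a >= 0 && b >= 0 && c > 0);
    product = (long long)a * b;
    return (int)((product + c / 2) / c);
}

/* Split clusterWidth into charCount colour bands. Band k ends at
   round(clusterWidth * k / charCount), so the bands always sum to
   exactly clusterWidth. */
void cluster_band_widths(int clusterWidth, int charCount, int *bandWidths)
{
    int k, prevEdge = 0;

    assert(charCount > 0);
    for (k = 1; k <= charCount; k++) {
        int edge = mul_div_round(clusterWidth, k, charCount);
        bandWidths[k - 1] = edge - prevEdge;
        prevEdge = edge;
    }
}
```

With plain integer division, a 10-pixel cluster of three characters would yield three 3-pixel bands and lose a pixel; rounding the edges yields bands of 3, 4 and 3 pixels.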
Drawing the background
As mentioned above, drawing the background is a little different because of the use of interpolation. We will start by looking at the PaintBackground routine:
void PaintBackground(USPDATA * uspData, HDC hdc, int xpos, int ypos, RECT * bounds)
{
    int i;
    ITEM_RUN * itemRun;

    // Process the item-runs in visual-order
    for(i = 0; i < uspData->itemRunCount; i++)
    {
        itemRun = GetItemRun(uspData, i);

        // paint the background of the specified item-run
        PaintItemRunBackground(uspData, itemRun, hdc, xpos, bounds);

        xpos += itemRun->width;
    }
}
As you can see, this function is very simple. It merely processes the item-runs in visual order and advances the x-coordinate by the width of each run. Each item-run background is rendered individually by the PaintItemRunBackground function.
void PaintItemRunBackground(USPDATA *uspData, ITEM_RUN *itemRun, HDC hdc, int xpos, RECT *bounds)
{
    int i, lasti;

    // locate the item-run buffers
    WORD * clusterList = uspData->clusterList + itemRun->charPos;
    ATTR * attrList    = uspData->attrList    + itemRun->charPos;
    int  * widthList   = uspData->widthList   + itemRun->glyphPos;

    // nothing to paint for an empty run
    if(itemRun->len == 0)
        return;

    // i runs one past the last character so that the final cluster is processed too
    for(lasti = 0, i = 0; i <= itemRun->len; i++)
    {
        // search for a logical cluster boundary (or end of run)
        if(i == itemRun->len || clusterList[lasti] != clusterList[i])
        {
            << process cluster >>

            // the next cluster starts where this one ended
            lasti = i;
        }
    }
}
The primary task is to identify the logical-cluster positions. The two loop indices (lasti and i) represent these cluster positions in the original text string. The number of WCHARs in each cluster is therefore (i - lasti). Because we always iterate in logical order, this is true for both LTR and RTL text.
<< process cluster >>
int glyphIdx1, glyphIdx2, runWidth, advanceWidth;

// locate glyph-positions for the cluster [lasti, i)
GetGlyphClusterIndices(itemRun, clusterList, i, lasti, &glyphIdx1, &glyphIdx2);

// measure width of this group of glyphs
for(runWidth = 0; glyphIdx1 <= glyphIdx2; )
    runWidth += widthList[glyphIdx1++];

// divide the cluster-width by the number of code-points that cover it
advanceWidth = MulDiv(runWidth, 1, i - lasti);
Once a cluster has been identified, GetGlyphClusterIndices is called. This function inspects the clusterList and returns the glyph-index range corresponding to the character positions lasti and i.
The width of the glyph cluster is computed next, simply by iterating from glyphIdx1 to glyphIdx2, before dividing the cluster width by the number of characters (WCHARs). We now know how far to advance each time we paint a piece of background.
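// attr holds the colours of the band being accumulated and rect its extent;
// both carry over between clusters so that unchanged colours are merged
for(a = lasti; a <= i; a++)
{
    // look for change in attribute background (or the end of the run)
    if(a == itemRun->len ||
       attr.bg  != attrList[a].bg ||
       attr.sel != attrList[a].sel )
    {
        PaintRectBG(uspData, itemRun, hdc, xpos, &rect, &attr);
        rect.left = rect.right;

        if(a < itemRun->len)
            attr = attrList[a];
    }

    // each WCHAR in the cluster owns one advanceWidth-wide band
    if(a < i)
        rect.right += advanceWidth;
}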
The final task is to interpolate the colour-attributes over the cluster. We only ever paint the background if we detect a change in colour, so most of the time an item-run background is painted with just one operation. Missing from the code listing above is the small detail of correcting for rounding errors in the (integer) division - however this is not necessary for understanding the code.
I won't bother including the code for the PaintRectBG function; suffice to say it is not very interesting, other than the fact that it calls ExcludeClipRect after drawing any selection-highlight background area.
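The "paint only when the colour changes" rule can be shown in isolation. In this portable sketch (BgRect and merge_background are hypothetical names), each character already has its interpolated band width and a single colour value standing in for the combined background and selection attributes. Adjacent bands with the same colour are merged into one rectangle, so a single-coloured run produces a single paint operation.

```c
#include <assert.h>

typedef struct {
    int left, right;     /* horizontal extent, relative to the run's xpos */
    unsigned colour;     /* stands in for the bg + selection attributes   */
} BgRect;

/* Merge consecutive character bands of identical colour into rectangles.
   Returns the number of rectangles written to out. */
int merge_background(const int *bandWidths, const unsigned *bgColours,
                     int count, BgRect *out)
{
    int n = 0, a, x = 0;

    assert(count >= 0);
    for (a = 0; a < count; a++) {
        /* colour change (or first band): start a new rectangle */
        if (n == 0 || out[n - 1].colour != bgColours[a]) {
            out[n].left   = x;
            out[n].right  = x;
            out[n].colour = bgColours[a];
            n++;
        }
        x += bandWidths[a];
        out[n - 1].right = x;            /* extend the current rectangle */
    }
    return n;
}
```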
void GetGlyphClusterIndices( ITEM_RUN * itemRun,
                             WORD     * clusterList,   // already offset by itemRun->charPos
                             int        clusterIdx1,   // end of the cluster (i)
                             int        clusterIdx2,   // start of the cluster (lasti)
                             int      * glyphIdx1,
                             int      * glyphIdx2
                           )
{
    // locate glyph-positions for the cluster
    if(itemRun->analysis.fRTL)
    {
        // RTL scripts: cluster values decrease in logical order
        *glyphIdx1 = clusterIdx1 < itemRun->len ? clusterList[clusterIdx1] + 1 : 0;
        *glyphIdx2 = clusterList[clusterIdx2];
    }
    else
    {
        // LTR scripts: cluster values increase in logical order
        *glyphIdx1 = clusterList[clusterIdx2];
        *glyphIdx2 = clusterIdx1 < itemRun->len ? clusterList[clusterIdx1] - 1 : itemRun->glyphCount - 1;
    }
}
Above is the GetGlyphClusterIndices function. Note the two distinct cases for LTR and RTL scripts: these are required because the cluster elements decrease when reading the cluster array for RTL scripts, but increase for LTR scripts.
Drawing the Foreground
The process for drawing the text is so similar to that of the background that I won't include much code this time. We'll jump straight in at the start of the DrawForegroundItemRun function:
// right-left runs can be drawn backwards for simplicity
if(itemRun->analysis.fRTL)
{
    oldMode = SetTextAlign(hdc, TA_RIGHT);
    xpos   += itemRun->width;
    runDir  = -1;
}
The first thing we do is set the text alignment to TA_RIGHT for any right-to-left string, and advance the x-coordinate to the end of the run. This allows us to draw the text in logical order (as we walk the logical cluster list). This is important because, apart from this one detail, it means we can maintain a single function for drawing both LTR and RTL text.
// loop over all the logical character-positions
for(lasti = 0, i = 0; i <= itemRun->len; i++)
{
    // find a change in attribute
    if(i == itemRun->len || attrList[i].fg != attrList[lasti].fg )
    {
        // scan forward to locate end of cluster (we must always
        // handle whole-clusters because the attr[] might fall in the middle)
        for( ; i < itemRun->len; i++)
            if(clusterList[i - 1] != clusterList[i])
                break;

        // locate glyph-positions for the cluster [lasti, i)
        GetGlyphClusterIndices(itemRun, clusterList, i, lasti, &glyphIdx1, &glyphIdx2);

        << display text >>
    }
}
The next difference between foreground and background rendering is how we identify cluster-boundaries. This time we look for changes in colour first of all. Once a new colour is found we scan forward to locate the end of the cluster. This means we can paint the whole cluster in one colour and not worry about interpolation.
<< display text >>
// measure the width (in pixels) of the run
for(runWidth = 0, g = glyphIdx1; g <= glyphIdx2; g++)
    runWidth += widthList[g];

// only need the text colour as we are drawing transparently
SetTextColor(hdc, forcesel ? uspData->selFG : attrList[lasti].fg);

//
// Finally output the run of glyphs
//
hr = ScriptTextOut(
        hdc,
        &uspFont->scriptCache,
        xpos,
        ypos,
        0,                          // no ETO_ options: background already painted
        NULL,                       // no clipping/opaque rectangle
        &itemRun->analysis,
        NULL,                       // pwcReserved
        0,                          // iReserved
        glyphList + glyphIdx1,      // glyphs for this range of clusters
        glyphIdx2 - glyphIdx1 + 1,  // glyph count
        widthList + glyphIdx1,      // advance widths from ScriptPlace
        NULL,                       // no justification
        offsetList + glyphIdx1      // glyph offsets from ScriptPlace
     );

// +ve/-ve depending on run direction
xpos += runWidth * runDir;
lasti = i;
Once the text colour has been set, ScriptTextOut is called with the range of glyphs that fall within the cluster. Once again, we only output text when there is a change in colour, so usually there is only one call to ScriptTextOut.
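The foreground rule (split on a colour change, but always extend to the end of the current cluster) can likewise be sketched in portable C. The split_foreground function and FgSegment type are hypothetical; the loop mirrors the DrawForegroundItemRun listing above, and each segment takes the colour of its first character, so any other colours inside a cluster are ignored.

```c
#include <assert.h>

typedef struct {
    int firstChar;       /* first WCHAR of the segment            */
    int endChar;         /* one past the last WCHAR               */
    unsigned colour;     /* foreground colour used for the segment */
} FgSegment;

/* Split a run into foreground-colour segments that never end mid-cluster.
   Returns the number of segments written to out. */
int split_foreground(const unsigned short *clusterList, const unsigned *fg,
                     int len, FgSegment *out)
{
    int n = 0, lasti = 0, i;

    if (len <= 0)
        return 0;

    for (i = 0; i <= len; i++) {
        /* find a change in foreground colour (or the end of the run) */
        if (i == len || fg[i] != fg[lasti]) {
            /* scan forward to the end of the current cluster */
            for (; i < len; i++)
                if (clusterList[i - 1] != clusterList[i])
                    break;

            out[n].firstChar = lasti;
            out[n].endChar   = i;
            out[n].colour    = fg[lasti];   /* first character's colour wins */
            n++;
            lasti = i;
        }
    }
    return n;
}
```

In the test data, characters 1 and 2 share a cluster, so character 2's colour (30) is overridden by character 1's colour (20).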
7. ScriptTextOut
For the sake of completeness, here's the prototype for ScriptTextOut:
HRESULT WINAPI ScriptTextOut(
    const HDC               hdc,
    SCRIPT_CACHE          * psc,
    int                     x,
    int                     y,
    UINT                    fuOptions,    // ExtTextOut options
    const RECT            * rect,
    const SCRIPT_ANALYSIS * analysis,
    const WCHAR           * pwcReserved,
    int                     iReserved,
    const WORD            * pwGlyphs,     // in - results of ScriptShape
    int                     cGlyphs,
    const int             * piAdvance,    // in - results of ScriptPlace
    const int             * piJustify,
    const GOFFSET         * pGoffset      // in - results of ScriptPlace
);
That's a pretty intimidating function by anyone's standards! The parameters of note are:
· fuOptions can be ETO_CLIPPED, ETO_OPAQUE, or zero. These are the standard ExtTextOut flags. Because we have drawn the background ourselves, there is no need to use them. Note that if ETO_OPAQUE is specified, whole clusters of glyphs must be passed to this function.
· pwGlyphs and cGlyphs identify the list of glyph values returned by ScriptShape.
· piAdvance and pGoffset point to the glyph-placement buffers returned by ScriptPlace.
· piJustify points to an optional array of justified advance values.
ScriptTextOut is basically a wrapper around ExtTextOut; however, you will notice that there is no WCHAR* parameter to this function. This is because ScriptTextOut calls ExtTextOut with the ETO_GLYPH_INDEX option and passes the buffer of glyphs we specified. ScriptTextOut may perform additional processing (such as glyph reordering) before calling into GDI, so don't be tempted to bypass ScriptTextOut by calling ExtTextOut directly.
Uniscribe Limitations
One of the drawbacks of Uniscribe is the very thing it does best: breaking a string into individually shapeable items. The problem is that some strings containing a lot of whitespace or punctuation result in a large number of item-runs. While this is not bad in itself, it does present a problem when it comes to rendering the line of text: the sheer number of calls to ScriptTextOut carries a performance penalty compared with calling ExtTextOut for the same line of text.
For complex scripts there is no alternative but to break up the string using ScriptItemize. However, it would be nice if, for non-complex scripts (such as English), we could somehow recombine the item-runs and reduce the number of calls to ScriptTextOut. I haven't ventured too far down this path yet, but it is certainly possible to identify whether an item-run is complex by inspecting the SCRIPT_ANALYSIS::eScript field.
struct SCRIPT_ANALYSIS
{
WORD eScript : 10;
WORD fRTL : 1;
...
};
Now, the eScript field is 'opaque', which means we shouldn't make any assumptions about its value. However, it can be used as an index into the "global script table", which contains information about the specific script-shaping engines installed on the system.
HRESULT WINAPI ScriptGetProperties(const SCRIPT_PROPERTIES ***ppSp, int *piNumScripts);
The ScriptGetProperties function returns a pointer to this global script table, and each entry in the table is a pointer to a SCRIPT_PROPERTIES structure:
struct SCRIPT_PROPERTIES
{
    DWORD langid   : 16;
    DWORD fNumeric : 1;
    DWORD fComplex : 1;
    ...
};
There are many information fields in this structure; the interesting one for us is the fComplex flag. Drawing all this together results in the following function, which returns a boolean indicating whether an item-run is complex:
BOOL IsRunComplex(ITEM_RUN *itemRun)
{
    const SCRIPT_PROPERTIES ** propList;
    int propCount;
    int scriptIndex;

    // get pointer to the global script table (owned by Uniscribe, do not free)
    if(FAILED(ScriptGetProperties(&propList, &propCount)))
        return TRUE;        // play safe: treat the run as complex

    // the SCRIPT_ANALYSIS::eScript is an index to the global script table
    scriptIndex = itemRun->analysis.eScript;

    if(scriptIndex >= propCount)
        return TRUE;

    // locate the script from the script-index
    return propList[scriptIndex]->fComplex;
}
Any non-complex item-runs could theoretically be identified and then merged into a single run, with the SCRIPT_ANALYSIS::eScript field set to SCRIPT_UNDEFINED. All this should happen before ScriptShape is called.
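The merging step is not implemented in UspLib yet, so the following is only a sketch of the idea under stated assumptions. SimpleRun is a hypothetical simplified run type: its isComplex flag would come from an IsRunComplex-style lookup and its rtl flag from SCRIPT_ANALYSIS::fRTL. Only adjacent, contiguous, left-to-right, non-complex runs are merged; a real implementation would probably also need to check that the runs share the same font and embedding level.

```c
#include <assert.h>

/* Hypothetical, simplified item-run (not UspLib's ITEM_RUN). */
typedef struct {
    int charPos, len;    /* character range covered by the run        */
    int isComplex;       /* e.g. from SCRIPT_PROPERTIES::fComplex      */
    int rtl;             /* e.g. from SCRIPT_ANALYSIS::fRTL            */
} SimpleRun;

/* Merge adjacent non-complex LTR runs in place; returns the new run count. */
int merge_simple_runs(SimpleRun *runs, int count)
{
    int out = 0, i;

    assert(count >= 0);
    for (i = 0; i < count; i++) {
        if (out > 0 &&
            !runs[out - 1].isComplex && !runs[i].isComplex &&
            !runs[out - 1].rtl && !runs[i].rtl &&
            runs[out - 1].charPos + runs[out - 1].len == runs[i].charPos) {
            runs[out - 1].len += runs[i].len;   /* extend the previous run */
        } else {
            runs[out++] = runs[i];              /* keep this run as-is     */
        }
    }
    return out;
}
```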
Coming up in Part 15
Every time I post a new tutorial I promise that there'll be another update to Neatpad, and of course it hasn't happened (again!). Uniscribe is just so damn complicated that it has taken me far more time to document than I first anticipated. For now you can download the UspLib demo at the top of this tutorial, and next time we really will be seeing a new-and-improved Neatpad.
Please send any comments or suggestions to: [email protected]
Last modified: 20 May 2006 12:44:09
2 Character to Glyph Mapping (‘cmap’)
The ‘cmap’ table is used to convert from an outside encoding (such as Unicode) to internal glyph ids. The rendering system uses the ‘cmap’ to convert the Unicode code points in a string to glyph ids and then renders the appropriate glyph shapes at the proper positions on the screen or printer. Glyph ids are used exclusively to reference glyphs in all other font tables.
The ‘cmap’ table consists of a set of mapping subtables for different technologies and architectures. This allows the same font to work on multiple operating systems. For example, most TrueType fonts have three subtables within the ‘cmap’, two for Apple and one for Microsoft.
Each mapping subtable has two numbers associated with it: the platform id and the encoding id. The platform id indicates what architecture the subtable is designed for. For example, a platform id of 0 indicates Apple Unicode, 1 indicates Apple Script Manager, 2 indicates ISO, and 3 indicates Microsoft. The meaning of the encoding id depends on the platform. For example, some older fonts have only two subtables: Platform 1, encoding 0 (Mac Roman 8-bit simple 256 glyph encoding) and Platform 3, encoding 1 (Windows Unicode).
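A renderer typically picks one subtable to use. The sketch below chooses among the (platform id, encoding id) pairs found in a 'cmap' table. The preference order (Windows Unicode 3/1, then any Apple Unicode 0/x, then Mac Roman 1/0) is an assumption made for illustration; it omits, for example, the 3/10 full-repertoire Unicode subtable and the 3/0 symbol encoding. CmapSubtableId and choose_cmap_subtable are hypothetical names.

```c
#include <assert.h>

typedef struct {
    unsigned short platformID;   /* 0 = Apple Unicode, 1 = Apple Script Manager, 3 = Microsoft */
    unsigned short encodingID;   /* meaning depends on the platform */
} CmapSubtableId;

/* Returns the index of the preferred subtable, or -1 if none is usable. */
int choose_cmap_subtable(const CmapSubtableId *ids, int count)
{
    /* preference order (an assumption); -1 means "any encoding" */
    static const struct { int platform, encoding; } prefs[] = {
        { 3,  1 },   /* Windows Unicode             */
        { 0, -1 },   /* Apple Unicode, any encoding */
        { 1,  0 }    /* Mac Roman 8-bit             */
    };
    int p, i;

    assert(count >= 0);
    for (p = 0; p < (int)(sizeof prefs / sizeof prefs[0]); p++)
        for (i = 0; i < count; i++)
            if (ids[i].platformID == prefs[p].platform &&
                (prefs[p].encoding < 0 || ids[i].encodingID == prefs[p].encoding))
                return i;
    return -1;
}
```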
The mapping subtables use various formats to reduce their size, but all formats map from a character code to a glyph id. Multiple character codes can map to the same glyph id. For example, space (U+0020) and no-break space (U+00A0) often map to the same glyph id.
There are some special glyph ids which are reserved by convention as described in the below table. It is advisable to follow these conventions when modifying fonts. Glyph 0 is normally an open rectangle and is used by rendering systems to substitute for characters that are not present in the font.
Glyph Id | Character
0 | unknown glyph
1 | null
2 | carriage return
3 | space
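To illustrate the lookup side, the sketch below implements the segment search used by the format 4 'cmap' subtable, restricted to segments whose idRangeOffset is 0 (in which case the glyph id is the character code plus idDelta, modulo 65536). Segments that use idRangeOffset are not handled. The segment values in the test are invented so that space (U+0020) and no-break space (U+00A0) both map to glyph 3, and an unmapped character falls back to glyph 0. DeltaSegment and lookup_glyph are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One segment of a format 4 'cmap' subtable whose idRangeOffset is 0. */
typedef struct {
    uint16_t startCode;
    uint16_t endCode;
    int16_t  idDelta;
} DeltaSegment;

/* segs must be sorted by endCode and end with a 0xFFFF segment, as the
   format requires. Returns glyph 0 (the "unknown glyph") for unmapped codes. */
uint16_t lookup_glyph(const DeltaSegment *segs, int segCount, uint16_t code)
{
    int s;

    assert(segCount > 0);
    for (s = 0; s < segCount; s++) {
        if (code <= segs[s].endCode) {          /* first segment ending at or after code */
            if (code >= segs[s].startCode)
                return (uint16_t)(code + segs[s].idDelta);   /* modulo 65536 */
            break;                               /* code lies in a gap between segments */
        }
    }
    return 0;
}
```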
ScriptPlace
The ScriptPlace function takes the output of a ScriptShape call and generates glyph advance width and two-dimensional offset information.
HRESULT WINAPI ScriptPlace(
HDC hdc,
SCRIPT_CACHE *psc,
const WORD *pwGlyphs,
int cGlyphs,
const SCRIPT_VISATTR *psva,
SCRIPT_ANALYSIS *psa,
int *piAdvance,
GOFFSET *pGoffset,
ABC *pABC
);
Parameters
hdc
[in] Handle to the device context.
psc
[in/out] Pointer to a SCRIPT_CACHE structure.
pwGlyphs
[in] Pointer to a glyph buffer obtained from an earlier call to the ScriptShape function.
cGlyphs
[in] Count of glyphs in the glyph buffer.
psva
[in] Pointer to an array of SCRIPT_VISATTR structures.
psa
[in/out] Pointer to the SCRIPT_ANALYSIS structure obtained from a previous call to ScriptItemize.
piAdvance
[out] Pointer to an array that receives the advance width information, one entry per glyph.
(Translator's note: presumably this is the width of each glyph, that is, abcA+abcB+abcC for each glyph.)
pGoffset
[out] Pointer to an array of GOFFSET structures that receives the x and y offsets of combining glyphs, one entry per glyph.
pABC
[out] Pointer to an ABC structure that receives the ABC widths for the entire run.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. If any other unrecoverable error is encountered, it is also returned as an HRESULT. For example, error returns from Win32 API functions are converted to HRESULT values using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
The composite ABC width for the whole item identifies how much the glyphs overhang to the left of the start position and to the right of the length implied by the sum of the advance widths. The total advance width of the line is exactly abcA+abcB+abcC. The abcA and abcC values are maintained as proportions of the cell height represented in 8 bits and are thus roughly +/-1%. The total width returned, which is the sum of the abcA+abcB+abcC values pointed to by piAdvance, is accurate to the resolution of the TrueType shaping engine.
All arrays are in visual order unless the fLogicalOrder member is set in the psa parameter.
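The relationship in these remarks can be checked with a little arithmetic. In this sketch AbcWidth is a local stand-in with the same fields as GDI's ABC structure, and total_advance and ink_extent are hypothetical helpers. The numbers in the test are invented so that the ABC total equals the sum of the per-glyph advances, as the remarks state; ink_extent then shows how a negative abcA moves the left edge of the ink to the left of the pen position.

```c
#include <assert.h>

/* Local stand-in with the same fields as GDI's ABC structure. */
typedef struct {
    int      abcA;   /* space before the ink (negative = overhang to the left)  */
    unsigned abcB;   /* width of the ink itself                                  */
    int      abcC;   /* space after the ink (negative = overhang to the right)  */
} AbcWidth;

/* Sum of the per-glyph advances, as written to piAdvance by ScriptPlace. */
int total_advance(const int *advances, int count)
{
    int g, sum = 0;

    assert(count >= 0);
    for (g = 0; g < count; g++)
        sum += advances[g];
    return sum;
}

/* Horizontal extent of the run's ink when its pen position starts at x. */
void ink_extent(const AbcWidth *abc, int x, int *inkLeft, int *inkRight)
{
    *inkLeft  = x + abc->abcA;
    *inkRight = *inkLeft + (int)abc->abcB;
}
```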
ScriptLayout
The ScriptLayout function converts an array of run-embedding levels to a map of visual-to-logical position and/or logical-to-visual position.
HRESULT WINAPI ScriptLayout(
int cRuns,
const BYTE *pbLevel,
int *piVisualToLogical,
int *piLogicalToVisual
);
Parameters
cRuns
[in] Number of runs to process.
pbLevel
[in] Pointer to an array of run-embedding levels.
piVisualToLogical
[out] Pointer to an array that receives the run levels reordered to visual order.
piLogicalToVisual
[out] Pointer to an array that receives the visual run positions.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. If any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
The run-embedding levels, defined in the Unicode bidirectional algorithm, describe the direction of a run, the direction of any runs it is embedded in, and the direction of the paragraph.
Level | Meaning
0 | A left-to-right run in a left-to-right paragraph.
1 | A right-to-left run embedded in a left-to-right run in a left-to-right paragraph, or a right-to-left run (not embedded in another run) in a right-to-left paragraph.
2 | A left-to-right run embedded in a right-to-left run of type 1.
3 | A right-to-left run embedded in a left-to-right run of type 2.
etc. | The embedding levels can continue as far as necessary.
The logical position refers to the placement of a run relative to other runs. It is the position in backing store, and corresponds to the order in which you would read the text aloud. The visual position of a run is the way the run is visually displayed on the line, taking into account the possible directions of each run.
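The pbLevel parameter must contain the embedding levels for all runs on the line, ordered logically.
(Translator's note: we divide the text into runs using the SCRIPT_ITEM array that ScriptItemize fills in, and keep the runs in a linked list. If the runs are created in the order of the elements of that array (from the smallest index to the largest), then walking the list from its head to its tail gives logical order. An array built from the uBidiLevel value in each run's SCRIPT_ANALYSIS.s (SCRIPT_STATE) member is then the array of embedding levels in logical order, whose values indicate whether each run is left-to-right or right-to-left; pbLevel is a pointer to such an array.)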
On output, piVisualToLogical[0] is the logical index of the run to display at the far left. Subsequent entries should be displayed progressing from left to right.
(Translator's note: in other words, this array is indexed by display position, and each element is the logical index of the run to show at that position.)
The piLogicalToVisual[0] parameter is the relative visual position where the first logical run should be displayed, the leftmost display position being zero.
(Translator's note: positions count from left to right, so the leftmost position is 0, the one to its right is 1, and so on. This array is indexed by logical run index, and each element is the position at which that run should be displayed.)
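The two output arrays therefore hold inverse mappings:
piVisualToLogical is indexed by display position; each element is the index of the run to display at that position.
piLogicalToVisual is indexed by run index; each element is the display position of that run.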
The caller may request either piLogicalToVisual or piVisualToLogical, or both.
Note that no other input is required because the embedding levels give all necessary information for layout.
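Because the levels alone are enough, the reordering is easy to sketch. The following portable C function applies rule L2 of the Unicode Bidirectional Algorithm (UAX #9): from the highest level down to the lowest odd level, reverse every maximal sequence of runs at that level or higher. ScriptLayout's output should correspond to this reordering, but layout_runs is an independent illustration, not Uniscribe's implementation, and unlike ScriptLayout it requires both output arrays. Even levels are left-to-right and odd levels right-to-left; in practice the levels would come from each run's SCRIPT_ANALYSIS.s.uBidiLevel.

```c
#include <assert.h>

/* Reorder runs by embedding level (UAX #9 rule L2).
   pbLevel:         embedding level of each run, in logical order
   visualToLogical: receives, for each display position (left to right),
                    the logical index of the run shown there
   logicalToVisual: receives, for each logical run, its display position */
void layout_runs(int cRuns, const unsigned char *pbLevel,
                 int *visualToLogical, int *logicalToVisual)
{
    int i, level, maxLevel = 0, minLevel = 255, lowestOdd;

    assert(cRuns >= 0);
    for (i = 0; i < cRuns; i++) {
        visualToLogical[i] = i;                    /* start in logical order */
        if (pbLevel[i] > maxLevel) maxLevel = pbLevel[i];
        if (pbLevel[i] < minLevel) minLevel = pbLevel[i];
    }
    lowestOdd = (minLevel % 2) ? minLevel : minLevel + 1;

    for (level = maxLevel; level >= lowestOdd; level--) {
        i = 0;
        while (i < cRuns) {
            int start, end;

            /* skip runs below the current level */
            while (i < cRuns && pbLevel[visualToLogical[i]] < level)
                i++;
            start = i;
            /* collect the maximal sequence at this level or higher */
            while (i < cRuns && pbLevel[visualToLogical[i]] >= level)
                i++;
            end = i - 1;

            /* reverse visualToLogical[start..end] in place */
            while (start < end) {
                int t = visualToLogical[start];
                visualToLogical[start++] = visualToLogical[end];
                visualToLogical[end--] = t;
            }
        }
    }

    for (i = 0; i < cRuns; i++)
        logicalToVisual[visualToLogical[i]] = i;   /* invert the permutation */
}
```

For levels {0, 1, 1, 2, 2, 1}, the level-2 pair is reversed first, then the whole level-1 sequence, giving the visual order {0, 5, 3, 4, 2, 1}.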
ScriptGetLogicalWidths
The ScriptGetLogicalWidths function converts the glyph advance widths for a specific font into logical widths.
HRESULT WINAPI ScriptGetLogicalWidths(
const SCRIPT_ANALYSIS *psa,
int cChars,
int cGlyphs,
const int *piGlyphWidth,
const WORD *pwLogClust,
const SCRIPT_VISATTR *psva,
int *piDx
);
Parameters
psa
[in] Pointer to a SCRIPT_ANALYSIS structure.
cChars
[in] Number of logical code points in the run.
cGlyphs
[in] Number of glyphs in the run.
piGlyphWidth
[in] Pointer to an array of glyph advance widths.
pwLogClust
[in] Pointer to an array of logical clusters.
psva
[in] Pointer to a SCRIPT_VISATTR structure.
piDx
[out] Pointer to an array that receives the logical widths.
Return Values
Currently, ScriptGetLogicalWidths always returns S_OK.
Remarks
ScriptGetLogicalWidths is useful for recording widths in a font-independent manner.
ScriptGetLogicalWidths converts the glyph advance widths calculated for a specific font into logical widths, one per code point, in the same order as the code points. If the same string is then displayed on a different device using a different font, the logical widths may be applied, by using ScriptApplyLogicalWidth, to approximate the original placement. This is useful when implementing print preview: on the preview screen it is important to match the layout and placement of the final printed result.
Ligature glyph widths are divided evenly among the characters they represent.
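The even division of a ligature's width among its code points can be sketched for a left-to-right run. logical_widths_ltr is a hypothetical name and is not ScriptGetLogicalWidths: it ignores right-to-left runs and the psva input, and handing leftover pixels to the first code points of a cluster is an assumption, since the exact rounding Uniscribe uses is not documented here. The test reuses the left-to-right cluster list from the earlier example with invented glyph widths.

```c
#include <assert.h>

/* Share each cluster's total glyph width evenly among its code points
   (left-to-right runs only). piDx receives one width per code point. */
void logical_widths_ltr(const unsigned short *clusterList, int cChars,
                        const int *glyphWidths, int cGlyphs, int *piDx)
{
    int lasti = 0, i;

    assert(cChars >= 0 && cGlyphs >= 0);
    for (i = 1; i <= cChars; i++) {
        if (i == cChars || clusterList[i] != clusterList[lasti]) {
            int g, k, width = 0, n = i - lasti;
            int gEnd = (i < cChars) ? clusterList[i] : cGlyphs;  /* one past last glyph */

            for (g = clusterList[lasti]; g < gEnd; g++)
                width += glyphWidths[g];

            /* even share, with any remainder going to the first code points */
            for (k = 0; k < n; k++)
                piDx[lasti + k] = width / n + (k < width % n ? 1 : 0);

            lasti = i;
        }
    }
}
```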
ScriptBreak
The ScriptBreak function returns information for determining line breaks.
HRESULT WINAPI ScriptBreak(
const WCHAR *pwcChars,
int cChars,
const SCRIPT_ANALYSIS *psa,
SCRIPT_LOGATTR *psla
);
Parameters
pwcChars
[in] Pointer to the Unicode characters to process.
cChars
[in] Number of Unicode characters to process.
psa
[in] Pointer to the SCRIPT_ANALYSIS structure obtained from an earlier call to the ScriptItemize function.
psla
[out] Pointer to a buffer that receives the character attributes as SCRIPT_LOGATTR structures.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero value. And if any other unrecoverable error is encountered, it is also returned as HRESULT. For example, error returns from Win32 API functions are converted to HRESULT using the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
The ScriptBreak function returns cursor movement and formatting break positions for an item in an array of SCRIPT_LOGATTR structures. To support mixed formatting within a single word correctly, ScriptBreak should be passed whole items as returned by ScriptItemize, not the finer formatting runs.
ScriptBreak does not require an hdc and does not perform shaping.
The SCRIPT_LOGATTR structure, pointed to by psla, identifies valid caret positions and line breaks. The SCRIPT_LOGATTR.fCharStop flag marks cluster boundaries for those scripts where it is conventional to restrict the caret from moving inside clusters. The same boundaries could also be inferred by inspecting the pwLogClust array returned by ScriptShape; however, ScriptBreak is considerably faster and does not require an hdc to be prepared. The fWordStop, fSoftBreak, and fWhiteSpace flags in SCRIPT_LOGATTR are only available through ScriptBreak.
Most shaping engines that identify invalid sequences do so by setting the fInvalid flag in ScriptBreak. The fInvalidLogAttr flag in SCRIPT_PROPERTIES identifies which scripts do this.
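To show how fSoftBreak can drive word-wrapping, here is a sketch that fits characters onto a line of limited width. LogAttr is a local stand-in with the same bit-fields as SCRIPT_LOGATTR (declared with unsigned int so it compiles portably), and fit_line is a hypothetical helper: it returns how many characters fit, ending the line before the last soft-break position that still fits, and falls back to breaking mid-word when no soft break is available. Trailing whitespace (fWhiteSpace) is deliberately not treated specially here.

```c
#include <assert.h>

/* Local stand-in with the same flags as SCRIPT_LOGATTR. */
typedef struct {
    unsigned int fSoftBreak  : 1;
    unsigned int fWhiteSpace : 1;
    unsigned int fCharStop   : 1;
    unsigned int fWordStop   : 1;
    unsigned int fInvalid    : 1;
    unsigned int fReserved   : 3;
} LogAttr;

/* Number of characters that fit in maxWidth, breaking at soft breaks. */
int fit_line(const LogAttr *attrs, const int *widths, int count, int maxWidth)
{
    int i, x = 0, lastBreak = 0;

    assert(count >= 0);
    for (i = 0; i < count; i++) {
        if (i > 0 && attrs[i].fSoftBreak)
            lastBreak = i;          /* the line may end just before character i */
        x += widths[i];
        if (x > maxWidth)           /* character i does not fit */
            return lastBreak > 0 ? lastBreak : (i > 0 ? i : 1);
    }
    return count;                   /* the whole run fits */
}
```

For "ab cd" with one-pixel characters and a soft break before 'c', a 4-pixel line holds "ab " (3 characters).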
SCRIPT_LOGATTR
The SCRIPT_LOGATTR structure describes attributes of logical characters that are useful when editing and formatting text.
typedef struct tag_SCRIPT_LOGATTR
{
BYTE fSoftBreak :1;
BYTE fWhiteSpace :1;
BYTE fCharStop :1;
BYTE fWordStop :1;
BYTE fInvalid :1;
BYTE fReserved :3;
} SCRIPT_LOGATTR;
Members
fSoftBreak
It is valid to break the line in front of this character. This member is set on the first character of Southeast Asian words.
fWhiteSpace
This character is one of the many Unicode characters that are classified as breakable white space. Breakable white space can break words—that is, it is all white space except NBSP (nonbreaking space) and ZWNBSP (zero-width nonbreaking space).
fCharStop
Valid caret position. Set on most characters, but not on code points inside Indian and Southeast Asian character clusters. May be used to implement LEFT ARROW and RIGHT ARROW operations in editors.
fWordStop
Valid caret position. It is the correct place to show the caret when you use a word movement keyboard action such as CTRL+LEFT ARROW and CTRL+RIGHT ARROW. May be used to implement the CTRL+LEFT ARROW and CTRL+RIGHT ARROW operations in editors.
fInvalid
Marks characters that form an invalid or undisplayable combination. Scripts that can set this flag have the fInvalidLogAttr flag set in their SCRIPT_PROPERTIES structure.
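The fCharStop, fWordStop, and fInvalid descriptions above translate directly into editor code. The sketch below rests on two stated assumptions:

- The caret moves in logical order. Mapping LEFT ARROW and RIGHT ARROW to a logical direction inside right-to-left runs is left to the caller.
- Positions 0 and cChars always count as stops.

MoveCaret and FirstInvalidChar are hypothetical helpers, and the non-Windows branch is a stand-in for the declaration in usp10.h.

```c
#ifdef _WIN32
#include <windows.h>
#include <usp10.h>
#else /* stand-in with the same member names and bit widths */
typedef struct { unsigned char fSoftBreak:1, fWhiteSpace:1, fCharStop:1,
                 fWordStop:1, fInvalid:1, fReserved:3; } SCRIPT_LOGATTR;
#endif

/* Moves the caret one stop forward (forward != 0) or backward in logical
 * order. byWord selects fWordStop (CTRL+arrow) instead of fCharStop (arrow).
 * Positions 0 and cChars are treated as valid stops. */
int MoveCaret(const SCRIPT_LOGATTR *sla, int cChars, int pos,
              int forward, int byWord)
{
    int step = forward ? 1 : -1;
    for (int i = pos + step; i > 0 && i < cChars; i += step) {
        int isStop = byWord ? sla[i].fWordStop : sla[i].fCharStop;
        if (isStop)
            return i;
    }
    return forward ? cChars : 0;
}

/* Index of the first character flagged fInvalid, or -1. scriptSetsInvalid
 * is the run's fInvalidLogAttr value; scripts without it never set
 * fInvalid, so the scan is skipped for them. */
int FirstInvalidChar(const SCRIPT_LOGATTR *sla, int cChars,
                     int scriptSetsInvalid)
{
    if (!scriptSetsInvalid)
        return -1;
    for (int i = 0; i < cChars; i++)
        if (sla[i].fInvalid)
            return i;
    return -1;
}
```

On Windows, the scriptSetsInvalid argument could be read as g_ppScriptProperties[psa->eScript]->fInvalidLogAttr after calling ScriptGetProperties.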
ScriptBreak
ScriptBreak works alongside ScriptItemize to identify the logical attributes of each character in a string. ScriptBreak must be called once for each individual item-run in the string (as identified by ScriptItemize), and it returns an array of SCRIPT_LOGATTR structures. Each entry in the array represents a single WCHAR in the Unicode string. The caller must allocate the array with the same number of elements as there are WCHARs in the run.
HRESULT WINAPI ScriptBreak(
const WCHAR *pwcChars,
int cChars,
const SCRIPT_ANALYSIS *psa,
SCRIPT_LOGATTR *psla
);
The individual attributes for each character are held within the SCRIPT_LOGATTR structure, shown below:
struct SCRIPT_LOGATTR
{
BYTE fSoftBreak : 1;
BYTE fWhiteSpace : 1;
BYTE fCharStop : 1;
BYTE fWordStop : 1;
BYTE fInvalid : 1;
BYTE fReserved : 3;
};
Although each field of the SCRIPT_LOGATTR structure has a specific purpose, the information returned by ScriptBreak is generally useful for two things: word-wrapping and keyboard navigation.
· fSoftBreak indicates the positions within a string where word-wrapping can take place. In other words, these are the positions where the string can be broken into smaller units suitable for display over multiple lines.
· fWhiteSpace indicates that the corresponding character should be treated as white space. It can be set for many more characters than just tabs and spaces.
· fCharStop and fWordStop identify valid caret positions within the string. These positions can be used to support single-character navigation and word navigation.
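The fSoftBreak and fWhiteSpace bullets are enough for a basic greedy word-wrap. This minimal sketch makes two assumptions:

- Per-character advance widths in logical order are already available, for example from ScriptGetLogicalWidths.
- Trailing white space may hang past the margin.

The function ends the line at the latest soft break that fits. When no soft break fits, it falls back to a cluster boundary (fCharStop), and it never splits a cluster. FindLineEnd is a hypothetical name, and the non-Windows branch is a stand-in for usp10.h.

```c
#ifdef _WIN32
#include <windows.h>
#include <usp10.h>
#else /* stand-in with the same member names and bit widths */
typedef struct { unsigned char fSoftBreak:1, fWhiteSpace:1, fCharStop:1,
                 fWordStop:1, fInvalid:1, fReserved:3; } SCRIPT_LOGATTR;
#endif

/* Returns where the line that starts at `start` should end (exclusive).
 * widths[i] is the advance width of character i in logical order.
 * Trailing white space is allowed to overhang maxWidth. */
int FindLineEnd(const SCRIPT_LOGATTR *sla, const int *widths, int cChars,
                int start, int maxWidth)
{
    int run = 0;        /* width of [start, i], white space included */
    int visible = 0;    /* same, ignoring trailing white space */
    int lastBreak = -1; /* latest soft break after start that still fits */

    for (int i = start; i < cChars; i++) {
        if (i > start && sla[i].fSoftBreak)
            lastBreak = i;             /* [start, i) fits: may end here */
        run += widths[i];
        if (!sla[i].fWhiteSpace)
            visible = run;
        if (visible > maxWidth) {
            if (lastBreak > start)
                return lastBreak;
            /* No soft break fits: cut at the last cluster boundary. */
            for (int j = i; j > start; j--)
                if (sla[j].fCharStop)
                    return j;
            /* The first cluster alone is too wide: keep it whole. */
            int end = i + 1;
            while (end < cChars && !sla[end].fCharStop)
                end++;
            return end;
        }
    }
    return cChars;
}
```

To lay out the following line, call FindLineEnd again with start set to the previous return value.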
Don't underestimate how much work ScriptBreak does on our behalf. Identifying character and word positions alone saves a tremendous amount of effort. On top of that, ScriptBreak supports all of the various Unicode scripts. For languages such as Thai, which require dictionary support to identify soft breaks, all of the hard work is already done.
The task of calling ScriptBreak for each item-run is handled by the UspAnalyze function, which we looked at in previous tutorials. The SCRIPT_LOGATTR buffer is allocated and stored in the USPDATA object's breakList member. A simple loop then iterates over each item-run and stores the results of ScriptBreak in the USPDATA::breakList array. The array holds the results for all item-runs, concatenated together:
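<< UspLib.c - UspAnalyze(...) >>
// allocate one SCRIPT_LOGATTR per WCHAR in the whole string (wlen WCHARs)
// (a NULL check on the result is not shown in this excerpt)
uspData->breakList = malloc(wlen * sizeof(SCRIPT_LOGATTR));

// Generate the word-break information for each item-run
for(i = 0; i < uspData->itemRunCount; i++)
{
    ITEM_RUN *itemRun = &uspData->itemRunList[i];

    // each run writes its results at its own offset in breakList,
    // so the per-run arrays end up concatenated in string order
    ScriptBreak(
        wstr + itemRun->charPos,                // first WCHAR of this run
        itemRun->len,                           // WCHAR count of this run
        &itemRun->analysis,                     // SCRIPT_ANALYSIS from ScriptItemize
        uspData->breakList + itemRun->charPos   // output slot for this run
    );
}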
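Each ScriptBreak call writes at its run's charPos offset. The single malloc(wlen * sizeof(SCRIPT_LOGATTR)) buffer is therefore filled completely only if the item-runs cover the string exactly once, in order. Below is a sketch of a debug check for that condition. RunSpan is a hypothetical struct holding just the charPos and len fields that ITEM_RUN exposes in the excerpt above.

```c
/* Hypothetical stand-in for the charPos/len fields of ITEM_RUN. */
typedef struct { int charPos; int len; } RunSpan;

/* Returns 1 when the runs cover [0, cTotal) exactly once, in order, with no
 * gaps, overlaps, or empty runs; 0 otherwise. */
int RunsTileString(const RunSpan *runs, int cRuns, int cTotal)
{
    int expected = 0;   /* where the next run must start */
    for (int i = 0; i < cRuns; i++) {
        if (runs[i].charPos != expected || runs[i].len <= 0)
            return 0;
        expected += runs[i].len;
    }
    return expected == cTotal;
}
```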
Any string (or paragraph) of text analyzed with UspAnalyze will therefore automatically have its SCRIPT_LOGATTR information stored inside the USPDATA object. Because the information for each run is concatenated into the same buffer, individual item-runs do not need to be taken into account when inspecting the logical attributes of each character in the string.
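The concatenated buffer removes the need to consider runs for navigation. Some per-character decisions still depend on the run, however, such as whether its script sets fInvalid at all (the fInvalidLogAttr check described earlier).

The sketch below maps a global character index back to its run with a binary search. It again uses a hypothetical RunSpan struct with the charPos and len fields seen on ITEM_RUN, and it assumes the runs are sorted and non-overlapping.

```c
/* Hypothetical stand-in for the charPos/len fields of ITEM_RUN. */
typedef struct { int charPos; int len; } RunSpan;

/* Returns the index of the run containing global character index ch, or -1.
 * Runs must be sorted by charPos and must not overlap. */
int RunForChar(const RunSpan *runs, int cRuns, int ch)
{
    int lo = 0, hi = cRuns - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (ch < runs[mid].charPos)
            hi = mid - 1;
        else if (ch >= runs[mid].charPos + runs[mid].len)
            lo = mid + 1;
        else
            return mid;
    }
    return -1;
}
```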
Let's look at a quick example and see how ScriptBreak would treat the string "Hello يُساوِي World". Note that there are two spaces in the string, one on either side of the Arabic phrase:
SCRIPT_LOGATTR (one row per character, in logical order)
Index | Character | SoftBreak | WhiteSpace | CharStop | WordStop | Invalid
0 | H | 0 | 0 | 1 | 0 | 0
1 | e | 0 | 0 | 1 | 0 | 0
2 | l | 0 | 0 | 1 | 0 | 0
3 | l | 0 | 0 | 1 | 0 | 0
4 | o | 0 | 0 | 1 | 0 | 0
5 | (space) | 0 | 1 | 1 | 0 | 0
6 | ي (U+064A) | 1 | 0 | 1 | 1 | 0
7 | ُ (U+064F) | 0 | 0 | 0 | 0 | 0
8 | س (U+0633) | 0 | 0 | 1 | 0 | 0
9 | ا (U+0627) | 0 | 0 | 1 | 0 | 0
10 | و (U+0648) | 0 | 0 | 1 | 0 | 0
11 | ِ (U+0650) | 0 | 0 | 0 | 0 | 0
12 | ي (U+064A) | 0 | 0 | 1 | 0 | 0
13 | (space) | 0 | 1 | 1 | 0 | 0
14 | W | 1 | 0 | 1 | 1 | 0
15 | o | 0 | 0 | 1 | 0 | 0
16 | r | 0 | 0 | 1 | 0 | 0
17 | l | 0 | 0 | 1 | 0 | 0
18 | d | 0 | 0 | 1 | 0 | 0
ScriptXtoCP
The ScriptXtoCP function converts an x offset from the left end (!fLogical) or leading edge (fLogical) of a run to a logical character position and a flag that indicates whether the x position fell in the leading or the trailing half of the character.
HRESULT WINAPI ScriptXtoCP(
int iX,
int cChars,
int cGlyphs,
const WORD *pwLogClust,
const SCRIPT_VISATTR *psva,
const int *piAdvance,
const SCRIPT_ANALYSIS *psa,
int *piCP,
int *piTrailing
);
Parameters
iX
[in] Offset, in logical units, from the left end of the run.
cChars
[in] Number of logical code points in the run.
cGlyphs
[in] Number of glyphs in the run.
pwLogClust
[in] Pointer to an array containing the logical clusters.
psva
[in] Pointer to an array of SCRIPT_VISATTR structures containing the visual attributes of the glyphs.
piAdvance
[in] Pointer to an array of advance widths.
psa
[in] Pointer to a SCRIPT_ANALYSIS structure.
piCP
[out] Pointer to an integer to receive the character position.
piTrailing
[out] Pointer to a flag to receive information about whether the position is the leading or trailing edge of the character.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero HRESULT value. Any other unrecoverable error is also returned as an HRESULT. For example, errors from Win32 API functions are converted with the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
For scripts where the caret may conventionally be placed into the middle of clusters (for example, Arabic and Hebrew), the returned character position may be for any code point in the line, and
piTrailing will be either zero or one.
For scripts in which the caret is conventionally snapped to the boundaries of a cluster, the returned character position is always the position of the first code point in a cluster (considered logically), and
piTrailing is either zero or the number of code points in the cluster.
Thus the appropriate caret position for a mouse hit is always the returned character position plus the value returned in
piTrailing.
If the x position passed is not in the item at all, the resulting position will be the trailing edge of character –1 (for x positions before the item), or the leading edge of character "cChars" (for x positions following the item).
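ScriptXtoCP works on one run at a time, so mouse hit-testing first has to find the run under the pointer. This sketch makes two assumptions:

- Runs are laid out left to right in visual order, with known widths.
- ScriptXtoCP is called without fLogicalOrder, so iX is measured from the run's left end.

RunAtX picks the run and the run-relative x. CaretFromHit applies the rule from the remarks: the caret is the character position plus the trailing value. Both names are hypothetical.

```c
/* Returns the index of the run under x, with runs laid out left to right in
 * visual order using runWidths, and stores x relative to that run's left end
 * in *xInRun. x past the last run maps to the last run (ScriptXtoCP then
 * reports the leading edge of character cChars); x before the first run maps
 * to run 0 with a negative offset. Returns -1 when there are no runs. */
int RunAtX(const int *runWidths, int cRuns, int x, int *xInRun)
{
    int left = 0;
    for (int r = 0; r < cRuns; r++) {
        if (x < left + runWidths[r] || r == cRuns - 1) {
            *xInRun = x - left;
            return r;
        }
        left += runWidths[r];
    }
    *xInRun = x;
    return -1;
}

/* Caret position from ScriptXtoCP's outputs: character position plus the
 * trailing value. The clamp to [0, cChars] is a defensive extra. */
int CaretFromHit(int cp, int trailing, int cChars)
{
    int caret = cp + trailing;
    if (caret < 0)
        caret = 0;
    if (caret > cChars)
        caret = cChars;
    return caret;
}
```

On Windows, cp and trailing would come from calling ScriptXtoCP(xInRun, cChars, cGlyphs, pwLogClust, psva, piAdvance, psa, &cp, &trailing) for the chosen run.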
ScriptTextOut
The
ScriptTextOut function takes the output of both
ScriptShape and
ScriptPlace calls and calls the operating system
ExtTextOut function appropriately.
All arrays are in display order unless the
fLogicalOrder member is set in the
SCRIPT_ANALYSIS structure pointed to by
psa.
HRESULT WINAPI ScriptTextOut(
const HDC hdc,
SCRIPT_CACHE *psc,
int x,
int y,
UINT fuOptions,
const RECT *lprc,
const SCRIPT_ANALYSIS *psa,
const WCHAR *pwcReserved,
int iReserved,
const WORD *pwGlyphs,
int cGlyphs,
const int *piAdvance,
const int *piJustify,
const GOFFSET *pGoffset
);
Parameters
hdc
[in] Handle to a device context.
psc
[in/out] Pointer to a
SCRIPT_CACHE structure.
x
[in] Value of the x-coordinate of the first glyph.
y
[in] Value of the y-coordinate of the first glyph.
fuOptions
[in] Options equivalent to the
fuOptions parameter of
ExtTextOut. It may contain ETO_CLIPPED or ETO_OPAQUE (or neither of them, or both of them).
lprc
[in] Pointer to a rectangle used to clip the display. This parameter is optional and can be NULL.
psa
[in] Pointer to a
SCRIPT_ANALYSIS structure obtained from a previous call to
ScriptItemize.
pwcReserved
Reserved, must be NULL.
iReserved
Reserved, must be zero.
pwGlyphs
[in] Pointer to an array of glyphs obtained from a previous call to
ScriptShape.
cGlyphs
[in] Count of the glyphs in the
pwGlyphs array.
piAdvance
[in] Pointer to an array of advance widths obtained from a previous call to
ScriptPlace.
piJustify
[in] Pointer to an array of justified advance widths. This parameter is optional and can be NULL.
pGoffset
[in] Pointer to an array of
GOFFSET structures (one per glyph) containing the x and y offsets for combining glyphs.
Return Values
If the function succeeds, the return value is zero.
If the function fails, it returns a nonzero HRESULT value. Any other unrecoverable error is also returned as an HRESULT. For example, errors from Win32 API functions are converted with the HRESULT_FROM_WIN32 macro and returned to the client in the HRESULT.
Remarks
For any run that is rendered right-to-left (fRTL) and was generated in logical order by forcing the fLogicalOrder flag, call
SetTextAlign(hdc, TA_RIGHT) and give the right-side coordinate before calling
ScriptTextOut.
The
piJustify array provides requested cell widths for each glyph. When the
piJustify width of a glyph differs from the unjustified width (in
piAdvance), space is added to or removed from the glyph cell at its trailing edge. The glyph is always aligned with the leading edge of its cell. (This rule applies even in visual order.)
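The cell rule can be made concrete for a left-to-right run whose glyphs are in display order. The sketch assumes x is the left edge of the first cell. Each glyph's origin is the sum of the preceding cell widths, taken from piJustify when it is supplied and from piAdvance otherwise, so any extra width lands after the glyph.

GlyphCellOrigins is a hypothetical name. For right-to-left glyphs the leading edge is the right side of the cell, which this sketch does not handle.

```c
/* Computes the left edge of each glyph cell for a left-to-right run drawn
 * starting at x. Cell widths come from piJustify when it is non-NULL,
 * otherwise from piAdvance. Each glyph sits at the leading (left) edge of
 * its cell, so extra justification space falls at the trailing edge.
 * Returns the x just past the last cell. */
int GlyphCellOrigins(int x, const int *piAdvance, const int *piJustify,
                     int cGlyphs, int *origins)
{
    for (int i = 0; i < cGlyphs; i++) {
        origins[i] = x;
        x += piJustify ? piJustify[i] : piAdvance[i];
    }
    return x;
}
```

The return value (left edge plus total width) could also serve as the right-side coordinate that the TA_RIGHT remark above asks for.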
When a glyph cell is extended, the extra space is usually filled with white space. For Arabic scripts, however, it is filled with one or more kashida glyphs, unless the extra space is too narrow for the shortest kashida glyph in the font. (The width of the shortest kashida is available by calling
ScriptGetFontProperties.)
The
piJustify parameter should be passed only if the string must be justified again. Normally, pass NULL to this parameter.
The
pwcinChars and
cChars parameters are required only if output is to a metafile DC. If
hdc is not a metafile, these parameters may be passed as NULL and zero.
Do not use
ScriptTextOut to write to a metafile unless you are sure that the metafile will be played back without any font substitution.
ScriptTextOut records glyph numbers in the metafile, and since glyph numbers vary considerably from one font to another, such a metafile is unlikely to play back correctly when different fonts are substituted. For example, when a metafile is played back at a different scale, a
CreateFont request that is recorded in the metafile may resolve to a bitmap instead of a TrueType font. Likewise, if the metafile is played back on a different machine, the requested fonts may not be installed. To write complex scripts in a metafile in a font-independent manner, use
ExtTextOut to write the logical characters directly, so that glyph generation and placement do not occur until the text is played back.