08. Words, Paragraphs, and Line Breaks

Related link:
https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Strings/Articles/stringsParagraphBreaks.html#//apple_ref/doc/uid/TP40005016-SW1

This article describes how word and paragraph boundaries are defined, how line breaks are represented, and how you can separate a string by paragraph.

Word Boundaries

The text system determines word boundaries in a language-specific manner according to Unicode Standard Annex #29 with additional customization for locale as described in that document. On OS X, Cocoa presents APIs related to word boundaries, such as the NSAttributedString methods doubleClickAtIndex: and nextWordFromIndex:forward:, but you cannot modify the way the word-boundary algorithms themselves work.
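
As a minimal sketch of how the two methods named above might be called, the helpers below wrap them for a plain string. WordRangeAtIndex and NextWordStart are hypothetical names introduced only for illustration, and the sketch assumes ARC and a macOS target that links AppKit.

```objc
#import <AppKit/AppKit.h>

// Hypothetical helper (not from the original article): returns the range of
// the word that a double-click at `index` would select.
// `index` must lie within the bounds of `text`.
static NSRange WordRangeAtIndex(NSString *text, NSUInteger index) {
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithString:text];
    // The system's word-boundary rules are applied here; they cannot be customized.
    return [attributed doubleClickAtIndex:index];
}

// Hypothetical helper: returns the index at which the next word begins
// when moving forward from `index`.
static NSUInteger NextWordStart(NSString *text, NSUInteger index) {
    NSAttributedString *attributed =
        [[NSAttributedString alloc] initWithString:text];
    return [attributed nextWordFromIndex:index forward:YES];
}
```

For example, WordRangeAtIndex(@"Hello world", 1) should cover the word "Hello"; results for other scripts depend on the system's language-specific rules.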

Line and Paragraph Separator Characters

There are a number of ways in which a line or paragraph break can be represented. Historically, \n, \r, and \r\n have been used. Unicode defines an unambiguous paragraph separator, U+2029 (for which Cocoa provides the constant NSParagraphSeparatorCharacter), and an unambiguous line separator, U+2028 (for which Cocoa provides the constant NSLineSeparatorCharacter).
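
The two constants are single UTF-16 code units, so they can be turned into one-character strings and used as joiners. The following is only a sketch: StringForSeparator, JoinLinesWithinParagraph, and JoinParagraphs are hypothetical helper names, and the code assumes ARC and that AppKit (which declares the constants) is available.

```objc
#import <AppKit/AppKit.h>  // declares NSLineSeparatorCharacter and NSParagraphSeparatorCharacter

// Hypothetical helper: wraps a single separator character in an NSString.
static NSString *StringForSeparator(unichar separator) {
    return [NSString stringWithCharacters:&separator length:1];
}

// Hypothetical helper: joins lines that belong to one paragraph with U+2028.
static NSString *JoinLinesWithinParagraph(NSArray<NSString *> *lines) {
    return [lines componentsJoinedByString:
                StringForSeparator(NSLineSeparatorCharacter)];
}

// Hypothetical helper: joins paragraphs with the unambiguous U+2029 separator.
static NSString *JoinParagraphs(NSArray<NSString *> *paragraphs) {
    return [paragraphs componentsJoinedByString:
                StringForSeparator(NSParagraphSeparatorCharacter)];
}
```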

In the Cocoa text system, the NSParagraphSeparatorCharacter is treated consistently as a paragraph break, and NSLineSeparatorCharacter is treated consistently as a line break that is not a paragraph break—that is, a line break within a paragraph. However, in other contexts, there are few guarantees as to how these characters will be treated. POSIX-level software, for example, often recognizes only \n as a break. Some older Macintosh software recognizes only \r, and some Windows software recognizes only \r\n. Often there is no distinction between line and paragraph breaks.

Which line or paragraph break character you should use depends on how your data may be used and on what platforms. The Cocoa text system recognizes \n, \r, and \r\n all as paragraph breaks, equivalent to NSParagraphSeparatorCharacter. When it inserts paragraph breaks, for example with insertNewline:, it uses \n. Ordinarily NSLineSeparatorCharacter is used only for breaks that are specifically line breaks and not paragraph breaks, for example in insertLineBreak:, or for representing HTML <br> elements.

If your breaks are specifically intended as line breaks and not paragraph breaks, then you should typically use NSLineSeparatorCharacter. Otherwise, you may use \n, \r, or \r\n depending on what other software is likely to process your text. The default choice for Cocoa is usually \n.
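
If text arrives from mixed sources and you settle on \n, one possible approach is to normalize the historical break sequences before further processing. This is a sketch rather than anything prescribed by the original article: NormalizeParagraphBreaks is a hypothetical helper, it assumes ARC, and it deliberately leaves U+2028 and U+2029 untouched.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts \r\n and bare \r to \n.
static NSString *NormalizeParagraphBreaks(NSString *text) {
    // Replace \r\n first so that each pair becomes one \n rather than two.
    NSString *result =
        [text stringByReplacingOccurrencesOfString:@"\r\n"
                                        withString:@"\n"
                                           options:NSLiteralSearch
                                             range:NSMakeRange(0, text.length)];
    // Any \r that remains is a bare carriage return.
    return [result stringByReplacingOccurrencesOfString:@"\r"
                                             withString:@"\n"
                                                options:NSLiteralSearch
                                                  range:NSMakeRange(0, result.length)];
}
```

NSLiteralSearch is used so that matching is done on exact code units.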

Separating a String “by Paragraph”

A common approach to separating a string “by paragraph” is simply to use:

NSArray *arr = [myString componentsSeparatedByString:@"\n"];

This, however, ignores the fact that there are a number of other ways in which a paragraph or line break may be represented in a string—\r, \r\n, or Unicode separators.
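
To make the pitfall concrete, the sketch below wraps the naive split in a hypothetical helper, NaiveParagraphs (assuming ARC). With Windows-style input, each piece except the last keeps a stray \r at its end.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the naive split, which only knows about \n.
static NSArray<NSString *> *NaiveParagraphs(NSString *text) {
    return [text componentsSeparatedByString:@"\n"];
}
```

Splitting @"a\r\nb" this way yields two pieces, but the first one is @"a\r" rather than @"a". A string that uses only \r as its break is not split at all.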

Instead you can use methods—such as enumerateSubstringsInRange:options:usingBlock: and enumerateLinesUsingBlock:—that take into account the variety of possible line terminations, as illustrated in the following example.

NSString *string = /* assume this exists */;
NSRange range = NSMakeRange(0, string.length);
[string enumerateSubstringsInRange:range
                           options:NSStringEnumerationByParagraphs
                        usingBlock:^(NSString * _Nullable paragraph, NSRange paragraphRange, NSRange enclosingRange, BOOL * _Nonnull stop) {
    // paragraph holds the paragraph text without its terminator
    // ...
}];
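
Building on the example above, the following sketch collects the results into arrays so they can be used like the output of componentsSeparatedByString:. ParagraphsInString and LinesInString are hypothetical helper names, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the paragraphs of `text` without their terminators.
static NSArray<NSString *> *ParagraphsInString(NSString *text) {
    NSMutableArray<NSString *> *paragraphs = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByParagraphs
                          usingBlock:^(NSString * _Nullable paragraph,
                                       NSRange paragraphRange,
                                       NSRange enclosingRange,
                                       BOOL * _Nonnull stop) {
        if (paragraph != nil) {
            [paragraphs addObject:paragraph];
        }
    }];
    return paragraphs;
}

// Hypothetical helper: the same idea using enumerateLinesUsingBlock:.
static NSArray<NSString *> *LinesInString(NSString *text) {
    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    [text enumerateLinesUsingBlock:^(NSString * _Nonnull line, BOOL * _Nonnull stop) {
        [lines addObject:line];
    }];
    return lines;
}
```

Both helpers split @"one\r\ntwo\rthree\nfour" into four pieces, regardless of which break sequence separates them.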
