Bleve Documentation Translation Project (5): Character Filters

Character Filters

Regular Expression

The regular expression character filter is configured with a regular expression and a replacement byte array. Every sequence of characters that matches the regular expression is replaced with the replacement bytes.

Typically, characters that are undesirable for indexing are replaced with whitespace rather than removed. This leaves the byte offsets of the original input unaffected.

HTML
The html character filter attempts to identify HTML tags in the input text and replace them with spaces. The current implementation is an instance of the Regular Expression character filter.

Zero-width Non-Joiner
The zero-width non-joiner character filter replaces each zero-width non-joiner character (U+200C) with a space.
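A minimal sketch of this replacement, written with the standard `bytes` package rather than Bleve's own code (`replaceZWNJ` is a hypothetical name). Note that U+200C occupies three bytes in UTF-8, so swapping it for a single ASCII space, as this sketch does, shortens the text by two bytes per occurrence.

```go
package main

import (
	"bytes"
	"fmt"
)

// zwnj is the UTF-8 encoding of U+200C ZERO WIDTH NON-JOINER (0xE2 0x80 0x8C).
var zwnj = []byte("\u200c")

// replaceZWNJ is a hypothetical helper that replaces every zero-width
// non-joiner in input with one ASCII space.
func replaceZWNJ(input []byte) []byte {
	return bytes.ReplaceAll(input, zwnj, []byte(" "))
}

func main() {
	in := []byte("a\u200cb")
	out := replaceZWNJ(in)
	fmt.Printf("%q -> %q\n", in, out)
	fmt.Println(len(in), len(out)) // byte lengths before and after
}
```

Turning the joiner into a space means a tokenizer that splits on whitespace will treat the two halves of the word as separate tokens.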
