https://www.elastic.co/guide/en/elasticsearch/guide/current/case-folding.html
See the article above for background.
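package main

import (
	"fmt"
	"unicode"
)

// equiv returns every rune other than r that is equivalent to r under
// Unicode simple case folding, in the order unicode.SimpleFold visits them.
func equiv(r rune) (e []rune) {
	r0 := r
	for {
		r = unicode.SimpleFold(r)
		if r == r0 {
			// Back at the starting rune: the orbit is complete.
			return e
		}
		e = append(e, r)
	}
}

// f prints r, its code point, and its case-folding equivalents.
func f(r rune) {
	fmt.Printf("For %q (%d): %q\n", r, r, equiv(r))
}

func main() {
	for r := 'A'; r <= 'Z'; r++ {
		f(r)
	}
	for r := 'a'; r <= 'z'; r++ {
		f(r)
	}
}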
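My understanding: upper and lower case behave differently from one language to another, and Unicode has to cover every language in the world. So the characters equivalent to k are not just K but also '\u212A' (KELVIN SIGN, which looks like K). Matching therefore has to consider more than a simple upper/lower pair.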
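The asciiFold and caseOrbit mapping tables (in Go's unicode package): these two tables are the so-called case folding ... I printed every orbit step in caseOrbit; each line below is one From -> To step of a cycle ...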
From: K, To: k
From: S, To: s
From: k, To: K
From: s, To: ſ
From: µ, To: Μ
From: Å, To: å
From: ß, To: ẞ
From: å, To: Å
From: İ, To: İ
From: ı, To: ı
From: ſ, To: S
From: DŽ, To: Dž
From: Dž, To: dž
From: dž, To: DŽ
From: LJ, To: Lj
From: Lj, To: lj
From: lj, To: LJ
From: NJ, To: Nj
From: Nj, To: nj
From: nj, To: NJ
From: DZ, To: Dz
From: Dz, To: dz
From: dz, To: DZ
From: ͅ, To: Ι
From: Β, To: β
From: Ε, To: ε
From: Θ, To: θ
From: Ι, To: ι
From: Κ, To: κ
From: Μ, To: μ
From: Π, To: π
From: Ρ, To: ρ
From: Σ, To: ς
From: Φ, To: φ
From: Ω, To: ω
From: β, To: ϐ
From: ε, To: ϵ
From: θ, To: ϑ
From: ι, To: ι
From: κ, To: ϰ
From: μ, To: µ
From: π, To: ϖ
From: ρ, To: ϱ
From: ς, To: σ
From: σ, To: Σ
From: φ, To: ϕ
From: ω, To: Ω
From: ϐ, To: Β
From: ϑ, To: ϴ
From: ϕ, To: Φ
From: ϖ, To: Π
From: ϰ, To: Κ
From: ϱ, To: Ρ
From: ϴ, To: Θ
From: ϵ, To: Ε
From: В, To: в
From: Д, To: д
From: О, To: о
From: С, To: с
From: Т, To: т
From: Ъ, To: ъ
From: в, To: ᲀ
From: д, To: ᲁ
From: о, To: ᲂ
From: с, To: ᲃ
From: т, To: ᲄ
From: ъ, To: ᲆ
From: Ѣ, To: ѣ
From: ѣ, To: ᲇ
From: ᲀ, To: В
From: ᲁ, To: Д
From: ᲂ, To: О
From: ᲃ, To: С
From: ᲄ, To: ᲅ
From: ᲅ, To: Т
From: ᲆ, To: Ъ
From: ᲇ, To: Ѣ
From: ᲈ, To: Ꙋ
From: Ṡ, To: ṡ
From: ṡ, To: ẛ
From: ẛ, To: Ṡ
From: ẞ, To: ß
From: ι, To: ͅ
From: Ω, To: Ω
From: K, To: K
From: Å, To: Å
From: Ꙋ, To: ꙋ
From: ꙋ, To: ᲈ
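Since caseOrbit is unexported, the cycles in the dump above can be approximated through the exported `unicode.SimpleFold` alone. The following is a sketch under that assumption, not a reproduction of the table: the names `orbit` and `largeOrbits` are hypothetical, and it lists only cycles with more than two members, so self-loops such as İ and two-member entries such as ß/ẞ are not reported.

```go
package main

import (
	"fmt"
	"unicode"
)

// orbit returns r followed by every other rune in its simple-folding cycle,
// in the order unicode.SimpleFold visits them.
func orbit(r rune) []rune {
	out := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		out = append(out, next)
	}
	return out
}

// largeOrbits scans the runes in [lo, hi] and returns each cycle with more
// than two members, reported once, starting from its smallest rune.
func largeOrbits(lo, hi rune) [][]rune {
	var result [][]rune
	for r := lo; r <= hi; r++ {
		o := orbit(r)
		if len(o) <= 2 {
			continue // plain upper/lower pair, or no case at all
		}
		smallest := true
		for _, m := range o[1:] {
			if m < r {
				smallest = false
				break
			}
		}
		if smallest {
			result = append(result, o)
		}
	}
	return result
}

func main() {
	for _, o := range largeOrbits(0, unicode.MaxRune) {
		for _, r := range o {
			fmt.Printf("%q (%U) ", r, r)
		}
		fmt.Println()
	}
}
```

Printing the code point next to each rune helps with lines like "From: K, To: K", where the two characters look identical but are different code points.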