String Methods 0x02 -- Case Conversion

Please credit the source when reposting: @Orca_J35 | GitHub@orca-j35

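Besides supporting all the common sequence operations, strings also implement many additional methods.
I will cover these methods one by one across several notes titled "String Methods".
I keep these notes updated in this repository: https://github.com/orca-j35/python_notes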

capitalize

str.capitalize()

Return a copy of the string with its first character capitalized and the rest lowercased.

# Uppercase the first character and lowercase the rest; uncased characters are left unchanged
>>> 'abc DEF 鲸'.capitalize()
'Abc def 鲸'
>>> 'à'.capitalize()
'À'
# If the first character is uncased, it is left unchanged and the next letter is not capitalized
>>> '鲸 abc DEF'.capitalize()
'鲸 abc def'
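Since Python 3.8, capitalize() puts the first character into titlecase rather than uppercase (see the "Changed in version 3.8" note in the Python docs). The difference only shows for characters whose titlecase and uppercase forms differ, such as the digraph 'ǆ' (U+01C6). A quick check, assuming Python 3.8 or later:

```python
# 'ǆ' (U+01C6) has an uppercase form 'Ǆ' (U+01C4)
# and a distinct titlecase form 'ǅ' (U+01C5)
s = '\u01c6ungla'

capitalized = s.capitalize()  # first char -> titlecase, rest -> lowercase
uppercased = s.upper()        # every char -> uppercase

print(capitalized)  # ǅungla (Python 3.8+)
print(uppercased)   # ǄUNGLA
```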

casefold

str.casefold()

Return a casefolded copy of the string. Casefolded strings may be used for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".

The casefolding algorithm is described in section 3.13 of the Unicode Standard.

# Fold every cased character to lowercase; uncased characters are left unchanged
>>> 'ABC !@#%^\n 鲸'.casefold()
'abc !@#%^\n 鲸'
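# Unlike lower(), casefold() removes all case distinctions, so even an already-lowercase character may be folded further
>>> 'ß'.casefold() # already lowercase, but folded further to 'ss'
'ss'
>>> 'ß'.lower() # already lowercase, so lower() leaves it as is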
'ß'
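The Python docs say casefolded strings "may be used for caseless matching". A minimal sketch of that, using two hypothetical helpers: caseless_equal() simply compares casefolded strings, while canonical_caseless_equal() also applies NFD normalization before and after casefolding, roughly following the Unicode Standard's definition of canonical caseless matching, so that a precomposed 'Å' and 'A' plus a combining ring compare equal:

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings ignoring case via casefold()."""
    return a.casefold() == b.casefold()


def canonical_caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare NFD(casefold(NFD(s))) of both strings."""
    def fold(s: str) -> str:
        return unicodedata.normalize('NFD', unicodedata.normalize('NFD', s).casefold())
    return fold(a) == fold(b)


print(caseless_equal('Straße', 'STRASSE'))            # True: both fold to 'strasse'
print('Straße'.lower() == 'STRASSE'.lower())          # False: 'straße' != 'strasse'
print(caseless_equal('\u00c5', 'A\u030a'))            # False: different code points
print(canonical_caseless_equal('\u00c5', 'A\u030a'))  # True after NFD normalization
```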

lower

str.lower()

Return a copy of the string with all the cased characters converted to lowercase.

Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase). See the Letter section in the appendix of this article for details.

The lowercasing algorithm used is described in section 3.13 of the Unicode Standard.

# Convert every cased character to lowercase; uncased characters are left unchanged
>>> 'ABC !@#%^\n 鲸'.lower()
'abc !@#%^\n 鲸'
>>> 'ß'.lower()
'ß'
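To see which characters count as cased, you can look up their general category with the standard unicodedata module. In this short sketch only the Lu and Lt characters change under lower(): 'a' (Ll) is already lowercase, and '鲸' (Lo) and '1' (Nd) are uncased.

```python
import unicodedata

# One sample character per category: Lu, Ll, Lt, Lo, Nd
samples = ['A', 'a', '\u01c5', '鲸', '1']

for ch in samples:
    # category() returns the two-letter Unicode general category
    print(repr(ch), unicodedata.category(ch), '->', repr(ch.lower()))
```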

upper

str.upper()

Return a copy of the string with all the cased characters converted to uppercase. Note that s.upper().isupper() might be False if s contains uncased characters or if the Unicode category of the resulting character(s) is not “Lu” (Letter, uppercase), but e.g. “Lt” (Letter, titlecase).

# Convert every cased character to uppercase; uncased characters are left unchanged
>>> 'abc !@#%^\n 鲸'.upper()
'ABC !@#%^\n 鲸'
>>> '!@#%^\n 鲸'.upper().isupper()
False
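Because some characters uppercase to more than one character (Unicode special casing), upper() can return a string of a different length. Also note that isupper() ignores uncased characters but requires at least one cased character, which is why the result above is False. A short demonstration:

```python
s = 'straße'
upper_s = s.upper()  # 'ß' uppercases to two characters, 'SS'

print(upper_s, len(s), len(upper_s))  # STRASSE 6 7
print('\ufb01'.upper())               # the ligature 'ﬁ' uppercases to 'FI'

# isupper(): all cased chars must be uppercase, and at least one must exist
print('ABC 鲸'.isupper())  # True: '鲸' is uncased and simply ignored
print('鲸'.isupper())      # False: no cased characters at all
```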

swapcase

str.swapcase()

Return a copy of the string with uppercase characters converted to lowercase and vice versa. Note that it is not necessarily true that s.swapcase().swapcase() == s.

# Convert uppercase characters to lowercase and vice versa;
# uncased characters are left unchanged
>>> 'abc ß !@#%^\n 鲸'.swapcase()
'ABC SS !@#%^\n 鲸'
# s.swapcase().swapcase() is not necessarily equal to s: the original 'ß' comes back as 'ss'
>>> 'ABC SS !@#%^\n 鲸'.swapcase()
'abc ss !@#%^\n 鲸'
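One way to find characters that do not survive a double swapcase() is to scan a range of code points. A small sketch that scans Latin-1 (U+0000 to U+00FF); the range is an arbitrary choice for illustration, and swapcase_roundtrips() is a hypothetical helper name:

```python
def swapcase_roundtrips(s: str) -> bool:
    """Return True if swapping case twice gives back the original string."""
    return s.swapcase().swapcase() == s


# Collect Latin-1 characters whose double swapcase differs from the original
broken = [chr(cp) for cp in range(0x100) if not swapcase_roundtrips(chr(cp))]
print(broken)
```

'ß' shows up for the reason demonstrated above. The micro sign 'µ' (U+00B5) does too: it uppercases to the Greek capital 'Μ' (U+039C), which lowercases to the Greek small 'μ' (U+03BC), a different code point.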

title

str.title()

Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase.

# Uppercase the first character of each word and lowercase the rest; uncased characters are left unchanged
>>> 'Hello world'.title()
'Hello World'
>>> 'Hello world 鲸鱼 !@$#\n\t'.title()
'Hello World 鲸鱼 !@$#\n\t'

The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:

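# The algorithm treats every uncased character as a word boundary, which sometimes gives unexpected results:
# in contractions and possessives, the letter right after the apostrophe gets capitalized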
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
>>> "hello鲸hello\thello#hello".title()
'Hello鲸Hello\tHello#Hello'

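Here an uncased character is, roughly, one whose Unicode general category is not Lu, Ll, or Lt. That covers not only non-letters such as '\t' and '#' but also letters without case such as '鲸' (category Lo), which is why '鲸' acts as a word boundary in the example above. See the Letter section in the appendix of this article for details.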
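The word-boundary rule can be made concrete with a rough re-implementation. The sketch below (title_sketch() and is_cased() are hypothetical names) follows the same idea as the loop in CPython's implementation: a character is titlecased if the previous character was uncased, and lowercased otherwise. It approximates "cased" with islower(), isupper() or istitle() on a single character and ignores context-sensitive rules such as the Greek final sigma, so treat it as an illustration, not a drop-in replacement:

```python
def is_cased(ch: str) -> bool:
    """Approximate check: does this single character have case?"""
    return ch.islower() or ch.isupper() or ch.istitle()


def title_sketch(s: str) -> str:
    """Rough re-implementation of str.title() for illustration."""
    out = []
    prev_cased = False
    for ch in s:
        # After a cased char we are inside a word: lowercase it.
        # Otherwise we are at the start of a word: titlecase it.
        out.append(ch.lower() if prev_cased else ch.title())
        prev_cased = is_cased(ch)
    return ''.join(out)


for text in ["they're bill's friends from the UK", 'hello鲸hello\thello#hello']:
    print(title_sketch(text) == text.title())  # True for both samples
```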

A workaround for apostrophes can be constructed using regular expressions:

# A regular expression can be used to handle apostrophes correctly
>>> import re
>>> def titlecase(s):
...     return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0)[0].upper() +
...                              mo.group(0)[1:].lower(),
...                   s)
...
>>> titlecase("they're bill's friends.")
"They're Bill's Friends."
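The [A-Za-z] classes in the regex above only match ASCII letters, so a word such as 'élan' is split at 'é' and comes out as 'éLan'. The variant below assumes that [^\W\d_] (word characters minus digits and underscore) is an acceptable stand-in for "any Unicode letter"; titlecase_unicode() is a hypothetical name:

```python
import re

# [^\W\d_] = word characters that are neither digits nor underscores,
# i.e. (approximately) Unicode letters in a str pattern
WORD = r"[^\W\d_]+('[^\W\d_]+)?"


def titlecase_unicode(s: str) -> str:
    """Capitalize each word, keeping apostrophe suffixes lowercase."""
    return re.sub(WORD,
                  lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                  s)


print(titlecase_unicode("they're bill's friends."))  # They're Bill's Friends.
print(titlecase_unicode('élan vital'))                # Élan Vital
```

One consequence of this choice: '鲸' also counts as a letter under this pattern, so 'hello鲸hello' is treated as a single word and becomes 'Hello鲸hello', unlike str.title().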

You can also use the string.capwords(s, sep=None) function:

>>> import string
>>> s = "they're bill's friends from the UK"
>>> string.capwords(s)
"They're Bill's Friends From The Uk"
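Per the Python docs, capwords() splits the argument with str.split(), capitalizes each word with str.capitalize(), and joins the words back together. When sep is None, runs of whitespace collapse into a single space and leading/trailing whitespace is dropped; with an explicit sep, that separator is used both to split and to join. A short illustration:

```python
import string

# sep=None: whitespace runs collapse to one space, ends are stripped
print(repr(string.capwords('  hello   world  ')))  # 'Hello World'

# explicit sep: split and join on '-' only
print(string.capwords('hello-world', sep='-'))    # Hello-World

# each word goes through capitalize(), so the rest of it is lowercased
print(string.capwords('NASA rockets'))            # Nasa Rockets
```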