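This is the first article in the series and covers simple string handling. Python's built-in str type provides many commonly used string-processing methods; this article introduces them by category.
Text Processing in Python (2): Common Methods of the re Module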
Function | Description | Syntax | Example |
⭐️ startswith | Checks whether the string starts with the given prefix | S.startswith(prefix[, start[, end]]) -> bool | ① 'this'.startswith('th') ➡ True ② 'this'.startswith('hi') ➡ False ③ 'this'.startswith('hi', 1) ➡ True |
⭐️ endswith | Checks whether the string ends with the given suffix | S.endswith(suffix[, start[, end]]) -> bool | ① 'this'.endswith('is') ➡ True ② 'this'.endswith('hi') ➡ False ③ 'this'.endswith('hi', 0, 3) ➡ True |
isalnum | Checks whether every character is a letter or a digit, e.g. [a-zA-Z0-9] | S.isalnum() -> bool | ① 'fsj23289'.isalnum() ➡ True ② 'fsj23@289'.isalnum() ➡ False |
isalpha | Checks whether every character is a letter, e.g. [a-zA-Z] | S.isalpha() -> bool | ① 'fsj'.isalpha() ➡ True ② 'fsj@'.isalpha() ➡ False |
isascii | Checks whether every character is an ASCII character | S.isascii() -> bool | ① 'Aa*%'.isascii() ➡ True ② '中国'.isascii() ➡ False |
⭐️ isdigit | Checks whether every character is a digit, e.g. [0-9] | S.isdigit() -> bool | ① '2374812328'.isdigit() ➡ True ② '237,481,328'.isdigit() ➡ False |
⭐️ islower | Checks whether all cased characters are lowercase [a-z] | S.islower() -> bool | ① 'this string'.islower() ➡ True ② 'This string'.islower() ➡ False |
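⭐️ isupper | Checks whether all cased characters are uppercase [A-Z] | S.isupper() -> bool | ① 'THIS STRING'.isupper() ➡ True ② 'THIS sTRING'.isupper() ➡ False |
istitle | Checks whether every word starts with an uppercase letter followed only by lowercase letters, i.e. each word looks like [A-Z][a-z]* | S.istitle() -> bool | ① 'This string'.istitle() ➡ False ② 'This String'.istitle() ➡ True |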
isspace | Checks whether every character is whitespace (\s+) | S.isspace() -> bool | ① ' \t \n \v'.isspace() ➡ True |
isdecimal | | | |
isidentifier | | | |
isnumeric | | | |
isprintable | | | |
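The last four predicates in the table above are listed without a description. The following is a minimal sketch of how they behave in CPython 3; the sample strings are chosen only for illustration and are not from the original article.

```python
# isdecimal is the strictest of the three numeric checks, isnumeric the loosest.
samples = ["7", "²", "½"]  # plain digit, superscript two, vulgar fraction
digit_report = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
for s, flags in digit_report.items():
    print(repr(s), flags)

# isidentifier checks Python identifier syntax only.
print("user_name".isidentifier())   # True
print("2fast".isidentifier())       # False

# isprintable turns False as soon as a control character such as \n appears.
print("line one".isprintable())     # True
print("line\none".isprintable())    # False
```

Note that isidentifier does not exclude reserved words, so 'class'.isidentifier() is also True.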
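Function | Description | Syntax | Example |
⭐️ find | Returns the index of the first occurrence of the substring, or -1 if the substring is not found | S.find(sub[, start[, end]]) -> int | ① 'this is'.find('is') ➡ 2 ② 'this is'.find('not') ➡ -1 |
index | Returns the index of the first occurrence of the substring; raises ValueError if the substring is not found | S.index(sub[, start[, end]]) -> int | ① 'this is'.index('is') ➡ 2 ② 'this is'.index('not') ➡ ValueError |
rfind | Returns the index of the last occurrence of the substring (searching from the right), or -1 if the substring is not found | S.rfind(sub[, start[, end]]) -> int | ① 'this is'.rfind('is') ➡ 5 ② 'this is'.rfind('not') ➡ -1 |
rindex | Returns the index of the last occurrence of the substring (searching from the right); raises ValueError if the substring is not found | S.rindex(sub[, start[, end]]) -> int | ① 'this is'.rindex('is') ➡ 5 ② 'this is'.rindex('not') ➡ ValueError |
⭐️ count | Returns the number of non-overlapping occurrences of the substring | S.count(sub[, start[, end]]) -> int | ① 'this is'.count('is') ➡ 2 ② 'this is'.count('not') ➡ 0 |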
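The search methods above combine naturally. As a rough sketch, the two helpers below (find_all and safe_index are hypothetical names, not part of str) use the start argument of find to collect every match, and wrap index so that a missing substring does not raise.

```python
def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions


def safe_index(text, sub):
    """Like str.index, but return None instead of raising ValueError."""
    try:
        return text.index(sub)
    except ValueError:
        return None


print(find_all("this is", "is"))      # [2, 5]
print(safe_index("this is", "not"))   # None
```

The number of positions returned by find_all matches what count reports for the same arguments, since both skip overlapping matches.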
Function | Description | Syntax | Example | Returns |
⭐️ lower | Converts the string to lowercase | S.lower() ➡ str | 'This string'.lower() | 'this string' |
⭐️ upper | Converts the string to uppercase | S.upper() ➡ str | 'This string'.upper() | 'THIS STRING' |
capitalize | Uppercases the first character of the string and lowercases the rest | S.capitalize() ➡ str | 'this string'.capitalize() | 'This string' |
title | Uppercases the first letter of every word | S.title() ➡ str | 'this string'.title() | 'This String' |
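swapcase | Swaps uppercase and lowercase | S.swapcase() ➡ str | 'This String'.swapcase() | 'tHIS sTRING' |
casefold | Essentially the same as lower, but more aggressive; use it for caseless comparison when special characters are involved | S.casefold() ➡ str | "der Fluß".casefold() | 'der fluss' |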
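To make the difference between lower and casefold concrete, here is a small sketch of a caseless comparison; same_ignoring_case is a hypothetical helper name used only for this example.

```python
def same_ignoring_case(a, b):
    """Caseless comparison; casefold handles mappings that lower() leaves alone."""
    return a.casefold() == b.casefold()


print("Fluß".lower() == "FLUSS".lower())     # False: 'fluß' vs 'fluss'
print(same_ignoring_case("Fluß", "FLUSS"))   # True: both become 'fluss'
```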
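Function | Description | Syntax | Example | Returns |
⭐️ ljust | Left-justifies the string, padding on the right with the fill character up to the given width | S.ljust(width, fillchar=' ') ➡ str | 'This'.ljust(20) | 'This                ' |
 | | | 'This'.ljust(20, '#') | 'This################' |
rjust | Right-justifies the string, padding on the left with the fill character up to the given width | S.rjust(width, fillchar=' ') ➡ str | 'This'.rjust(20) | '                This' |
 | | | 'This'.rjust(20, '-') | '----------------This' |
center | Centers the string, padding on both sides with the fill character up to the given width | S.center(width, fillchar=' ') ➡ str | 'This'.center(20) | '        This        ' |
 | | | 'This'.center(20, 'a') | 'aaaaaaaaThisaaaaaaaa' |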
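zfill | Pads the string on the left with zeros up to the given width; roughly shorthand for rjust(width, '0'), except that a leading sign stays in front of the zeros | S.zfill(width) ➡ str | '89'.zfill(8) (same result as '89'.rjust(8, '0')) | '00000089' |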
⭐️ strip | Removes whitespace from both ends; if chars is given, removes any characters contained in chars instead | S.strip(chars=None) ➡ str | ' This '.strip() | 'This' |
 | | | 'abcabcThisaaaaaaaa'.strip('abc') | 'This' |
lstrip | Removes whitespace from the left end; if chars is given, removes any characters contained in chars instead | S.lstrip(chars=None) ➡ str | ' This '.lstrip() | 'This ' |
 | | | 'abcabcThisaaaaaaaa'.lstrip('abc') | 'Thisaaaaaaaa' |
rstrip | Removes whitespace from the right end; if chars is given, removes any characters contained in chars instead | S.rstrip(chars=None) ➡ str | ' This '.rstrip() | ' This' |
 | | | 'abcabcThisaaaaaaaa'.rstrip('abc') | 'abcabcThis' |
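The padding and stripping methods in the table above are mostly used to line up text output. The sketch below is an illustration under assumed data (the rows list and the render helper are hypothetical); it also shows two details that are easy to miss: zfill keeps a sign in front of the zeros, and the chars argument of strip is a set of characters, not a prefix or suffix.

```python
rows = [("apple", 3), ("kiwi", 12), ("watermelon", 7)]  # hypothetical sample data


def render(rows, name_width=12, qty_width=4):
    """Left-justify names with dot leaders and right-justify quantities."""
    lines = []
    for name, qty in rows:
        lines.append(name.ljust(name_width, ".") + str(qty).rjust(qty_width))
    return lines


for line in render(rows):
    print(line)
# apple.......   3
# kiwi........  12
# watermelon..   7

# zfill is sign-aware, rjust is not.
print("-89".zfill(8))        # -0000089
print("-89".rjust(8, "0"))   # 00000-89

# chars is treated as a set: every leading/trailing w, ., c, o, m is removed.
print("www.example.com".strip("w.com"))   # example
```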
⭐️ replace: Replaces each occurrence of a substring with a new string; count limits the number of replacements. Syntax: S.replace(old, new, count=-1)
In : 'this is my string'.replace('is', 'notytall')
Out: 'thnotytall notytall my string'
In : 'this is my string'.replace('is', 'notytall', 1)
Out: 'thnotytall is my string'
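expandtabs: Converts tab characters to spaces. Syntax: S.expandtabs(tabsize=8). It is similar to replacing each '\t' with spaces, except that each tab is expanded only as far as the next tab stop, so the number of spaces depends on the column.
In : '\tthis is my \tstring'.expandtabs()
Out: '        this is my      string'
In : '\tthis is my \tstring'.expandtabs(4)
Out: '    this is my  string'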
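The difference between expandtabs and a plain replace is easiest to see side by side. A minimal sketch with a made-up input string:

```python
line = "a\tbc\tdef"

# expandtabs pads each tab up to the next multiple of tabsize (a tab stop).
aligned = line.expandtabs(4)
print(repr(aligned))   # 'a   bc  def'

# replace inserts a fixed number of spaces regardless of the column.
naive = line.replace("\t", " " * 4)
print(repr(naive))     # 'a    bc    def'
```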
translate & maketrans: Replaces individual characters according to a translation table built with str.maketrans. Syntax: S.translate(table)
In : tab = str.maketrans('abcde', '12345')
In : 'The match was abandoned because of bad weather'.translate(tab)
Out: 'Th5 m1t3h w1s 121n4on54 2531us5 of 214 w51th5r'
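Besides the two-string form shown above, str.maketrans also accepts a third argument listing characters to delete, and a single dict that maps one character to a string of any length. The sketch below illustrates both; the sample sentences are made up for the example.

```python
import string

# Third argument: every character listed there is deleted.
strip_punct = str.maketrans("", "", string.punctuation)
cleaned = "Hello, world! (again)".translate(strip_punct)
print(cleaned)   # Hello world again

# Dict form: each key is a single character, each value its replacement text.
escape = str.maketrans({"<": "&lt;", ">": "&gt;", "&": "&amp;"})
escaped = "a < b && c > d".translate(escape)
print(escaped)   # a &lt; b &amp;&amp; c &gt; d
```

Because translate works character by character in a single pass, the replacement text is never translated again, which chained replace calls cannot guarantee.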
⭐️ format: Replaces the fields enclosed in {} with the given arguments. Syntax: S.format(*args, **kwargs)
In : 'this is {} {} string'.format('not', 'my')
Out: 'this is not my string'
In : 'this is {} {} string. it belongs to {who}.'.format('not', 'my', who='jason')
Out: 'this is not my string. it belongs to jason.'
format_map: Works like format but takes a single mapping as its argument; positional arguments are not supported. Syntax: S.format_map(mapping)
In : values = {'a': 'not', 'b': 'my', 'who': 'jason'}
In : 'this is {a} {b} string. it belongs to {who}.'.format_map(values)
Out: 'this is not my string. it belongs to jason.'
In : 'this is {a} {b} string. it belongs to {who}.'.format(**values)
Out: 'this is not my string. it belongs to jason.'
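One practical reason to prefer format_map over format(**mapping) is that the mapping is used directly rather than copied into a plain dict, so a dict subclass can decide what happens to missing keys. The following is a sketch of that idea; KeepMissing is a hypothetical class name.

```python
class KeepMissing(dict):
    """A dict that leaves unknown fields in place instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"


template = "this is {a} {b} string. it belongs to {who}."
partial = template.format_map(KeepMissing(a="not", b="my"))
print(partial)   # this is not my string. it belongs to {who}.
```

With template.format(**KeepMissing(a="not", b="my")) the same call raises KeyError, because unpacking with ** builds an ordinary dict and the __missing__ hook is lost.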
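⭐️ split, rsplit: Split the string on the given separator and return a list. Syntax: S.split(sep=None, maxsplit=-1)
The sep parameter sets the separator; by default the string is split on runs of whitespace. maxsplit sets the maximum number of splits, so the result has at most maxsplit + 1 items.
In : 'this is not my string'.split()
Out: ['this', 'is', 'not', 'my', 'string']
# specify the separator
In : 'this_is_not_my_string'.split('_')
Out: ['this', 'is', 'not', 'my', 'string']
# limit the number of splits
In : 'this is not my string'.split(maxsplit=2)
Out: ['this', 'is', 'not my string']
# rsplit starts splitting from the right
In : 'this is not my string'.rsplit(maxsplit=2)
Out: ['this is not', 'my', 'string']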
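The default separator behaves differently from an explicit ' ': with no sep, runs of whitespace are collapsed and leading or trailing whitespace is ignored, while an explicit separator produces empty strings between adjacent separators. A short sketch with a made-up input:

```python
text = "  a  b   c "

by_whitespace = text.split()     # runs of whitespace count as one separator
by_space = text.split(" ")       # every single space is a separator

print(by_whitespace)   # ['a', 'b', 'c']
print(by_space)        # ['', '', 'a', '', 'b', '', '', 'c', '']

# maxsplit=1 performs at most one split, so at most two items come back.
key_value = "k=v=w".split("=", 1)
print(key_value)       # ['k', 'v=w']
```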
splitlines: Splits the string at line boundaries and returns a list. Syntax: S.splitlines(keepends=False)
In : 'this is not my string\n second line'.splitlines()
Out: ['this is not my string', ' second line']
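Two details of splitlines are worth a quick sketch: keepends=True preserves the line terminators, and a trailing newline does not produce an extra empty item the way split('\n') does. The sample strings are made up for the example.

```python
mixed = "first\nsecond\r\nthird"

without_ends = mixed.splitlines()
with_ends = mixed.splitlines(keepends=True)
print(without_ends)   # ['first', 'second', 'third']
print(with_ends)      # ['first\n', 'second\r\n', 'third']

# A trailing newline: splitlines drops it, split('\n') leaves an empty string.
print("one\n".splitlines())   # ['one']
print("one\n".split("\n"))    # ['one', '']
```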
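partition, rpartition: Split the string in two at the first (partition) or last (rpartition) occurrence of the separator and return a 3-tuple (head, sep, tail). Syntax: S.partition(sep)
In : 'this is not my string'.partition(' ')
Out: ('this', ' ', 'is not my string')
In : 'this is not my string'.rpartition(' ')
Out: ('this is not my', ' ', 'string')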
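Because partition always returns three items, it is convenient for parsing "key = value" style lines without worrying about how many separators the value contains. A minimal sketch follows; parse_setting is a hypothetical helper and the input lines are invented.

```python
def parse_setting(line):
    """Split 'key = value' at the first '='; return None when there is no '='."""
    key, sep, value = line.partition("=")
    if not sep:
        # partition returns (line, '', '') when the separator is absent.
        return None
    return key.strip(), value.strip()


print(parse_setting("path = /usr/bin=/opt"))   # ('path', '/usr/bin=/opt')
print(parse_setting("no separator here"))      # None

# rpartition splits at the last occurrence, handy for file extensions.
print("archive.tar.gz".rpartition("."))        # ('archive.tar', '.', 'gz')
```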
⭐️ join: The inverse of split; joins the items of an iterable into a single string, using S as the separator, and returns that string. Syntax: S.join(iterable)
In : ' '.join(['this', 'is', 'not', 'my', 'string'])
Out: 'this is not my string'
In : '^-^'.join(['this', 'is', 'not', 'my', 'string'])
Out: 'this^-^is^-^not^-^my^-^string'
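Combining split and join gives a common idiom for normalizing whitespace. The sketch below also shows that join only accepts strings, so other values need converting first; normalize_spaces is a hypothetical helper name.

```python
def normalize_spaces(text):
    """Collapse every run of whitespace into a single space and trim the ends."""
    return " ".join(text.split())


print(normalize_spaces("  this   is\tnot \n my string "))   # this is not my string

# join raises TypeError for non-string items, so convert them explicitly.
numbers = [1, 2, 3]
csv_row = ",".join(str(n) for n in numbers)
print(csv_row)   # 1,2,3
```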
encode: Encodes the string into bytes using the given encoding. Syntax: S.encode(encoding='utf-8', errors='strict')
In : '你好'.encode('gbk')
Out: b'\xc4\xe3\xba\xc3'
In : '你好'.encode('utf-8')
Out: b'\xe4\xbd\xa0\xe5\xa5\xbd'
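As a sketch of how encode is typically paired with bytes.decode, and of what the errors argument does when the target encoding cannot represent a character:

```python
text = "你好"

utf8 = text.encode("utf-8")
print(len(text), len(utf8))          # 2 6  (two characters, six bytes)
print(utf8.decode("utf-8") == text)  # True: decoding reverses encoding

# The default errors='strict' raises when a character cannot be encoded.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
print(strict_failed)                 # True

# Other error handlers substitute something instead of raising.
replaced = text.encode("ascii", errors="replace")
escaped_bytes = text.encode("ascii", errors="backslashreplace")
print(replaced)        # b'??'
print(escaped_bytes)   # b'\\u4f60\\u597d'
```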
ascii_letters | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ |
ascii_lowercase | abcdefghijklmnopqrstuvwxyz |
digits | 0123456789 |
hexdigits | 0123456789abcdefABCDEF |
octdigits | 01234567 |
printable | 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c |
punctuation | !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ |
whitespace | ' \t\n\r\x0b\x0c' (the first character is a space) |
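The constants listed above appear to be those of the standard library's string module (the article does not name the module, so that is an assumption here). As a rough sketch, they are handy for membership tests; is_hex is a hypothetical helper.

```python
import string


def is_hex(text):
    """True when text is non-empty and consists only of hexadecimal digits."""
    return bool(text) and all(ch in string.hexdigits for ch in text)


print(is_hex("c0ffee"))      # True
print(is_hex("dead beef"))   # False: the space is not a hex digit
print(is_hex(""))            # False

# ascii_letters is the lowercase alphabet followed by the uppercase one.
print(string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase)  # True
```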