Intensive Training: Part 2
Summary: I'm worn out.
Motivation
- Web crawlers frequently need to process strings.
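Main contents
- Splitting strings
- Checking how a string starts or ends
- Adjusting string formats
- Concatenating strings
- Aligning strings
- Removing unwanted characters
- Matching characters
- Searching for characters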
1.
Splitting strings
- Built-in str.split(): splits on only one separator at a time
- re.split(): splits on a regular-expression pattern
import re
data_one = "ab;cd|efg|hi,jkl|mn\topq;rst,uv\twx\t y\nz"
pattern = r";+|,+|\t+|\n+|\s+|\|+"
result = re.split(pattern, data_one)
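# The '' between 'wx' and 'y' appears because "\t " matches as two separate delimiters (\t+, then \s+).
print(result)  # ['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uv', 'wx', '', 'y', 'z']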
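As a variation on the split above, a single character class can treat any run of mixed delimiters as one separator, which avoids the empty string in the result. This is a minimal sketch; the name `tokens` is only illustrative.

```python
import re

data_one = "ab;cd|efg|hi,jkl|mn\topq;rst,uv\twx\t y\nz"

# str.split() handles one separator at a time.
by_semicolon = data_one.split(";")
print(len(by_semicolon))  # 3

# One character class: any run of ';', ',', '|' or whitespace is a single delimiter.
tokens = re.split(r"[;,|\s]+", data_one)
print(tokens)
# ['ab', 'cd', 'efg', 'hi', 'jkl', 'mn', 'opq', 'rst', 'uv', 'wx', 'y', 'z']
```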
2.
Checking how a string starts or ends
- str.startswith()
- str.endswith()
filename = "learnpython.py"
print(filename.startswith("learn"))  # True
print(filename.endswith(".py"))  # True
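Both methods also accept a tuple of candidates, which is handy when filtering file names or URLs in a crawler. A small sketch with made-up file names:

```python
# Hypothetical file names, used only for illustration.
filenames = ["learnpython.py", "notes.txt", "spider.py", "data.json"]

# Keep only Python files.
python_files = [name for name in filenames if name.endswith(".py")]
print(python_files)  # ['learnpython.py', 'spider.py']

# A tuple checks several suffixes (or prefixes) in one call.
text_or_json = [name for name in filenames if name.endswith((".txt", ".json"))]
print(text_or_json)  # ['notes.txt', 'data.json']

is_url = "https://example.com".startswith(("http://", "https://"))
print(is_url)  # True
```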
3.
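Adjusting string formats
Convert 2016-10-31 to 31/10/2016
- re.sub(): substitution
A = "2016-10-31"
# Named groups capture year, month, and day; \g<name> reorders them in the replacement.
print(re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", r"\g<day>/\g<month>/\g<year>", A))
# 31/10/2016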
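The same conversion can also be sketched with numbered groups, or with the standard `datetime` module, which additionally rejects invalid dates. The variable names here are illustrative.

```python
import re
from datetime import datetime

date_text = "2016-10-31"

# Numbered backreferences: \1 = year, \2 = month, \3 = day.
by_number = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date_text)
print(by_number)  # 31/10/2016

# Parse, then reformat; strptime raises ValueError for an impossible date.
by_datetime = datetime.strptime(date_text, "%Y-%m-%d").strftime("%d/%m/%Y")
print(by_datetime)  # 31/10/2016
```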
4.
Concatenating strings
- + operator
- str.join()
values = ["apple", 'orange', "pear", "banana"]
str_temp = ""
for i in values:
    str_temp += i  # repeated concatenation with +
print(str_temp)  # appleorangepearbanana
str_other = "".join(values)  # appleorangepearbanana
str_one = "+".join(values)  # apple+orange+pear+banana
str_two = "====".join(values)  # apple====orange====pear====banana
print(str_other, str_one, str_two)
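One thing to watch with join: every item must already be a string. A short sketch, with illustrative names, that converts numbers on the fly:

```python
numbers = [1, 2, 3]

# ",".join(numbers) would raise TypeError, so convert each item first.
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)  # 1,2,3

# join works on any iterable of strings, including a generator.
upper_line = " | ".join(word.upper() for word in ["apple", "pear"])
print(upper_line)  # APPLE | PEAR
```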
5.
Aligning strings
- str.ljust()
- str.rjust()
- str.center()
- format()
sentence = 'Shanghai University'
print(sentence.ljust(50))
print(sentence.rjust(50))
print(sentence.center(50))
print(format(sentence, "<50"))
print(format(sentence, ">50"))
print(format(sentence, "^50"))
# Output (each line is padded to 50 characters; trailing spaces are not visible):
#Shanghai University
#                               Shanghai University
#               Shanghai University
#Shanghai University
#                               Shanghai University
#               Shanghai University
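The same format specifiers work inside f-strings, and a fill character placed before the alignment symbol makes the padding visible. A minimal sketch; the widths 30 and 31 are arbitrary choices.

```python
sentence = "Shanghai University"  # 19 characters

left = f"{sentence:-<30}"      # pad on the right with '-'
right = f"{sentence:->30}"     # pad on the left with '-'
centered = f"{sentence:*^31}"  # pad both sides with '*'

print(left)      # Shanghai University-----------
print(right)     # -----------Shanghai University
print(centered)  # ******Shanghai University******
```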
6.
Removing unwanted characters
- str.strip()
- str.lstrip()
- str.rstrip()
- re.sub()
words = '============Shanghai++++++University==============='
print(words.strip("=")) #Shanghai++++++University
print(words.lstrip("=")) #Shanghai++++++University===============
print(words.rstrip("=")) #============Shanghai++++++University
word_pattern = r'=+|\++'
print(re.sub(word_pattern, '', words)) #ShanghaiUniversity
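A few related cases, sketched with made-up strings: strip() with no argument removes whitespace, its argument is a set of characters rather than a prefix, and characters in the middle need replace() or re.sub().

```python
import re

raw = "  \t Shanghai University \n"
print(repr(raw.strip()))  # 'Shanghai University'

# The argument is a set of characters: any mix of '-' and '=' is removed from both ends.
banner = "-=-=Shanghai University=-=-"
print(banner.strip("-="))  # Shanghai University

# strip() never touches the middle of a string.
inner = "Shanghai++++++University"
print(inner.replace("+", ""))       # ShanghaiUniversity
print(re.sub(r"\++", " ", inner))   # Shanghai University
```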
7.
Matching characters
- re.match()
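re.match() only succeeds when the pattern matches at the start of the string. A small sketch contrasting it with re.search() and re.fullmatch(); the sample strings are made up for illustration.

```python
import re

date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

hit = date_pattern.match("2016-10-31 release notes")   # date is at the start
miss = date_pattern.match("released on 2016-10-31")    # date is not at the start
found = date_pattern.search("released on 2016-10-31")  # search scans the whole string
exact = date_pattern.fullmatch("2016-10-31")           # the entire string must match

print(hit.group())        # 2016-10-31
print(miss)               # None
print(found.group())      # 2016-10-31
print(exact is not None)  # True
```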
8.
Searching for characters
- str.find()
- re.findall()
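As a rough illustration of the two calls: str.find() returns the index of the first occurrence (or -1), while re.findall() returns every match of a pattern. The sample text is invented.

```python
import re

text = "page 12 of 250, updated 2016-10-31"

print(text.find("updated"))  # 16
print(text.find("missing"))  # -1 when the substring is absent

# All runs of digits, as strings.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['12', '250', '2016', '10', '31']

# With groups in the pattern, findall returns tuples of the groups.
dates = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(dates)  # [('2016', '10', '31')]
```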
Reference: [Python Cookbook]