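Because Python functions are objects, many constructs that are awkward to express in other languages are easy to implement in Python. Suppose we are cleaning data and need to apply a series of transformations to the following list of strings: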
import re

states = ['Alabama', 'Geidsj!', 'Geidsj', 'FlOjY', 'south', 'duhwugdj##', 'West xishj']

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()                # drop leading/trailing whitespace
        value = re.sub('[!#?]', '', value)   # remove the characters !, # and ?
        value = value.title()                # capitalize the first letter of each word
        result.append(value)
    return result

clean_strings(states)
Out[4]: ['Alabama', 'Geidsj', 'Geidsj', 'Flojy', 'South', 'Duhwugdj', 'West Xishj']
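.strip(): removes the specified characters from both ends of a string (whitespace by default).
.split(): cuts a string into a list of pieces on the given separator.
re.sub(pattern, repl, string, count, flags)
From the official documentation: sub(pattern, repl, string, count=0, flags=0)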
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the Match object and must return
a replacement string to be used.
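Note: import re before calling re.sub().
.title(): returns a "title-cased" copy of the string, in which every word starts with an uppercase letter.
.append(): adds a new object to the end of a list.
Runoob (菜鸟教程) references. On re: Python3 正则表达式 | 菜鸟教程
On lists: Python 列表(List) | 菜鸟教程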
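The helper methods noted above can be tried out in isolation. The following is only a minimal sketch: the sample strings and the names raw, parts and shout are made up for illustration and do not come from the example above.

```python
import re

raw = "  south carolina##  "

# strip() with no argument removes leading/trailing whitespace
stripped = raw.strip()                 # 'south carolina##'
# strip(chars) removes any of the given characters from both ends
no_hash = stripped.strip("#")          # 'south carolina'

# split() cuts the string on a separator (whitespace by default)
parts = no_hash.split()                # ['south', 'carolina']

# title() capitalizes the first letter of each word
titled = no_hash.title()               # 'South Carolina'

# re.sub with count=1 replaces only the leftmost match
once = re.sub("[!#?]", "", "a!b!c", count=1)   # 'ab!c'

# repl may also be a callable that receives the Match object
def shout(match):
    return match.group(0).upper()

upper_vowels = re.sub("[aeiou]", shout, "alabama")  # 'AlAbAmA'

# append() adds one object to the end of a list
result = []
result.append(titled)
print(result)                          # ['South Carolina']
```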
# Alternative: collect the operations in a list and apply them in turn
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings1(strings, ops):
    result = []
    for value in strings:
        # Apply each cleaning function in the order given by ops
        for function in ops:
            value = function(value)
        result.append(value)
    return result

clean_strings1(states, clean_ops)
Out[8]: ['Alabama', 'Geidsj', 'Geidsj', 'Flojy', 'South', 'Duhwugdj', 'West Xishj']
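This approach makes the cleaning function more reusable and more general: the transformations live in a list of functions, so they can be changed without touching the loop itself.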
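To make the reuse claim concrete, here is a sketch that assumes a cleaning function taking the list of operations as a parameter. The names apply_ops and collapse_spaces and the cities data are hypothetical and exist only for this illustration.

```python
import re

def apply_ops(strings, ops):
    """Apply every function in ops, in order, to every string."""
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

def collapse_spaces(value):
    # Hypothetical extra step: squeeze runs of whitespace into one space
    return re.sub(r"\s+", " ", value)

cities = ["  new   york ", "LOS  angeles"]

# Two different pipelines reuse the same loop unchanged
lower_ops = [str.strip, collapse_spaces, str.lower]
title_ops = [str.strip, collapse_spaces, str.title]

print(apply_ops(cities, lower_ops))  # ['new york', 'los angeles']
print(apply_ops(cities, title_ops))  # ['New York', 'Los Angeles']
```

Only the list of operations changes between the two calls, which is the sense in which the function becomes more general.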
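Functions can also be passed as arguments to other functions, such as the built-in map function, which applies a function to every element of a sequence.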
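A short sketch of map applied to the same data follows. It redefines states and remove_punctuation so that it runs on its own; the name cleaned is introduced here for illustration.

```python
import re

states = ['Alabama', 'Geidsj!', 'Geidsj', 'FlOjY', 'south', 'duhwugdj##', 'West xishj']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# In Python 3, map returns a lazy iterator; list() materializes the results
cleaned = list(map(remove_punctuation, states))
print(cleaned)
# ['Alabama', 'Geidsj', 'Geidsj', 'FlOjY', 'south', 'duhwugdj', 'West xishj']
```

Only the punctuation is removed here, so the original capitalization is left as it was.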