How to process strings in Python: cleaning strings correctly (pitfalls I ran into)

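Anyone who has worked with user-submitted survey data knows how messy it can be. Producing a uniform set of strings suitable for analysis takes several steps: stripping whitespace, removing punctuation, and standardizing capitalization. One way to do this is with the built-in string methods and the re module for regular expressions: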

The usual approach

import re

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']

def clean_strings(strings):  # typical data-cleaning steps
    result = []
    for value in strings:
        value = value.strip()                # trim surrounding whitespace
        value = re.sub('[!#?]', '', value)   # remove punctuation
        value = value.title()                # normalize capitalization
        result.append(value)
    return result

In [173]: clean_strings(states)
Out[173]:
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']
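One pitfall worth knowing about in the title-casing step: str.title() starts a new word after every non-letter character, so an apostrophe inside a word produces a stray capital. The sketch below uses a made-up input ("martha's vineyard" is not part of the data above) and shows string.capwords from the standard library as one possible alternative; whether it suits your data is an assumption to check, since capwords splits on whitespace only and also collapses repeated spaces.

```python
import string

# Hypothetical survey answer containing an apostrophe (not from the states list).
name = "martha's vineyard"

# str.title() capitalizes the letter after any non-letter, including "'".
titled = name.title()

# string.capwords() splits on whitespace, capitalizes each word, and rejoins.
capped = string.capwords(name)

print(titled)  # Martha'S Vineyard
print(capped)  # Martha's Vineyard
```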

Recommended approach

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]  # functions are objects too

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [175]: clean_strings(states, clean_ops)
Out[175]:
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']
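Because the cleaning steps are just a list of functions, adding a step means editing the list rather than the loop. The following is a minimal sketch of that idea, not part of the original post: collapse_spaces and apply_ops are hypothetical helpers introduced here for illustration, and functools.reduce is used in place of the inner for loop.

```python
import re
from functools import reduce

def remove_punctuation(value):
    # Drop the same three characters the regex above targets.
    return re.sub(r"[!#?]", "", value)

def collapse_spaces(value):
    # Hypothetical extra step: squeeze runs of whitespace into one space.
    return re.sub(r"\s+", " ", value)

def apply_ops(value, ops):
    # Feed the value through each function in order.
    return reduce(lambda acc, op: op(acc), ops, value)

# The new step is slotted in without touching apply_ops.
clean_ops = [str.strip, remove_punctuation, collapse_spaces, str.title]

samples = ["  south   carolina## ", "West virginia?"]
cleaned = [apply_ops(s, clean_ops) for s in samples]
print(cleaned)  # ['South Carolina', 'West Virginia']
```

The order of the list matters: stripping before removing punctuation leaves nothing to trim afterwards only because none of the sample strings has whitespace inside the punctuation run.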

# Alternatively, apply a single function to every element with the built-in map

In [176]: for x in map(remove_punctuation, states):
   .....:     print(x)
 Alabama
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia
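Note that mapping only remove_punctuation leaves whitespace and capitalization untouched, which is why the output above is not fully cleaned. As a rough sketch (the variable names no_punct and cleaned are illustrative), map calls can be nested to run all three steps, and list() turns the lazy iterator that map returns in Python 3 into a list:

```python
import re

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# map() is lazy in Python 3; list() materializes the results.
no_punct = list(map(remove_punctuation, states))

# Nested map() calls run innermost first: strip, then remove punctuation, then title-case.
cleaned = list(map(str.title, map(remove_punctuation, map(str.strip, states))))

print(no_punct[0] == ' Alabama ')  # True: whitespace is still there
print(cleaned)
```

For more than two or three steps, the list-of-functions version is usually easier to read than deeply nested map calls.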
