Reading a txt file with blank lines in Python and splitting its contents into a list

  The txt file looks like this:
[Image: screenshot of the txt file]
  We want to read every word and store it in a list, which takes the following steps:

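  1. Read the file line by line, strip the line endings, and split each line on commas
# The with block closes the file once reading is done
with open(r'E:\Program Files\PyCharm 2019.2\machinelearning\homework\Emails\Training\spam\3.txt') as data:
    cab = []
    for line in data:
        # strip() removes the trailing newline; split(',') breaks the line at each comma
        cab.append(line.strip().split(','))
print(cab)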

Output of cab:

[['You Have Everything To Gain!'], [''], ['Incredib1e gains in length of 3-4 inches to yourPenis', ' PERMANANTLY'], [''], ['Amazing increase in thickness of yourPenis', ' up to 30%'], ['BetterEjacu1ation control'], ['Experience Rock-HardErecetions'], ['Explosive', ' intenseOrgasns'], ['Increase volume ofEjacu1ate'], ['Doctor designed and endorsed'], ['100% herbal', ' 100% Natural', ' 100% Safe'], ['The proven NaturalPenisEnhancement that works!'], ['100% MoneyBack Guaranteeed']]

Note that cab[1] (and likewise cab[3]) is [''], an unwanted entry produced by a blank line in the file.
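
To see why a blank line turns into [''] rather than an empty list, here is a small check. The sample lines are made up for illustration and are not the contents of the original file.

```python
# Hypothetical sample lines standing in for the file's contents.
sample_lines = ["100% herbal, 100% Natural\n", "\n", "Doctor designed and endorsed\n"]

rows = [line.strip().split(',') for line in sample_lines]
print(rows)

# A blank line strips down to '', and ''.split(',') returns [''] rather than [].
blank_row = "\n".strip().split(',')
print(blank_row)
```

This is why a separate filtering step is needed after splitting on commas.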

  2. Remove empty entries such as cab[1]
cab_f = []
for parts in cab:
    for part in parts:
        # skip the empty strings left behind by blank lines
        if part != '':
            cab_f.append(part.strip())

Output of cab_f:

['You Have Everything To Gain!', 'Incredib1e gains in length of 3-4 inches to yourPenis', 'PERMANANTLY', 'Amazing increase in thickness of yourPenis', 'up to 30%', 'BetterEjacu1ation control', 'Experience Rock-HardErecetions', 'Explosive', 'intenseOrgasns', 'Increase volume ofEjacu1ate', 'Doctor designed and endorsed', '100% herbal', '100% Natural', '100% Safe', 'The proven NaturalPenisEnhancement that works!', '100% MoneyBack Guaranteeed']

The list is now flat (one-dimensional), and the empty entries are gone.
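
The same flatten-and-filter step can also be written as a single list comprehension. The sketch below uses a made-up nested list (`cab_sample`) in the same shape as cab. One difference from the loop above: it tests each piece after stripping, so pieces containing only whitespace are dropped as well.

```python
# Hypothetical nested list in the same shape as cab above.
cab_sample = [['Explosive', ' intenseOrgasns'], [''], ['100% herbal', ' 100% Natural', ' 100% Safe']]

# Flatten the nested list, strip each piece, and drop anything that ends up empty.
flat = [part.strip() for parts in cab_sample for part in parts if part.strip()]
print(flat)
```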

  3. Split into words
cab_final = []
for piece in cab_f:
    # split each piece on single spaces and collect the words
    for word in piece.split(' '):
        cab_final.append(word)

Output of cab_final:

['You', 'Have', 'Everything', 'To', 'Gain!', 'Incredib1e', 'gains', 'in', 'length', 'of', '3-4', 'inches', 'to', 'yourPenis', 'PERMANANTLY', 'Amazing', 'increase', 'in', 'thickness', 'of', 'yourPenis', 'up', 'to', '30%', 'BetterEjacu1ation', 'control', 'Experience', 'Rock-HardErecetions', 'Explosive', 'intenseOrgasns', 'Increase', 'volume', 'ofEjacu1ate', 'Doctor', 'designed', 'and', 'endorsed', '100%', 'herbal', '100%', 'Natural', '100%', 'Safe', 'The', 'proven', 'NaturalPenisEnhancement', 'that', 'works!', '100%', 'MoneyBack', 'Guaranteeed']

As you can see, this is exactly the result we wanted.
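
One caveat about splitting on a single space: if a line contains several spaces in a row, split(' ') leaves empty strings in the result. Calling split() with no argument may be the safer choice. The sample text below is made up to show the difference.

```python
text = "100%  MoneyBack   Guaranteeed"  # hypothetical line with repeated spaces

by_space = text.split(' ')   # explicit separator: repeated spaces yield empty strings
by_default = text.split()    # no separator: splits on any run of whitespace

print(by_space)
print(by_default)
```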

Complete code:

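def read_txt():
    # Raw string so the backslashes in the Windows path are kept literally
    path = r'E:\Program Files\PyCharm 2019.2\machinelearning\homework\Emails\Training\spam\3.txt'
    # The with block closes the file once reading is done
    with open(path) as data:
        # Split every line on commas; blank lines become ['']
        cab = [line.strip().split(',') for line in data]
    # Flatten the nested list and drop the empty strings
    cab_f = []
    for parts in cab:
        for part in parts:
            if part != '':
                cab_f.append(part.strip())
    # Split each remaining piece into words on single spaces
    cab_final = []
    for piece in cab_f:
        for word in piece.split(' '):
            cab_final.append(word)
    return cab_final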


if __name__ == '__main__':
    print(read_txt())
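
For comparison, the three steps can be collapsed into one loop. This is only a sketch under a few assumptions: the function name `read_words`, the `path` parameter, and the UTF-8 encoding are choices made here, not part of the original code. It is demonstrated on a temporary file with made-up contents rather than on the original email.

```python
import os
import tempfile


def read_words(path):
    """Return every word in the file at `path`, treating commas and whitespace as separators."""
    words = []
    # encoding='utf-8' is an assumption; adjust it to match the actual file.
    with open(path, encoding='utf-8') as f:
        for line in f:
            # Replacing commas with spaces lets one split() call handle both;
            # blank lines contribute nothing because ''.split() returns [].
            words.extend(line.replace(',', ' ').split())
    return words


# Demo on a temporary file with made-up contents, including a blank line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write("You Have Everything To Gain!\n\n100% herbal, 100% Natural, 100% Safe\n")
    demo_path = tmp.name

demo_words = read_words(demo_path)
os.remove(demo_path)
print(demo_words)
```

Passing the path as a parameter keeps the function reusable for the other email files in the same folder.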
