Python 3.5 errors: "TypeError: can't concat str to bytes" and "TypeError: write() argument must be str, not bytes"

Original program:
out = open('train_data.txt', 'w')
for sentence in sentences:
    out.write(sentence.encode("utf-8")+"\n")
print("done!")

Error: TypeError: can't concat str to bytes

Changed the write line to:

    out.write(sentence.encode("utf-8")+b"\n")

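This error goes away. Reason: encode() returns a bytes object, which cannot be concatenated with a str, so '\n' needs a b prefix to make it a bytes literal as well.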
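The type mismatch can be reproduced in isolation. This is a minimal sketch; the value of `sentence` is a made-up sample, not data from the original program.

```python
sentence = "你好"  # hypothetical sample value

encoded = sentence.encode("utf-8")  # encode() returns bytes

try:
    encoded + "\n"  # bytes + str raises TypeError in Python 3
    concat_failed = False
except TypeError:
    concat_failed = True

line = encoded + b"\n"  # bytes + bytes is fine
```

Here `concat_failed` ends up True and `line` is a bytes object ending in a newline byte.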


New error: TypeError: write() argument must be str, not bytes

Changed the write line to:

    out.write(str(sentence.encode("utf-8")+b"\n"))

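Reason: the file was opened in text mode ('w'), so write() only accepts str, and the bytes value has to be converted to str. Note that str() applied to a bytes object returns its printable representation (b'...') rather than the decoded text, which explains the output described in the follow-up below.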
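The difference between str() and decode() on a bytes object can be checked directly. This is a small sketch with a made-up sample string, not the original data.

```python
data = "你好".encode("utf-8") + b"\n"  # hypothetical sample bytes

as_repr = str(data)              # textual repr of the bytes object: "b'\\xe4...'"
as_text = data.decode("utf-8")   # the original text, recovered from the bytes

# str() keeps the escape sequences as literal characters,
# so the Chinese characters never reappear in as_repr.
contains_chinese = "你好" in as_repr
```

Only `data.decode("utf-8")` gives back readable text; `str(data)` merely silences the TypeError.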


Follow-up:

The Chinese text written to the file turned out to look like this: \xc2\xa0\n\xef\xbc\x88\xe8\x8b\xb1\xe5\x9b\xbd\xe5\x8f\x91\xe9\x9f\xb3\xef\xbc\x

Revised program:

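# Open in text mode with an explicit encoding and write str directly;
# the with block also closes the file when the loop finishes.
with open('train_data.txt', 'w', encoding='utf-8') as out:
    for sentence in sentences:
        out.write(sentence + "\n")
print("done!")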

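All errors are gone!

Reason:

On Windows, a file opened in text mode without an explicit encoding uses the system's locale encoding, which is gbk on a Chinese-language system. The text, fetched from the network, has already been decoded into a Unicode str, so when Python writes it, it tries to encode that str as gbk, and any character gbk cannot represent makes the write fail. The fix is to set the target file's encoding explicitly by passing encoding='utf-8' to open(), as in the revised program above.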
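To see which encoding a given machine would fall back to, and to confirm that an explicit encoding really produces UTF-8 on disk, something like the following can be used. It is only a sketch: the sample `sentences` list and the temporary file location are assumptions for illustration, not part of the original program.

```python
import locale
import os
import tempfile

sentences = ["你好", "世界"]  # hypothetical sample data

# The locale's preferred encoding, which open() has traditionally
# used in text mode when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)

# Write into a temporary directory so the sketch leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), "train_data.txt")

# Text mode with an explicit encoding: pass str objects to write().
with open(path, "w", encoding="utf-8") as out:
    for sentence in sentences:
        out.write(sentence + "\n")

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()

decoded = raw.decode("utf-8")
```

If `default_encoding` reports something like cp936 (gbk), omitting `encoding='utf-8'` would make open() use that codec instead.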



References:

https://blog.csdn.net/dawei_01/article/details/79569466

https://stackoverflow.com/questions/40740150/python-3-cant-concat-bytes-to-str-for-a-list

https://www.imooc.com/qadetail/227268
