UnicodeDecodeError in naive Bayes: 'utf-8' codec can't decode byte 0x92 in position 884: invalid start

In Chapter 4 (naive Bayes) of the machine learning book, while classifying spam email, the original statement was:

wordList = textParse(open('email/spam/%d.txt' % i, 'rb').read())

It raised the error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 884: invalid start

Fixes:

Option 1: change the statement to: wordList = textParse(open('email/spam/%d.txt' % i, 'rb').read().decode('utf8','ignore'))
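To make the effect of the 'ignore' argument concrete, here is a small sketch. The sample bytes are made up for illustration. It assumes the offending byte 0x92 is a Windows-1252 curly apostrophe, which is common in text saved on Windows but is not confirmed for these email files.

```python
# Hypothetical sample: "don't" written with a Windows-1252 curly apostrophe (byte 0x92).
raw = b"I don\x92t know"

# Strict UTF-8 decoding fails, because 0x92 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", "ignore")

# If the data really is Windows-1252 (an assumption), this codec keeps the character.
as_cp1252 = raw.decode("cp1252")

print(strict_failed)
print(ignored)
print(as_cp1252)
```

With 'ignore' the apostrophe is lost and the text becomes "I dont know". For bag-of-words spam classification that loss is probably harmless, but decoding with the file's real encoding is the safer choice when the characters matter.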

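Option 2: open each file in vim from a terminal, run :set fileencoding? to find the files that are not UTF-8 encoded, then run :set fileencoding=utf-8, save, and quit. (I have not tried this method.)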
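The same check can be scripted instead of opening every file in vim. The following is a minimal sketch, not the method from the referenced post: find_non_utf8 and convert_to_utf8 are hypothetical helper names, and the default source encoding cp1252 is an assumption that should be verified against the actual files.

```python
from pathlib import Path


def find_non_utf8(directory):
    """Return the .txt files in `directory` whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(directory).glob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite `path` in place as UTF-8.

    Assumes the file is currently encoded in `source_encoding`;
    a wrong guess produces wrong characters or a decode error.
    """
    text = Path(path).read_bytes().decode(source_encoding)
    # Write bytes directly so line endings are left untouched.
    Path(path).write_bytes(text.encode("utf-8"))
```

Calling find_non_utf8('email/spam') would list the files to inspect, and convert_to_utf8 could then be applied to each one. Since the conversion overwrites the file, it is worth keeping a backup copy of the data first.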

Reference: "Fixing Python encoding errors: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 884: invalid start", wiki347552913's blog, CSDN
