Errors I've run into in Python (continuously updated)

 

 

1. TypeError: 'dict_keys' object does not support indexing

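    I hit this in Chapter 3 (decision trees) of Machine Learning in Action. It is a Python version issue: in Python 3, dict.keys() returns a view object that cannot be indexed. The line below is the Python 2 way of writing it: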

firstStr = myTree.keys()[0]

    Python 3: convert the keys to a list first

firstStr = list(myTree.keys())[0]
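
    A small sketch of the difference, using a made-up tree dict (the keys are placeholders, not the book's data). next(iter(...)) is shown as a possible alternative that avoids building the whole list; both approaches rely on dicts keeping insertion order, which is guaranteed from Python 3.7 on.

```python
# Hypothetical stand-in for the decision tree built in the book.
my_tree = {'no surfacing': {0: 'no', 1: 'yes'}}

# dict.keys() returns a view in Python 3; a view cannot be indexed.
try:
    my_tree.keys()[0]
except TypeError as e:
    print("TypeError:", e)

first_via_list = list(my_tree.keys())[0]   # the fix used above
first_via_iter = next(iter(my_tree))       # same key, no intermediate list
print(first_via_list, first_via_iter)
```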

2. TypeError: write() argument must be str, not bytes

    This error appeared when saving an object with pickle.

    Faulty code:

try:
    with open(fileName, 'w') as fw:
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))

    Cause: pickle.dump writes bytes, so the file must be opened in binary mode ('wb'), not text mode ('w').

    Fix:

try:
    with open(fileName, 'wb') as fw:
        pickle.dump(inputTree, fw)
except IOError as e:
    print("File Error : " + str(e))
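
    The same rule applies when reading the file back: it has to be opened with 'rb'. A minimal round-trip sketch, assuming a placeholder dict and a throwaway temp directory (neither comes from the book's code):

```python
import os
import pickle
import tempfile

tree = {'no surfacing': {0: 'no', 1: 'yes'}}  # placeholder data

path = os.path.join(tempfile.mkdtemp(), 'tree.pkl')  # throwaway location

# Write in binary mode: pickle.dump emits bytes.
with open(path, 'wb') as fw:
    pickle.dump(tree, fw)

# Read back in binary mode as well.
with open(path, 'rb') as fr:
    restored = pickle.load(fr)

print(restored == tree)
```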

3. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 199: illegal multibyte sequence

  • The file contains bytes that are not valid gbk, so it cannot be decoded. The failing code:

def spamTest():
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        wordList = textParse(open('email/spam/%d.txt' % i).read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i).read()) # the line that fails
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = list(range(50))
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex]
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))

1. Tried gb18030, which covers more characters than gbk: still failed

wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030').read())

2. Ignored the decoding errors: this works

wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())

3. Open the file and find out which character is the invalid one: I gave up on this
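
What errors='ignore' does can be seen on a short byte string. The sample bytes below are made up. Trying Latin-1 at the end is only a guess: it assumes the emails were saved in a Western single-byte encoding, where byte 0xAE is the registered-trademark sign. I have not verified that against the actual files.

```python
raw = b'Acme\xae pills'   # made-up sample containing byte 0xAE

# Strict gbk decoding fails: 0xAE followed by a space is not a valid gbk pair.
try:
    raw.decode('gbk')
    strict_ok = True
except UnicodeDecodeError as e:
    strict_ok = False
    print("UnicodeDecodeError:", e)

# errors='ignore' silently drops whatever cannot be decoded.
ignored = raw.decode('gbk', errors='ignore')

# Assumption: the data is really Latin-1, so no byte needs to be dropped.
latin = raw.decode('latin-1')

print(ignored)
print(latin)
```

Note that errors='ignore' loses data without warning, so it is best kept for cases like this one where the dropped characters do not matter to the word counts.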
 

4. TypeError: 'range' object doesn't support item deletion

# spamTest() before the fix: trainingSet is still a bare range
def spamTest():
    docList = []
    classList = []
    fullList = []
    for i in range(1, 26):
        wordList = textParse(open('email/spam/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(1)
        wordList = textParse(open('email/ham/%d.txt' % i, encoding='gb18030', errors='ignore').read())
        docList.append(wordList)
        fullList.extend(wordList)
        classList.append(0)
    vocabList = bayes.createVocabList(docList)
    trainingSet = range(50) # the line that needs changing
    testSet = []
    for i in range(10):
        randIndex = int(random.uniform(0, len(trainingSet)))
        testSet.append(trainingSet[randIndex])
        del trainingSet[randIndex] # the line that raises the error
    trainMat = []
    trainClasses = []
    for docIndex in trainingSet:
        trainMat.append(bayes.setOfWords2Vec(vocabList, docList[docIndex]))
        trainClasses.append(classList[docIndex])
    p0V, p1V, pSpam = bayes.trainNB0(array(trainMat), array(trainClasses))
    errorCount = 0
    for docIndex in testSet:
        wordVector = bayes.setOfWords2Vec(vocabList, docList[docIndex])
        if bayes.classifyNB(array(wordVector), p0V, p1V, pSpam) != classList[docIndex]:
            errorCount += 1
    print('the error rate is:', float(errorCount) / len(testSet))

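Under Python 3.x this raises: 'range' object doesn't support item deletion

Cause: in Python 3.x, range() returns an immutable range object rather than a list, so elements cannot be deleted from it.

Fix:

Change trainingSet = range(50) to trainingSet = list(range(50))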
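
A stripped-down sketch of the train/test split on its own, without the book's bayes module. The random.sample variant at the end is an alternative I am suggesting, not what the book does.

```python
import random

# A bare range is immutable, so item deletion is rejected.
try:
    del range(50)[0]
except TypeError as e:
    print("TypeError:", e)

training_set = list(range(50))   # a list supports del
test_set = []
for _ in range(10):
    rand_index = random.randrange(len(training_set))
    test_set.append(training_set[rand_index])
    del training_set[rand_index]

# Alternative sketch: draw the 10 test indices in a single call.
alt_test = random.sample(range(50), 10)
alt_train = [i for i in range(50) if i not in alt_test]

print(len(training_set), len(test_set), len(alt_train), len(alt_test))
```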

5. TypeError: 'numpy.float64' object cannot be interpreted as an integer

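Faulty code: the stochastic gradient ascent algorithm

from numpy import ones, shape  # sigmoid() is assumed to be defined elsewhere

# Stochastic gradient ascent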
def stocGradAscent0(dataMatrix, classLabels):

    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights

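Cause: error is a float64, weights is a numpy array (it comes from ones(n)), and dataMatrix[i] is a plain Python list.

In Python, multiplying a list L by an integer n repeats the list, giving a list of length n*len(L). Multiplying a list by a float is therefore an error, and repetition is not what we want anyway: we want every element of the current vector multiplied by error.

So this is really a case of mixing Python lists with numpy arrays. Converting dataMatrix to an array is enough to fix it (the conversion can also be done before the argument is passed in; consider this my complaint about Python's type system).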

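from numpy import array, ones, shape  # sigmoid() is assumed to be defined elsewhere

# Stochastic gradient ascent
def stocGradAscent0(dataMatrix, classLabels):
    # Convert explicitly so that lists and arrays are not mixed
    dataMatrix = array(dataMatrix)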
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
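
The difference between list repetition and element-wise array multiplication is easy to see in isolation. A minimal sketch with a made-up three-element row, using a plain float in place of the float64 error term:

```python
import numpy as np

row = [1.0, 2.0, 3.0]   # a plain list, like one row of an unconverted dataMatrix

repeated = row * 2      # repetition, not arithmetic
print(repeated)         # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]

try:
    row * 0.5           # a float cannot be a repeat count
except TypeError as e:
    print("TypeError:", e)

scaled = np.array(row) * 0.5   # element-wise: 0.5, 1.0, 1.5
print(scaled)
```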

6. copy and copy.deepcopy

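copy.copy makes a shallow copy: it creates a new outer object but does not copy the sub-objects of a compound object. Sub-objects are things like a sequence nested inside another sequence, or a sequence nested inside a dict. The copy holds references to the same sub-objects as the original, so when one side modifies a sub-object in place, the other side sees the change too.

copy.deepcopy copies every level of a compound object recursively, so the result is fully independent of the original.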
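
A short sketch with a nested list (made-up values) shows both behaviours side by side:

```python
import copy

original = [1, [2, 3]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)

original[1].append(4)   # mutate the nested list in place
original[0] = 99        # rebind a top-level slot

print(shallow)  # [1, [2, 3, 4]]: the nested list is shared, the top level is not
print(deep)     # [1, [2, 3]]: fully independent
```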

7. / and //

In Python 3:

/ is true division and always returns a float: 3/2 = 1.5; 2/2 = 1.0

// is floor division: 3//2 = 1; 2//2 = 1
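
A few more cases worth knowing, as a quick sketch. The negative example is the one that tends to surprise people, because // rounds toward negative infinity rather than toward zero.

```python
true_div = 3 / 2          # 1.5
exact_div = 2 / 2         # 1.0, still a float
floor_div = 3 // 2        # 1
neg_floor = -3 // 2       # -2, rounds toward negative infinity
float_floor = 3.0 // 2    # 1.0, a float operand gives a float result
truncated = int(-3 / 2)   # -1, truncation toward zero for contrast

print(true_div, exact_div, floor_div, neg_floor, float_floor, truncated)
```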

 
