Python计算信息熵

 计算信息熵的公式:n是类别数,p(xi)是第i类的概率

H = -\sum_{i=1}^{n} p(x_{i})log_{2}p(x_{i})

假设数据集有m行,即m个样本,每一行最后一列为该样本的标签,计算数据集信息熵的代码如下:

from math import log

def calcShannonEnt(dataSet):
    numEntries = len(dataSet) # 样本数
    labelCounts = {} # 该数据集每个类别的频数
    for featVec in dataSet:  # 对每一行样本
        currentLabel = featVec[-1] # 该样本的标签
        if currentLabel not in labelCounts.keys(): labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1 
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key])/numEntries # 计算p(xi)
        shannonEnt -= prob * log(prob, 2)  # log base 2
    return shannonEnt

 

你可能感兴趣的:(python,Python,信息熵)