Entropy Weight Method in Python

In information theory, entropy is a measure of uncertainty: the greater the uncertainty, the larger the entropy and the more information the signal carries; the smaller the uncertainty, the smaller the entropy and the less information it carries.

Given this property, the entropy value can be used to judge the randomness and disorder of an event, and also to measure the dispersion of an indicator: the more dispersed an indicator is across samples, the larger its influence (weight) on the overall evaluation. For example, if all samples take the same value on some indicator, that indicator contributes nothing to discriminating between samples, so its weight is 0.
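The claim above can be checked numerically. The sketch below (not part of the original post) computes the normalized entropy of a single indicator column, e = -(1/ln n) · Σ p_i ln p_i: a constant column attains the maximum entropy 1 (zero discriminating power), while a highly dispersed column has much lower entropy.

```python
import numpy as np

def column_entropy(col):
    """Normalized entropy of one indicator column: e = -(1/ln n) * sum(p * ln p)."""
    col = np.asarray(col, dtype=float)
    n = len(col)
    p = col / col.sum()
    # treat 0 * ln(0) as 0 by convention
    terms = np.where(p > 0, p * np.log(p), 0.0)
    return -terms.sum() / np.log(n)

# constant column: entropy 1.0, so redundancy 1 - e = 0 and weight 0
print(column_entropy([5, 5, 5, 5]))    # -> 1.0
# highly dispersed column: entropy well below 1, hence a larger weight later
print(column_entropy([1, 1, 1, 97]))
```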

The entropy weight method is an objective weighting method, since it relies only on the dispersion of the data itself. That said, weights determined purely from entropy are not always entirely reasonable.
Python implementation of the entropy weight method:

# -*- coding:utf-8 -*-
"""
@author: 1
@file: entropy_method3.py
@time: 2020/3/8 11:08
"""

import pandas as pd
import numpy as np
import math


def cal_weight(x):
    """
    Entropy weight method.
    @param x: pandas DataFrame, samples in rows, indicators in columns
    @return: DataFrame with one entropy weight per indicator
    """
    # min-max standardization of each column
    x = x.apply(lambda col: (col - np.min(col)) / (np.max(col) - np.min(col)))
    rows = x.index.size    # number of samples n
    cols = x.columns.size  # number of indicators m
    k = 1.0 / math.log(rows)  # natural log, consistent with math.log(p) below

    # information entropy per cell: e_ij = -k * p_ij * ln(p_ij)
    x = np.array(x)
    lnf = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            if x[i][j] == 0:
                lnf[i][j] = 0.0  # convention: 0 * ln(0) = 0
            else:
                p = x[i][j] / x.sum(axis=0)[j]
                lnf[i][j] = -k * p * math.log(p)
    E = pd.DataFrame(lnf)

    # redundancy (degree of diversification): d_j = 1 - e_j
    d = 1 - E.sum(axis=0)

    # weight of each indicator: w_j = d_j / sum_j d_j
    w = pd.DataFrame(d / d.sum())
    return w


if __name__ == '__main__':
    data_end = pd.read_csv('py_challenge/data_end.csv', index_col=0)
    df = data_end[['pagerank_value', 'NA', 'FA']]
    w = cal_weight(df)
    w.index = df.columns
    w.columns = ['weight']
    print(w)
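For reference, the same pipeline can be written in a few vectorized pandas lines. This is a self-contained sketch on synthetic random data (the column names and values are hypothetical, used only so the snippet runs standalone); the printed weights are illustrative, but they always sum to 1.

```python
import numpy as np
import pandas as pd

# synthetic data: 50 samples, 3 hypothetical indicators
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((50, 3)), columns=['a', 'b', 'c'])

z = (df - df.min()) / (df.max() - df.min())   # min-max standardization
p = z / z.sum(axis=0)                         # proportion of each sample per indicator
k = 1.0 / np.log(len(z))                      # k = 1 / ln(n)
# entropy per indicator; p.where(p > 0, 1.0) makes ln(1) = 0 stand in for 0 * ln(0)
e = -k * (p * np.log(p.where(p > 0, 1.0))).sum(axis=0)
d = 1 - e                                     # redundancy
w = d / d.sum()                               # entropy weights, sum to 1
print(w)
```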
