Fisher Score for Feature Selection: The Idea Behind the Algorithm and Its Python Implementation

I. Algorithm Idea

1. Feature Selection

Feature selection removes irrelevant or redundant features while keeping the remaining original features, producing a feature subset that describes the given problem with as little loss of performance as possible.

Feature selection methods fall into three families: filter methods, wrapper methods, and embedded methods.

The Fisher Score introduced in this article is a filter-type feature selection algorithm.

For other filter-type feature selection algorithms, see my other articles:
Chi-Square Test for Feature Selection

Mutual Information for Feature Selection

2. Fisher Score

Fisher Score in feature selection

Fisher Score is one of the effective methods for feature selection. Its core idea is that a strongly discriminative feature is one for which the within-class distance is as small as possible and the between-class distance is as large as possible.

This is easy to relate to everyday life: people of the same age group have more to talk about, while people of different age groups face a generation gap; people from the same province greet each other as fellow townsfolk, while people from different provinces may hold biases against one another. People of the same age group belong to one class, and people from the same province belong to one class; the distance within such a class is small. People of different age groups or from different provinces belong to different classes, and the distance between those classes is large.

With this example in mind, let us continue.

More precisely, given a set of d features, denoted S, the goal of Fisher Score filter feature selection is to pick a subset T of m features (m < d) that maximizes a criterion F:

$$\mathcal{T}^{*} = \arg\max_{\mathcal{T} \subseteq \mathcal{S}} F(\mathcal{T}), \quad \text{s.t. } |\mathcal{T}| = m$$

Here s.t. denotes the constraint, and | · | denotes the cardinality of a set.

This equation is a combinatorial optimization problem, and finding its global optimum is NP-hard. A common heuristic works around it by first scoring each feature independently with the criterion F and then selecting the m highest-scoring features. Features selected by this heuristic have drawbacks, however. On one hand, because the heuristic scores each feature on its own, it ignores feature combinations, i.e. evaluating two or more features jointly. For example, features a and b may each have a low Fisher Score while the combination ab scores very high; in that case the heuristic discards both a and b even though, in principle, they should be selected. On the other hand, the heuristic cannot handle redundant features. For example, features a and b may both have very high Fisher Scores yet be strongly correlated; in that case the heuristic selects both, even though either a or b could be dropped with no loss in downstream learning performance.

The content above is drawn from the paper

Gu, Q., Li, Z., and Han, J., 2012: Generalized Fisher score for feature selection. UAI.

Because solving this NP-hard problem exactly is too complex, this article uses the heuristic rule and computes the Fisher Score of each feature individually.
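
To make the heuristic concrete, here is a tiny sketch (the per-feature scores are made-up numbers, purely for illustration): each feature receives its own score, and the m highest-scoring columns are kept.

import numpy as np

# Illustrative only: pretend each of the four features already has a Fisher Score.
scores = np.array([0.05, 1.80, 0.24, 0.92])
m = 2
selected = np.argsort(scores)[::-1][:m]   # indices of the m best-scoring features
print(selected)                           # -> [1 3]
# X_reduced = X[:, selected]              # keep only those columns of the data matrix X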

Next comes the rule for computing the Fisher Score, which takes a little effort to understand.

Suppose the dataset contains $n$ samples belonging to $C$ classes $\omega_1, \omega_2, \dots, \omega_C$, where class $i$ contains $n_i$ samples. The table below helps illustrate this.

          Feature 1   Feature 2   Feature 3   Label
Sample 1      2           1           3         0
Sample 2      4           5           7         1
Sample 3      7           3           0         0
Sample 4      9           2           5         0
Sample 5     18           5           3         1

The dataset in this table contains 5 samples belonging to two classes, 0 and 1.

Class 0 contains samples 1, 3, and 4, three samples in total.
Class 1 contains samples 2 and 5, two samples in total.

Let $x^{(k)}$ denote the value of sample $x$ on the $k$-th feature, $m_i^{(k)}$ the mean of the $k$-th feature over the samples of class $i$, and $m^{(k)}$ the mean of the $k$-th feature over all samples. (These definitions need to be understood precisely; if they are still unclear, the walkthrough below works through them using the table above.)

Define the between-class variance of the $k$-th feature on the dataset as $S_B^{(k)}$:
$$S_{B}^{(k)} = \sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)} - m^{(k)}\right)^{2}$$

Define the within-class variance of the $k$-th feature on the dataset as $S_w^{(k)}$:
$$S_{w}^{(k)} = \frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)} - m_{i}^{(k)}\right)^{2}$$

Finally, define the Fisher Score of the $k$-th feature on the dataset as $J_{fisher}(k)$:

$$J_{fisher}(k) = \frac{S_{B}^{(k)}}{S_{w}^{(k)}}$$

With these definitions, we can compute the Fisher Score of the $k$-th feature.
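
As a quick illustration, these three formulas translate almost line by line into NumPy. The sketch below is my own minimal version (it is not the library code shown later); applied to the table above it gives 15/62 ≈ 0.242 for feature 1, matching the hand calculation that follows.

import numpy as np

def fisher_score_per_feature(X, y):
    # Sketch of J_fisher(k) = S_B^(k) / S_w^(k) for every feature k, as defined above.
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    m_all = X.mean(axis=0)                        # m^(k): overall mean of each feature
    S_B = np.zeros(X.shape[1])
    S_W = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]                            # samples of class c
        m_c = Xc.mean(axis=0)                     # m_i^(k): class mean of each feature
        S_B += len(Xc) / n * (m_c - m_all) ** 2   # between-class variance
        S_W += ((Xc - m_c) ** 2).sum(axis=0) / n  # within-class variance
    return S_B / S_W                              # J_fisher(k) for each feature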

OK, let us use the table above as an example and compute the Fisher Score of feature 1.

          Feature 1   Feature 2   Feature 3   Label
Sample 1      2           1           3         0
Sample 2      4           5           7         1
Sample 3      7           3           0         0
Sample 4      9           2           5         0
Sample 5     18           5           3         1

First consider
$$S_{B}^{(k)} = \sum_{i=1}^{C} \frac{n_{i}}{n}\left(m_{i}^{(k)} - m^{(k)}\right)^{2}$$

Here $n$ is the total number of samples, 5 in this case, and $n_i$ is the number of samples in class $i$: class 0 has 3 samples and class 1 has 2.

$m_i^{(k)}$ is the mean of the $k$-th feature over the samples of class $i$.

So $m_0^{(1)}$, the mean of feature 1 over the class-0 samples, is $(2+7+9)/3 = 6$ from the table.

$m_1^{(1)}$, the mean of feature 1 over the class-1 samples, is $(4+18)/2 = 11$.

$m^{(k)}$ is the mean of the $k$-th feature over all samples.

So $m^{(1)}$, the mean of feature 1 over all samples, is
$m^{(1)} = (2+4+7+9+18)/5 = 8$.

Also, $n = 5$, $n_0 = 3$, and $n_1 = 2$.

$S_B^{(1)}$ is the between-class variance of feature 1, so
$$S_B^{(1)} = \frac{n_0}{n}\left(m_0^{(1)} - m^{(1)}\right)^2 + \frac{n_1}{n}\left(m_1^{(1)} - m^{(1)}\right)^2 = \frac{3}{5}(6-8)^2 + \frac{2}{5}(11-8)^2 = \frac{12}{5} + \frac{18}{5} = 6.$$

Now consider
$$S_{w}^{(k)} = \frac{1}{n} \sum_{i=1}^{C} \sum_{x \in \omega_{i}}\left(x^{(k)} - m_{i}^{(k)}\right)^{2}$$

This one is a little more involved.

First, $n = 5$.

$x^{(k)}$ is the value of sample $x$ on the $k$-th feature, so $x^{(1)}$ is the value of sample $x$ on feature 1; from the table these values are 2, 4, 7, 9, 18.

When $x \in \omega_0$, i.e. the sample belongs to class 0, the values of $x^{(1)}$ are 2, 7, 9.

Similarly, when $x \in \omega_1$, the values of $x^{(1)}$ are 4, 18.

We have already computed $m_0^{(1)} = 6$ and $m_1^{(1)} = 11$ above.

$S_w^{(1)}$ is the within-class variance of feature 1, so
$$S_w^{(1)} = \frac{1}{n}\sum_{i=1}^{C}\sum_{x \in \omega_i}\left(x^{(1)} - m_i^{(1)}\right)^2 = \frac{1}{5}\left((2-6)^2 + (7-6)^2 + (9-6)^2 + (4-11)^2 + (18-11)^2\right) = \frac{124}{5}.$$

Finally, we obtain the Fisher Score of feature 1:
$$J_{fisher}(1) = \frac{S_B^{(1)}}{S_w^{(1)}} = \frac{6}{124/5} = \frac{15}{62}.$$

Repeat the same procedure for features 2 and 3, and the computation is complete.

OK, I believe the explanation above is detailed enough; if anything is still unclear, work through the calculation above again carefully. Doing so also makes clear why one quantity is called the between-class variance and the other the within-class variance.

Finally, since we use the heuristic rule and score each feature individually, we need to compute the Fisher Score of every feature in the dataset and rank them.

Recall that the core idea of the Fisher Score is that a strongly discriminative feature has a within-class distance that is as small as possible and a between-class distance that is as large as possible.

So the larger the between-class variance and the smaller the within-class variance, the larger the Fisher Score. Features are therefore ranked in descending order of their scores, and the higher a feature ranks, the stronger its discriminative power in theory (subject, of course, to the drawbacks described earlier).

II. Code Implementation

1. An Existing Implementation on GitHub

I have tidied it up; the code is below.
First, construct_W.py:

import numpy as np
from scipy.sparse import *
from sklearn.metrics.pairwise import pairwise_distances


def construct_W(X, **kwargs):
    """
    Construct the affinity matrix W through different ways
    Notes
    -----
    if kwargs is null, use the default parameter settings;
    if kwargs is not null, construct the affinity matrix according to parameters in kwargs
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    kwargs: {dictionary}
        parameters to construct different affinity matrix W:
        y: {numpy array}, shape (n_samples, 1)
            the true label information needed under the 'supervised' neighbor mode
        metric: {string}
            choices for different distance measures
            'euclidean' - use euclidean distance
            'cosine' - use cosine distance (default)
        neighbor_mode: {string}
            indicates how to construct the graph
            'knn' - put an edge between two nodes if and only if they are among the
                    k nearest neighbors of each other (default)
            'supervised' - put an edge between two nodes if they belong to same class
                    and they are among the k nearest neighbors of each other
        weight_mode: {string}
            indicates how to assign weights for each edge in the graph
            'binary' - 0-1 weighting, every edge receives weight of 1 (default)
            'heat_kernel' - if nodes i and j are connected, put weight W_ij = exp(-norm(x_i - x_j)/2t^2)
                            this weight mode can only be used under 'euclidean' metric and you are required
                            to provide the parameter t
            'cosine' - if nodes i and j are connected, put weight cosine(x_i,x_j).
                        this weight mode can only be used under 'cosine' metric
        k: {int}
            choices for the number of neighbors (default k = 5)
        t: {float}
            parameter for the 'heat_kernel' weight_mode
        fisher_score: {boolean}
            indicates whether to build the affinity matrix in a fisher score way, in which W_ij = 1/n_l if yi = yj = l;
            otherwise W_ij = 0 (default fisher_score = false)
        reliefF: {boolean}
            indicates whether to build the affinity matrix in a reliefF way, NH(x) and NM(x,y) denotes a set of
            k nearest points to x with the same class as x, and a different class (the class y), respectively.
            W_ij = 1 if i = j; W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y) (default reliefF = false)
    Output
    ------
    W: {sparse matrix}, shape (n_samples, n_samples)
        output affinity matrix W
    """

    # default metric is 'cosine'
    if 'metric' not in kwargs.keys():
        kwargs['metric'] = 'cosine'

    # default neighbor mode is 'knn' and default neighbor size is 5
    if 'neighbor_mode' not in kwargs.keys():
        kwargs['neighbor_mode'] = 'knn'
    if kwargs['neighbor_mode'] == 'knn' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'k' not in kwargs.keys():
        kwargs['k'] = 5
    if kwargs['neighbor_mode'] == 'supervised' and 'y' not in kwargs.keys():
        print ('Warning: label is required in the supervised neighborMode!!!')
        exit(0)

    # default weight mode is 'binary', default t in heat kernel mode is 1
    if 'weight_mode' not in kwargs.keys():
        kwargs['weight_mode'] = 'binary'
    if kwargs['weight_mode'] == 'heat_kernel':
        if kwargs['metric'] != 'euclidean':
            kwargs['metric'] = 'euclidean'
        if 't' not in kwargs.keys():
            kwargs['t'] = 1
    elif kwargs['weight_mode'] == 'cosine':
        if kwargs['metric'] != 'cosine':
            kwargs['metric'] = 'cosine'

    # default fisher_score and reliefF mode are 'false'
    if 'fisher_score' not in kwargs.keys():
        kwargs['fisher_score'] = False
    if 'reliefF' not in kwargs.keys():
        kwargs['reliefF'] = False

    n_samples, n_features = np.shape(X)

    # choose 'knn' neighbor mode
    if kwargs['neighbor_mode'] == 'knn':
        k = kwargs['k']
        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                # compute pairwise euclidean distances
                D = pairwise_distances(X)
                D **= 2
                # sort the distance matrix D in ascending order
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                # choose the k-nearest neighbors for each instance
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            elif kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                # compute pairwise cosine distances
                D_cosine = np.dot(X, np.transpose(X))
                # sort the distance matrix D in descending order
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                G = np.zeros((n_samples*(k+1), 3))
                G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
                G[:, 1] = np.ravel(idx_new, order='F')
                G[:, 2] = 1
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            t = kwargs['t']
            # compute pairwise euclidean distances
            D = pairwise_distances(X)
            D **= 2
            # sort the distance matrix D in ascending order
            dump = np.sort(D, axis=1)
            idx = np.argsort(D, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = dump[:, 0:k+1]
            # compute the pairwise heat kernel distances
            dump_heat_kernel = np.exp(-dump_new/(2*t*t))
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_heat_kernel, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            # compute pairwise cosine distances
            D_cosine = np.dot(X, np.transpose(X))
            # sort the distance matrix D in ascending order
            dump = np.sort(-D_cosine, axis=1)
            idx = np.argsort(-D_cosine, axis=1)
            idx_new = idx[:, 0:k+1]
            dump_new = -dump[:, 0:k+1]
            G = np.zeros((n_samples*(k+1), 3))
            G[:, 0] = np.tile(np.arange(n_samples), (k+1, 1)).reshape(-1)
            G[:, 1] = np.ravel(idx_new, order='F')
            G[:, 2] = np.ravel(dump_new, order='F')
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

    # choose supervised neighborMode
    elif kwargs['neighbor_mode'] == 'supervised':
        k = kwargs['k']
        # get true labels and the number of classes
        y = kwargs['y']
        label = np.unique(y)
        n_classes = np.unique(y).size
        # construct the weight matrix W in a fisherScore way, W_ij = 1/n_l if yi = yj = l, otherwise W_ij = 0
        if kwargs['fisher_score'] is True:
            W = lil_matrix((n_samples, n_samples))
            for i in range(n_classes):
                class_idx = (y == label[i])
                class_idx_all = (class_idx[:, np.newaxis] & class_idx[np.newaxis, :])
                W[class_idx_all] = 1.0/np.sum(np.sum(class_idx))
            return W

        # construct the weight matrix W in a reliefF way, NH(x) and NM(x,y) denotes a set of k nearest
        # points to x with the same class as x, a different class (the class y), respectively. W_ij = 1 if i = j;
        # W_ij = 1/k if x_j \in NH(x_i); W_ij = -1/(c-1)k if x_j \in NM(x_i, y)
        if kwargs['reliefF'] is True:
            # when xj in NH(xi)
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                n_smp_class = (class_idx[idx_new[:]]).size
                if len(class_idx) <= k:
                    k = len(class_idx) - 1
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = 1.0/k
                id_now += n_smp_class
            W1 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            # when i = j, W_ij = 1
            for i in range(n_samples):
                W1[i, i] = 1
            # when x_j in NM(x_i, y)
            G = np.zeros((n_samples*k*(n_classes - 1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx1 = np.column_stack(np.where(y == label[i]))[:, 0]
                X1 = X[class_idx1, :]
                for j in range(n_classes):
                    if label[j] != label[i]:
                        class_idx2 = np.column_stack(np.where(y == label[j]))[:, 0]
                        X2 = X[class_idx2, :]
                        D = pairwise_distances(X1, X2)
                        idx = np.argsort(D, axis=1)
                        idx_new = idx[:, 0:k]
                        n_smp_class = len(class_idx1)*k
                        G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx1, (k, 1)).reshape(-1)
                        G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx2[idx_new[:]], order='F')
                        G[id_now:n_smp_class+id_now, 2] = -1.0/((n_classes-1)*k)
                        id_now += n_smp_class
            W2 = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W2) > W2
            W2 = W2 - W2.multiply(bigger) + np.transpose(W2).multiply(bigger)
            W = W1 + W2
            return W

        if kwargs['weight_mode'] == 'binary':
            if kwargs['metric'] == 'euclidean':
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise euclidean distances for instances in class i
                    D = pairwise_distances(X[class_idx, :])
                    D **= 2
                    # sort the distance matrix D in ascending order for instances in class i
                    idx = np.argsort(D, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

            if kwargs['metric'] == 'cosine':
                # normalize the data first
                X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
                for i in range(n_samples):
                    X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
                G = np.zeros((n_samples*(k+1), 3))
                id_now = 0
                for i in range(n_classes):
                    class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                    # compute pairwise cosine distances for instances in class i
                    D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                    # sort the distance matrix D in descending order for instances in class i
                    idx = np.argsort(-D_cosine, axis=1)
                    idx_new = idx[:, 0:k+1]
                    n_smp_class = len(class_idx)*(k+1)
                    G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                    G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                    G[id_now:n_smp_class+id_now, 2] = 1
                    id_now += n_smp_class
                # build the sparse affinity matrix W
                W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
                bigger = np.transpose(W) > W
                W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
                return W

        elif kwargs['weight_mode'] == 'heat_kernel':
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise euclidean distances for instances in class i
                D = pairwise_distances(X[class_idx, :])
                D **= 2
                # sort the distance matrix D in ascending order for instances in class i
                dump = np.sort(D, axis=1)
                idx = np.argsort(D, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = dump[:, 0:k+1]
                t = kwargs['t']
                # compute pairwise heat kernel distances for instances in class i
                dump_heat_kernel = np.exp(-dump_new/(2*t*t))
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_heat_kernel, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W

        elif kwargs['weight_mode'] == 'cosine':
            # normalize the data first
            X_normalized = np.power(np.sum(X*X, axis=1), 0.5)
            for i in range(n_samples):
                X[i, :] = X[i, :]/max(1e-12, X_normalized[i])
            G = np.zeros((n_samples*(k+1), 3))
            id_now = 0
            for i in range(n_classes):
                class_idx = np.column_stack(np.where(y == label[i]))[:, 0]
                # compute pairwise cosine distances for instances in class i
                D_cosine = np.dot(X[class_idx, :], np.transpose(X[class_idx, :]))
                # sort the distance matrix D in descending order for instances in class i
                dump = np.sort(-D_cosine, axis=1)
                idx = np.argsort(-D_cosine, axis=1)
                idx_new = idx[:, 0:k+1]
                dump_new = -dump[:, 0:k+1]
                n_smp_class = len(class_idx)*(k+1)
                G[id_now:n_smp_class+id_now, 0] = np.tile(class_idx, (k+1, 1)).reshape(-1)
                G[id_now:n_smp_class+id_now, 1] = np.ravel(class_idx[idx_new[:]], order='F')
                G[id_now:n_smp_class+id_now, 2] = np.ravel(dump_new, order='F')
                id_now += n_smp_class
            # build the sparse affinity matrix W
            W = csc_matrix((G[:, 2], (G[:, 0], G[:, 1])), shape=(n_samples, n_samples))
            bigger = np.transpose(W) > W
            W = W - W.multiply(bigger) + np.transpose(W).multiply(bigger)
            return W
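
For reference, here is a hypothetical toy call showing what the fisher_score branch of construct_W produces: W_ij = 1/n_l when samples i and j both belong to class l, and 0 otherwise (the data values below are made up).

import numpy as np
from construct_W import construct_W

X = np.array([[2., 1., 3.], [4., 5., 7.], [7., 3., 0.]])
y = np.array([0, 1, 0])
W = construct_W(X, neighbor_mode='supervised', fisher_score=True, y=y)
print(W.todense())   # 0.5 between (and on the diagonal of) the two class-0 samples, 1.0 for the single class-1 sample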

Then the driver script:

import numpy as np
from scipy.sparse import *
from construct_W import construct_W


def fisher_score(X, y):
    """
    This function implements the fisher score feature selection, steps are as follows:
    1. Construct the affinity matrix W in fisher score way
    2. For the r-th feature, we define fr = X(:,r), D = diag(W*ones), ones = [1,...,1]', L = D - W
    3. Let fr_hat = fr - (fr'*D*ones)*ones/(ones'*D*ones)
    4. Fisher score for the r-th feature is score = (fr_hat'*D*fr_hat)/(fr_hat'*L*fr_hat)-1
    Input
    -----
    X: {numpy array}, shape (n_samples, n_features)
        input data
    y: {numpy array}, shape (n_samples,)
        input class labels
    Output
    ------
    score: {numpy array}, shape (n_features,)
        fisher score for each feature
    Reference
    ---------
    He, Xiaofei et al. "Laplacian Score for Feature Selection." NIPS 2005.
    Duda, Richard et al. "Pattern classification." John Wiley & Sons, 2012.
    """

    # Construct weight matrix W in a fisherScore way
    kwargs = {"neighbor_mode": "supervised", "fisher_score": True, 'y': y}
    W = construct_W(X, **kwargs)

    # build the diagonal D matrix from affinity matrix W
    D = np.array(W.sum(axis=1))
    L = W
    tmp = np.dot(np.transpose(D), X)
    D = diags(np.transpose(D), [0])
    Xt = np.transpose(X)
    t1 = np.transpose(np.dot(Xt, D.todense()))
    t2 = np.transpose(np.dot(Xt, L.todense()))
    # compute the numerator of Lr
    D_prime = np.sum(np.multiply(t1, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # compute the denominator of Lr
    L_prime = np.sum(np.multiply(t2, X), 0) - np.multiply(tmp, tmp)/D.sum()
    # avoid the denominator of Lr to be 0
    D_prime[D_prime < 1e-12] = 10000
    lap_score = 1 - np.array(np.multiply(L_prime, 1/D_prime))[0, :]

    # compute fisher score from laplacian score, where fisher_score = 1/lap_score - 1
    score = 1.0/lap_score - 1
    return np.transpose(score)


def feature_ranking(score):
    """
    Rank features in descending order according to fisher score, the larger the fisher score, the more important the
    feature is
    """
    idx = np.argsort(score, 0)
    return idx[::-1]

X = [[2, 2, 3, 4, 5, 7], [3, 0, 1, 1, 2, 4], [4, 5, 6, 9, 7, 10]]  # feature matrix: 3 samples, 6 features
y = [0, 0, 1]                                                       # class labels
z = fisher_score(X, y)
print(z)

Output:

[3.         5.33333333 5.33333333 6.25925926 1.81481481 3.        ]

As you can see, the inputs are the feature matrix and the list of class labels, and the output is an array with the Fisher Score of each feature.
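
The feature_ranking function defined above then turns these scores into a ranking, i.e. the feature indices sorted from most to least discriminative:

idx = feature_ranking(z)
print(idx)   # the highest-scoring feature (index 3 here) comes first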

The original author's GitHub repository is linked below.
Source code

2. My Own Implementation

Without further ado, here is the code.

import pandas as pd
import numpy as np
# Labels must be 0 or 1; the feature space is arbitrary.
# sample: an n*m list (n samples, m features)
# label:  an n*1 list
# Returns a list containing the Fisher Score of each feature.
# e.g.  sample = [[1,2,3],[1,0,1],[1,5,6]]
#       label = [1, 0, 1]
# returns lst = [nan, 1.8148148148148149, 1.8148148148148149]
def binary_fisher_score(sample,label):

    if len(sample) != len(label):
        print('Sample does not match label')
        exit()
    df1 = pd.DataFrame(sample)
    df2 = pd.DataFrame(label, columns=['label'])
    data = pd.concat([df1, df2], axis=1)  # merge features and labels into one dataframe

    data0 = data[data.label == 0]  # split by label into the class-0 and class-1 dataframes
    data1 = data[data.label == 1]
    n = len(label)   # number of samples
    n1 = sum(label)  # number of class-1 samples
    n0 = n - n1      # number of class-0 samples
    lst = []         # list of scores to return
    features_list = list(data.columns)[:-1]
    for feature in features_list:

        # statistics of class 0
        m0_feature_mean = data0[feature].mean()  # mean of class 0 on this feature
        # within-class sum of squares of class 0 on this feature
        m0_SW = sum((data0[feature] - m0_feature_mean) ** 2)
        # statistics of class 1
        m1_feature_mean = data1[feature].mean()  # mean of class 1 on this feature
        # within-class sum of squares of class 1 on this feature
        m1_SW = sum((data1[feature] - m1_feature_mean) ** 2)
        # statistics of the whole dataset
        m_all_feature_mean = data[feature].mean()  # overall mean on this feature

        m0_SB = n0 / n * (m0_feature_mean - m_all_feature_mean) ** 2
        m1_SB = n1 / n * (m1_feature_mean - m_all_feature_mean) ** 2
        # between-class variance S_B
        m_SB = m1_SB + m0_SB
        # within-class variance S_W
        m_SW = (m0_SW + m1_SW) / n
        if m_SW == 0:
            # a 0/0 case also yields nan
            m_fisher_score = np.nan
        else:
            # Fisher score of this feature
            m_fisher_score = m_SB / m_SW
        # append the Fisher score to the list
        lst.append(m_fisher_score)

    return lst

sample = [[1,2,3],[1,0,1],[1,5,6]]
label = [1, 0, 1]
print(binary_fisher_score(sample,label))

Output:

[nan, 1.8148148148148149, 1.8148148148148149]

The code above only handles the case where the labels are 0 and 1, so mind the input values. The label values themselves carry no meaning beyond distinguishing the two classes, although this particular implementation does require them to be exactly 0 and 1.

In a follow-up post I will provide a general Python implementation of the Fisher Score that works for labels with any number of classes.

I hope this article was helpful!
