Chapter 6: Logistic Regression and Maximum Entropy Models

References

1. Li Hang (李航), Statistical Learning Methods (《统计学习方法》)
2. GitHub: https://github.com/fengdu78/lihang-code

Both the logistic regression model and the maximum entropy model are log-linear models. Whether a model counts as linear depends on whether it is linear in the parameters being trained.

Logistic Regression Model

The Logistic Distribution

Let $X$ be a continuous random variable. $X$ follows the logistic distribution if $X$ has the following distribution function and density function:
$$F(x)=P(X \leq x)=\frac{1}{1+e^{-(x-\mu)/\gamma}}$$
$$f(x)=F'(x)=\frac{e^{-(x-\mu)/\gamma}}{\gamma\left(1+e^{-(x-\mu)/\gamma}\right)^2}$$
where $\mu$ is the location parameter and $\gamma>0$ is the shape parameter.

import matplotlib.pyplot as plt
import numpy as np

def DrawLogisticDistribution(mu, gamma):
    """Plot the CDF and PDF of the logistic distribution with parameters mu and gamma."""
    x = np.arange(-10, 10, 0.01)
    cdf = 1.0 / (1 + np.exp(-(x - mu) / gamma))
    # Density f(x) = e^{-(x-mu)/gamma} / (gamma * (1 + e^{-(x-mu)/gamma})^2);
    # note the gamma factor in the denominator
    pdf = np.exp(-(x - mu) / gamma) / (gamma * (1 + np.exp(-(x - mu) / gamma)) ** 2)
    plt.figure(figsize=(7, 5))
    plt.plot(x, cdf, 'b-', label='Cumulative Distribution Function')
    plt.plot(x, pdf, 'r-', label='Probability Density Function')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend(loc='upper left')
    plt.show()

DrawLogisticDistribution(0, 1)

[Figure 1: CDF (blue) and PDF (red) of the logistic distribution with μ = 0, γ = 1]

Logistic Regression Model

Binomial Logistic Regression Model

$$P(Y=1|x)=\frac{e^{w \cdot x + b}}{1+e^{w \cdot x + b}}$$
$$P(Y=0|x)=\frac{1}{1+e^{w \cdot x + b}}$$
where $x\in R^n$ is the input, $y\in\{0, 1\}$ is the output, and $w \in R^n$, $b \in R$ are the parameters.
For a given input $x$, compute $P(Y=1|x)$ and $P(Y=0|x)$ from the formulas above, compare the two conditional probabilities, and assign the instance $x$ to the class with the larger probability.
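
A minimal sketch of this decision rule (the weights w and b below are made-up values for illustration, not fitted parameters):

import numpy as np

# Hypothetical parameters, for illustration only
w = np.array([1.5, -2.0])
b = 0.3

def predict(x):
    z = np.dot(w, x) + b
    p1 = np.exp(z) / (1 + np.exp(z))   # P(Y=1|x); P(Y=0|x) = 1 - p1
    return p1, int(p1 > 0.5)           # assign x to the class with the larger probability

print(predict(np.array([0.4, 0.1])))   # approx (0.668, 1)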

The odds of an event: the ratio of the probability that the event occurs to the probability that it does not.

If an event occurs with probability $p$, its odds are $\frac{p}{1-p}$.
The log odds (logit function): $logit(p)=\log\frac{p}{1-p}$
For logistic regression, $\log \frac{P(Y=1|x)}{P(Y=0|x)} = \log e^{w \cdot x + b} = w \cdot x + b$
That is, in the logistic regression model, the log odds of the output $Y=1$ is a linear function of the input $x$.
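
A quick numerical check of this identity — applying the logit to the model's output recovers the linear score (the z values below are arbitrary):

import numpy as np

z = np.array([-2.0, 0.0, 1.5, 3.0])          # arbitrary linear scores w·x + b
p = 1 / (1 + np.exp(-z))                     # P(Y=1|x) under the model
print(np.allclose(np.log(p / (1 - p)), z))   # True: logit(P(Y=1|x)) = w·x + b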

Model Parameter Estimation

For a given training set $T=\{(x_1, y_1),(x_2, y_2),\dots,(x_N, y_N)\}$, where $x_i\in R^n$ and $y_i\in\{0, 1\}$, the model parameters can be estimated by maximum likelihood.
Let $P(Y=1|x)=\pi(x)$ and $P(Y=0|x)=1-\pi(x)$. The likelihood function is:
$$\prod_{i=1}^{N}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}$$
The log-likelihood function is:
$$\begin{aligned} L(w)&=\sum_{i=1}^{N}[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))]\\ &=\sum_{i=1}^{N}\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^{N}[y_i(w \cdot x_i + b)-\log(1+\exp(w \cdot x_i + b))] \end{aligned}$$
Maximizing $L(w)$ thus becomes an optimization problem whose objective is the log-likelihood function.
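
As a minimal sketch of this optimization, the negative log-likelihood can be minimized numerically; the toy data below (X_toy, y_toy) is made up for illustration:

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, y):
    # params packs [w_1, ..., w_n, b]
    w, b = params[:-1], params[-1]
    z = X @ w + b
    # -L(w) = -sum_i [ y_i*z_i - log(1 + exp(z_i)) ]; logaddexp(0, z) = log(1 + e^z)
    return -np.sum(y * z - np.logaddexp(0, z))

# Made-up toy data: two features, label 1 when the feature sum is positive
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 2))
y_toy = (X_toy.sum(axis=1) > 0).astype(float)

res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X_toy, y_toy))
print(res.x)  # fitted [w1, w2, b]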

Multinomial Logistic Regression Model

$$P(Y=k|x)=\frac{\exp(w_k \cdot x)}{1+\sum_{k=1}^{K-1}\exp(w_k \cdot x)},\quad k=1,2,\dots,K-1$$
$$P(Y=K|x)=\frac{1}{1+\sum_{k=1}^{K-1}\exp(w_k \cdot x)}$$
Here $x\in R^{n+1}$ and $w_k \in R^{n+1}$: the bias $b$ is absorbed into $w_k$ by augmenting $x$ with a constant component of 1.
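
A minimal sketch of these formulas (the weight matrix W and input x below are made-up values; class K serves as the reference class with score 0, which reproduces the "1 +" in the denominator):

import numpy as np

def multinomial_probs(W, x):
    # W stacks w_1 ... w_{K-1} (each in R^{n+1}); class K is the reference with score 0
    scores = np.append(W @ x, 0.0)           # scores for classes 1..K
    exps = np.exp(scores - scores.max())     # shift by the max for numerical stability
    return exps / exps.sum()                 # [P(Y=1|x), ..., P(Y=K|x)], sums to 1

# Hypothetical example: K = 3 classes, augmented input x = [1, x1, x2]
W = np.array([[0.5, 1.0, -2.0],
              [0.1, -0.5, 0.3]])
x = np.array([1.0, 2.0, 0.5])
print(multinomial_probs(W, x))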

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    # Use the first 100 iris samples (classes 0 and 1) and the first two features
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, :2], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Logistic Regression (from scratch)

$$g(z) = \frac{1}{1+e^{-z}},\qquad g'(z)=g(z)(1-g(z))$$

$$f_w(x)=g(w^Tx)=\frac{1}{1+e^{-w^Tx}}$$
The log-likelihood function is:
$$\begin{aligned} L(w)&=\sum_{i=1}^{N}[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))]\\ &=\sum_{i=1}^{N}\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right] \end{aligned}$$
Taking the partial derivative with respect to $w_j$ (writing $x_i^{(j)}$ for the $j$-th component of $x_i$, and using $g'(z)=g(z)(1-g(z))$ in the third step):
$$\begin{aligned} \frac{\partial L(w)}{\partial w_j} &= \frac{\partial}{\partial w_j}\sum_{i=1}^{N}\left[y_i\log f_w(x_i)+(1-y_i)\log(1-f_w(x_i))\right]\\ &=\sum_{i=1}^{N}\left(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\right)\frac{\partial f_w(x_i)}{\partial w_j}\\ &=\sum_{i=1}^{N}\left(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\right)f_w(x_i)(1-f_w(x_i))\frac{\partial w^Tx_i}{\partial w_j}\\ &=\sum_{i=1}^{N}\left(y_i(1-f_w(x_i))-(1-y_i)f_w(x_i)\right)x_i^{(j)}\\ &=\sum_{i=1}^{N}(y_i-f_w(x_i))x_i^{(j)} \end{aligned}$$
Parameter update (stochastic gradient ascent, one sample $(x_i, y_i)$ at a time):
$$w_j := w_j + \alpha(y_i - f_w(x_i))x_i^{(j)}$$
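
Before implementing this, a quick sanity check of the derivative identity $g'(z)=g(z)(1-g(z))$ by central finite differences (the sampled z values are arbitrary):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(np.allclose(numeric, analytic))  # True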

class LogisticRegressionClassifier(object):
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        # np.exp handles both scalars and the length-1 arrays produced by np.dot
        return 1 / (1 + np.exp(-x))

    def data_matrix(self, X):
        # Prepend a constant 1 to each sample so the bias is absorbed into the weights,
        # e.g. d = [6.0, 2.8] becomes [1.0, 6.0, 2.8]
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat

    def fit(self, X, y):
        data_mat = self.data_matrix(X)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)

        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                # Stochastic gradient ascent update: w := w + alpha * (y_i - f_w(x_i)) * x_i
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print("LogisticRegression Model learning_rate={}, max_iter={}".format(
            self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            # w·x > 0 corresponds to P(Y=1|x) > 0.5
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
lg_clf = LogisticRegressionClassifier()
lg_clf.fit(X_train, y_train)
print(lg_clf.score(X_test, y_test))

# Decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = -(w1*x1 + w0) / w2
x_points = np.arange(4, 8)
y_ = -(lg_clf.weights[1] * x_points + lg_clf.weights[0]) / lg_clf.weights[2]
plt.plot(x_points, y_)

plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
plt.show()

[Figure 2: decision boundary of the from-scratch classifier over the two iris classes]
Using the built-in implementation from sklearn

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Decision boundary from the fitted coefficients: coef_[0] = [w1, w2], intercept_ = w0
x_points = np.arange(4, 8)
y_ = -(clf.coef_[0][0] * x_points + clf.intercept_) / clf.coef_[0][1]
plt.plot(x_points, y_)

plt.plot(X[:50, 0], X[:50, 1], 'o', color='blue', label='0')
plt.plot(X[50:, 0], X[50:, 1], 'o', color='orange', label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
plt.show()

[Figure 3: decision boundary from sklearn's LogisticRegression, sepal length vs. sepal width]
