The logistic regression model and the maximum entropy model both belong to the family of log-linear models. Whether a model counts as linear depends on whether it is linear in the parameters being trained.
Let $X$ be a continuous random variable. $X$ follows the logistic distribution if $X$ has the following distribution function and density function:
$$F(x)=P(X \leq x)=\frac{1}{1+e^{-(x-\mu)/\gamma}}$$

$$f(x)=F'(x)=\frac{e^{-(x-\mu)/\gamma}}{\gamma\left(1+e^{-(x-\mu)/\gamma}\right)^2}$$
where $\mu$ is the location parameter and $\gamma>0$ is the scale parameter.
```python
import matplotlib.pyplot as plt
import numpy as np

def draw_logistic_distribution(mu, gamma):
    x = np.arange(-10, 10, 0.01)
    # distribution function F(x) and density function f(x) of the logistic distribution
    y = 1.0 / (1 + np.exp(-(x - mu) / gamma))
    y2 = np.exp(-(x - mu) / gamma) / (gamma * (1 + np.exp(-(x - mu) / gamma)) ** 2)
    plt.figure(figsize=(7, 5))
    plt.plot(x, y, 'b-', label='Cumulative Distribution Function')
    plt.plot(x, y2, 'r-', label='Probability Density Function')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend(loc='upper left')
    plt.show()

draw_logistic_distribution(0, 1)
```
The binomial logistic regression model is the following conditional probability distribution:

$$P(Y=1|x)=\frac{e^{w \cdot x + b}}{1+e^{w \cdot x + b}}$$

$$P(Y=0|x)=\frac{1}{1+e^{w \cdot x + b}}$$

where $x \in R^n$ is the input, $y \in \{0, 1\}$ is the output, and $w \in R^n$, $b \in R$ are the parameters.

For a given input $x$, compute $P(Y=1|x)$ and $P(Y=0|x)$ as above, compare the two conditional probabilities, and assign the instance $x$ to the class with the larger probability.
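As a quick illustration, this decision rule can be coded directly from the two formulas (a minimal sketch; `w`, `b`, and the sample passed to `predict` are made-up values, not fitted parameters):

```python
import numpy as np

def predict(x, w, b):
    """Assign x to the class whose conditional probability is larger."""
    z = np.dot(w, x) + b
    p1 = np.exp(z) / (1 + np.exp(z))  # P(Y=1|x)
    p0 = 1 / (1 + np.exp(z))          # P(Y=0|x)
    return 1 if p1 > p0 else 0        # equivalent to: 1 if z > 0 else 0

w, b = np.array([0.5, -0.3]), 0.1     # hypothetical parameters
print(predict(np.array([1.0, 0.0]), w, b))  # z = 0.6 > 0, so class 1
```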
If an event occurs with probability $p$, the odds of the event are $\frac{p}{1-p}$, and the log-odds (the logit function) are:

$$\mathrm{logit}(p)=\log\frac{p}{1-p}$$
For logistic regression,

$$\mathrm{logit}(P(Y=1|x))=\log\frac{P(Y=1|x)}{P(Y=0|x)}=\log e^{w \cdot x + b}=w \cdot x + b$$

That is, in the logistic regression model, the log-odds of the output $Y=1$ is a linear function of the input $x$.
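A quick numerical check of this claim (again with hypothetical `w` and `b`): the log-odds computed from the two conditional probabilities coincides with the linear part $w \cdot x + b$.

```python
import numpy as np

w, b = np.array([0.5, -0.3]), 0.1   # hypothetical parameters
x = np.array([2.0, 1.0])
z = np.dot(w, x) + b                # linear part w.x + b = 0.8
p1 = np.exp(z) / (1 + np.exp(z))    # P(Y=1|x)
print(np.log(p1 / (1 - p1)), z)     # both print 0.8 (up to floating-point error)
```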
Given a training set $T=\{(x_1, y_1),(x_2, y_2),\dots,(x_N, y_N)\}$ with $x_i \in R^n$ and $y_i \in \{0, 1\}$, the model parameters can be estimated by maximum likelihood.

Let $P(Y=1|x)=\pi(x)$ and $P(Y=0|x)=1-\pi(x)$. The likelihood function is:

$$\prod_{i=1}^{N}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}$$
The log-likelihood function is:

$$\begin{aligned} L(w)&=\sum_{i=1}^{N}\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^{N}\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^{N}\left[y_i(w \cdot x_i + b)-\log(1+\exp(w \cdot x_i + b))\right] \end{aligned}$$
Maximizing $L(w)$ thus becomes an optimization problem with the log-likelihood function as the objective, commonly solved by gradient descent or quasi-Newton methods.
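For instance, the maximization can be handed to a generic optimizer (a minimal sketch on a tiny made-up dataset; `scipy.optimize.minimize` minimizes, so the log-likelihood is negated):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(wb, X, y):
    # wb = [w_1, ..., w_n, b]; the last component is the bias b
    z = X @ wb[:-1] + wb[-1]
    # L(w) = sum_i [ y_i * z_i - log(1 + exp(z_i)) ], negated for minimize()
    return -np.sum(y * z - np.log1p(np.exp(z)))

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])  # made-up samples
y = np.array([0, 0, 1, 1])
res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X, y))
print(res.x)  # estimated [w_1, w_2, b]
```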
The model extends to multi-class classification. The multinomial logistic regression model is:

$$P(Y=k|x)=\frac{\exp(w_k \cdot x)}{1+\sum_{k=1}^{K-1}\exp(w_k \cdot x)},\quad k=1,2,\dots,K-1$$

$$P(Y=K|x)=\frac{1}{1+\sum_{k=1}^{K-1}\exp(w_k \cdot x)}$$

where $x \in R^{n+1}$ and $w_k \in R^{n+1}$ (the bias is absorbed by appending a constant 1 to $x$).
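A small sketch of how these $K$-class probabilities are computed (the weight matrix `W` below is hypothetical); class $K$ acts as the reference class, contributing the 1 in the shared denominator:

```python
import numpy as np

def multinomial_probs(x, W):
    """W holds K-1 weight rows; x already includes a leading constant 1."""
    scores = np.exp(W @ x)                       # exp(w_k . x) for k = 1..K-1
    denom = 1 + scores.sum()
    return np.append(scores / denom, 1 / denom)  # P(Y=1..K-1|x) and P(Y=K|x)

W = np.array([[0.2, -0.1, 0.3],                  # hypothetical w_1
              [-0.4, 0.5, 0.1]])                 # hypothetical w_2 (so K = 3)
x = np.array([1.0, 2.0, 1.0])                    # first component is the bias term
p = multinomial_probs(x, W)
print(p, p.sum())                                # the probabilities sum to 1
```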
Back to the binary case: the first two iris classes (the first 100 samples) with sepal length and sepal width as features make a small binary classification dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    # first 100 iris samples are classes 0 and 1; keep sepal length/width only
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, :2], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```
The sigmoid function and its derivative:

$$g(z)=\frac{1}{1+e^{-z}},\qquad g'(z)=g(z)(1-g(z))$$

The model, with the bias absorbed into $w$, is:

$$f_w(x)=g(w^Tx)=\frac{1}{1+e^{-w^Tx}}$$
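A quick finite-difference check of the identity $g'(z)=g(z)(1-g(z))$ at an arbitrary point:

```python
import numpy as np

g = lambda z: 1 / (1 + np.exp(-z))
z, h = 0.7, 1e-6
numeric = (g(z + h) - g(z - h)) / (2 * h)  # central difference approximation
analytic = g(z) * (1 - g(z))
print(numeric, analytic)                   # the two values agree closely
```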
The log-likelihood function is:

$$\begin{aligned} L(w)&=\sum_{i=1}^{N}\left[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\right]\\ &=\sum_{i=1}^{N}\left[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\right] \end{aligned}$$
Taking the partial derivative with respect to $w_j$ (writing $\pi(x_i)=f_w(x_i)$ and letting $x_i^{(j)}$ denote the $j$-th component of $x_i$):

$$\begin{aligned} \frac{\partial L(w)}{\partial w_j} &= \frac{\partial}{\partial w_j}\sum_{i=1}^{N}\left[y_i\log f_w(x_i)+(1-y_i)\log(1-f_w(x_i))\right]\\ &=\sum_{i=1}^{N}\left(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\right)\frac{\partial f_w(x_i)}{\partial w_j}\\ &=\sum_{i=1}^{N}\left(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\right)f_w(x_i)(1-f_w(x_i))\,x_i^{(j)}\\ &=\sum_{i=1}^{N}\left(y_i(1-f_w(x_i))-(1-y_i)f_w(x_i)\right)x_i^{(j)}\\ &=\sum_{i=1}^{N}\left(y_i-f_w(x_i)\right)x_i^{(j)} \end{aligned}$$
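The derived gradient can be verified numerically against finite differences of the log-likelihood (a sketch on made-up data; the bias is ignored here for brevity):

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

def log_likelihood(w, X, y):
    z = X @ w
    return np.sum(y * z - np.log1p(np.exp(z)))

def gradient(w, X, y):
    # the derived form: sum_i (y_i - f_w(x_i)) * x_i
    return X.T @ (y - sigmoid(X @ w))

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])  # hypothetical samples
y = np.array([0.0, 1.0, 1.0])
w = np.array([0.1, -0.2])

h = 1e-6
for j in range(len(w)):
    e = np.zeros_like(w); e[j] = h
    numeric = (log_likelihood(w + e, X, y) - log_likelihood(w - e, X, y)) / (2 * h)
    print(numeric, gradient(w, X, y)[j])  # each pair should match
```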
Parameter update (stochastic gradient ascent, one sample at a time):

$$w_j := w_j + \alpha\left(y_i - f_w(x_i)\right)x_i^{(j)}$$
```python
class LogisticRegressionClassifier(object):
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def data_matrix(self, X):
        # prepend a constant 1 to each sample so the bias is absorbed into the weights
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat

    def fit(self, X, y):
        data_mat = self.data_matrix(X)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                # stochastic gradient ascent: w := w + alpha * (y_i - f_w(x_i)) * x_i
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print("LogisticRegression Model(learning_rate={}, max_iter={})".format(
            self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            result = np.dot(x, self.weights)
            # w.x > 0 is equivalent to sigmoid(w.x) > 0.5
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
```
```python
lg_clf = LogisticRegressionClassifier()
lg_clf.fit(X_train, y_train)
lg_clf.score(X_test, y_test)

# decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = -(w1*x1 + w0) / w2
x_points = np.arange(4, 8)
y_ = -(lg_clf.weights[1] * x_points + lg_clf.weights[0]) / lg_clf.weights[2]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
```
For comparison, the same task with scikit-learn's LogisticRegression:

```python
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

# decision boundary from the fitted coefficients and intercept
x_points = np.arange(4, 8)
y_ = -(clf.coef_[0][0] * x_points + clf.intercept_) / clf.coef_[0][1]
plt.plot(x_points, y_)
plt.plot(X[:50, 0], X[:50, 1], 'o', color='blue', label='0')
plt.plot(X[50:, 0], X[50:, 1], 'o', color='orange', label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()
```