Supervised Classification: Logistic Regression Classification

Contents

  • 1. Understanding
  • 2. Method
    • 2.1 Hypothesis representation
    • 2.2 Cost function
    • 2.3 Gradient Descent
  • 3. Code
  • References

1. Understanding

A logistic regression classifier starts just like linear regression: it multiplies every feature by a weight and sums the results to obtain a single value. That value is then fed into a sigmoid/logistic function, which maps it to a value between 0 and 1 (a probability). When this probability is greater than 0.5 we assign the instance to class 1; when it is less than 0.5, to class 0. Typical applications include deciding whether an email is spam, whether a transaction is fraudulent, or whether a tumor is benign or malignant.

(Figure 1: the sigmoid/logistic function)
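As a minimal sketch of this prediction step (the weights, bias, and feature vector below are made up for illustration, not a trained model):

import numpy as np

def sigmoid(z):
    # squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.8, -1.2, 0.5])   # hypothetical weights, one per feature
b = 0.3                          # hypothetical bias term
x = np.array([1.5, 0.2, 2.0])    # one feature vector

p = sigmoid(np.dot(w, x) + b)    # probability of class 1
label = 1 if p > 0.5 else 0      # threshold the probability at 0.5
print(p, label)                  # here p is about 0.91, so class 1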

Advantages:

  • Efficient and straightforward
  • It doesn’t require high computation power
  • It doesn’t require scaling of features
  • Provides a probability score for observations

Disadvantages:

  • It cannot easily handle a large number of categorical features/variables.
  • It is vulnerable to overfitting.
  • It cannot solve non-linear problems, because its decision boundary is linear.
  • It does not perform well when the independent variables are uncorrelated with the target variable, or when they are strongly correlated with each other.

2. Method

2.1 Hypothesis representation

The hypothesis of the model is:

$$f(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = w^T x + b, \qquad \big[\, w = (\theta_1, \dots, \theta_n)^T,\ b = \theta_0 \,\big]$$

$$h_\theta(x) = \frac{1}{1 + e^{-f(x)}}$$

2.2 Cost function

We define a cost function to evaluate how good the model is, i.e., how far the predictions deviate from the ground truth and how well the model fits the data. We want the cost to be as small as possible in order to obtain a good model; in practice, the model parameters are obtained by minimizing this cost function. "Training a model and saving it" really just means learning and storing these parameters.

In logistic regression classification we usually divide the data into two classes, class 1 and class 2, whose true values y (the labels) are 0 and 1, while $h_\theta(x)$ is a probability value in $[0, 1]$.

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log\big(h_\theta(x)\big) & y = 1 \\ -\log\big(1 - h_\theta(x)\big) & y = 0 \end{cases}$$

The plots below also show this: when the prediction is close to the true label, the cost approaches 0; otherwise it grows very large.

(Figure 2: the cost curves $-\log(h_\theta(x))$ for $y = 1$ and $-\log(1 - h_\theta(x))$ for $y = 0$)

The two cases can be merged into a single expression:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]$$
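A minimal NumPy sketch of this combined cost, assuming h holds the predicted probabilities $h_\theta(x^{(i)})$ and y the 0/1 labels:

import numpy as np

def cross_entropy_cost(h, y, eps=1e-12):
    # clip so that log(0) is never evaluated
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))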

The same cost function can also be derived from the maximum-likelihood principle:

(Figure 3: deriving the cost function via maximum likelihood)

2.3 Gradient Descent

Gradient descent finds the direction (by taking the derivative with respect to $\theta$) in which $J(\theta)$ decreases fastest. It is an iterative method: at every step, the parameter values are adjusted along the direction in which the objective function changes fastest, gradually approaching the optimum. See reference 2 for the detailed derivation.

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} = \Delta\theta_j$$

Knowing the direction given by the gradient, we choose a step size (learning rate) $\alpha$ and update the model parameters $\theta$:

$$\theta_j := \theta_j - \alpha\, \Delta\theta_j$$
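Putting the gradient and the update rule together, a from-scratch training loop might look like the sketch below (plain batch gradient descent; alpha and n_iters are illustrative hyperparameters):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    # X: (m, n) feature matrix, y: (m,) array of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)                 # weights, one per feature
    b = 0.0                             # bias term (theta_0)
    for _ in range(n_iters):
        h = sigmoid(X @ theta + b)      # predicted probabilities
        grad_theta = X.T @ (h - y) / m  # dJ/dtheta_j, as derived above
        grad_b = np.mean(h - y)         # dJ/db
        theta -= alpha * grad_theta     # step against the gradient
        b -= alpha * grad_b
    return theta, b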

3. Code

# Load data
from sklearn import datasets
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris = datasets.load_iris()

X = iris.data
y = iris.target

# Split train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=21, stratify=y)
X_train.shape, X_test.shape   #((105, 4), (45, 4))
y_train.shape, y_test.shape   #((105,), (45,))

# Logistic regression
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(max_iter=1000)  # raise max_iter; the default can trigger a ConvergenceWarning on unscaled data
logreg.fit(X_train, y_train)

y_predict = logreg.predict(X_test)
# array([2, 2, 2, 2, 1, 0, 1, 0, 0, 1, 0, 2, 0, 2, 2, 0, 0, 0, 1, 0, 2, 2,
#       2, 0, 1, 1, 1, 0, 0, 1, 2, 2, 0, 0, 1, 2, 2, 1, 1, 2, 1, 1, 0, 2,
#       1])

# Accuracy
logreg.score(X_test, y_test)
#0.9777777777777777

# Confusion matrix (metrics is a module in sklearn)
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_predict)  # use y_predict as defined above
cnf_matrix
#array([[15,  0,  0],
#       [ 0, 14,  1],
#       [ 0,  0, 15]])
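For per-class precision, recall, and F1 scores (output omitted here), scikit-learn's classification_report can complement the confusion matrix:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_predict, target_names=iris.target_names))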

References

  1. Introduction to Logistic Regression
  2. Logistic Regression–逻辑回归模型
  3. logistic regression model
