ML - Nonlinear Regression: Logistic Regression

1. Probability

1.1 Definition: Probability (P) is a measure of how likely an event is to occur.

1.2 Range: 0 <= P <= 1

1.3 Ways to estimate it:

1.3.1 From personal belief

1.3.2 From historical data

1.3.3 From simulated data

1.4 Conditional probability:

P(A | B) = P(A and B) / P(B), the probability that A occurs given that B has occurred.
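
As a hedged illustration of estimating a conditional probability from historical data (method 1.3.2) — the records array and counts below are hypothetical, not from the original post:

import numpy as np

# each row is one historical observation: (did event A occur?, did event B occur?)
records = np.array([
    [1, 1],
    [0, 1],
    [1, 0],
    [1, 1],
    [0, 0],
])

# P(A | B) is approximated by count(A and B) / count(B)
count_B = np.sum(records[:, 1] == 1)
count_A_and_B = np.sum((records[:, 0] == 1) & (records[:, 1] == 1))
print(count_A_and_B / count_B)  # 2 / 3, roughly 0.667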



2. Logistic Regression

2.1 Example


[Figure 1] h(x) > 0.5

[Figure 2] h(x) > 0.2
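
A minimal sketch of how such a threshold turns predicted probabilities into class labels (the classify function and the sample probabilities are illustrative, not from the original figures):

import numpy as np

def classify(probabilities, threshold=0.5):
    # predict y = 1 whenever h(x) exceeds the chosen threshold
    return (np.asarray(probabilities) > threshold).astype(int)

probs = np.array([0.1, 0.3, 0.6, 0.9])
print(classify(probs))                  # threshold 0.5 -> [0 0 1 1]
print(classify(probs, threshold=0.2))   # threshold 0.2 -> [0 1 1 1]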

2.2 Basic model

The test data is X(x0, x1, x2, ..., xn).

The parameters to be learned are Θ(θ0, θ1, θ2, ..., θn).

z = θ0·x0 + θ1·x1 + θ2·x2 + ... + θn·xn

Vector form:

z = Θᵀ·X

To handle binary (two-class) data, the Sigmoid function is introduced to map z onto a smooth curve between 0 and 1.

[Figure 3] The Sigmoid function: g(z) = 1 / (1 + e^(-z))

Prediction function:

hθ(x) = g(Θᵀx) = 1 / (1 + e^(-Θᵀx))

Expressed as probabilities:

Positive class (y = 1): P(y = 1 | x; Θ) = hθ(x)

Negative class (y = 0): P(y = 0 | x; Θ) = 1 - hθ(x)
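
A minimal NumPy sketch of the prediction function (sigmoid and hypothesis are illustrative names; x is assumed to include the bias term x0 = 1):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h(x) = g(x . theta): the predicted probability that y = 1
    return sigmoid(np.dot(x, theta))

# example: theta = (-3, 1), x = (1, 4) -> z = 1, h(x) = g(1) ~ 0.731
theta = np.array([-3.0, 1.0])
x = np.array([1.0, 4.0])
print(hypothesis(theta, x))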

2.3 Cost function
Linear regression:

J(θ) = (1 / 2m) · Σ ( hθ(x(i)) - y(i) )²   (summed over the m training examples)

Goal: find θ0, θ1 that make the expression above as small as possible.

Logistic regression:

Cost(hθ(x), y) = -log(hθ(x))        if y = 1
Cost(hθ(x), y) = -log(1 - hθ(x))    if y = 0

Cost function:

J(θ) = -(1/m) · Σ [ y(i)·log(hθ(x(i))) + (1 - y(i))·log(1 - hθ(x(i))) ]

Goal: find the parameters θ that minimize J(θ).
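
A hedged sketch of this cost in NumPy (computeCost is an illustrative name; it assumes x already contains the bias column and that h(x) stays strictly between 0 and 1):

import numpy as np

def computeCost(theta, x, y):
    # h(x) = 1 / (1 + e^(-x.theta)), computed row-wise for the whole data matrix
    h = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))
    # J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]
    m = len(y)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
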
2.4 Solution: gradient descent


[Figures 6 and 7] Gradient descent: start from an initial θ and repeatedly step downhill on the cost surface J(θ).

Update rule:

θj := θj - α · ∂J(θ)/∂θj
  • α is the learning rate
    Update all components of θ simultaneously
    Repeat the updates until convergence

  • Python implementation (this demo runs batch gradient descent on a linear hypothesis with a squared-error cost; a logistic-regression variant is sketched after the output below):

import numpy as np
import random


# m denotes the number of examples here, not the number of features
def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here.
        # But to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    return theta


def genData(numPoints, bias, variance):
    x = np.zeros(shape=(numPoints, 2))
    y = np.zeros(shape=numPoints)
    # basically a straight line
    for i in range(0, numPoints):
        # bias feature
        x[i][0] = 1
        x[i][1] = i
        # our target variable
        y[i] = (i + bias) + random.uniform(0, 1) * variance
    return x, y


# gen 100 points with a bias of 25 and 10 variance as a bit of noise
x, y = genData(100, 25, 10)
print("x:")
print(x)
print("y:")
print(y)

m, n = np.shape(x)
print("m:")
print(m)
print("n:")
print(n)

numIterations = 100000  # number of iterations
alpha = 0.0005  # learning rate
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print("theta:")
print(theta)

Output:

x:
[[ 1.  0.]
 [ 1.  1.]
 [ 1.  2.]
 [ 1.  3.]
 [ 1.  4.]
 [ 1.  5.]
 [ 1.  6.]
 [ 1.  7.]
 [ 1.  8.]
 [ 1.  9.]
 [ 1. 10.]
 [ 1. 11.]
 [ 1. 12.]
 [ 1. 13.]
 [ 1. 14.]
 [ 1. 15.]
 [ 1. 16.]
 [ 1. 17.]
 [ 1. 18.]
 [ 1. 19.]
 [ 1. 20.]
 [ 1. 21.]
 [ 1. 22.]
 [ 1. 23.]
 [ 1. 24.]
 [ 1. 25.]
 [ 1. 26.]
 [ 1. 27.]
 [ 1. 28.]
 [ 1. 29.]
 [ 1. 30.]
 [ 1. 31.]
 [ 1. 32.]
 [ 1. 33.]
 [ 1. 34.]
 [ 1. 35.]
 [ 1. 36.]
 [ 1. 37.]
 [ 1. 38.]
 [ 1. 39.]
 [ 1. 40.]
 [ 1. 41.]
 [ 1. 42.]
 [ 1. 43.]
 [ 1. 44.]
 [ 1. 45.]
 [ 1. 46.]
 [ 1. 47.]
 [ 1. 48.]
 [ 1. 49.]
 [ 1. 50.]
 [ 1. 51.]
 [ 1. 52.]
 [ 1. 53.]
 [ 1. 54.]
 [ 1. 55.]
 [ 1. 56.]
 [ 1. 57.]
 [ 1. 58.]
 [ 1. 59.]
 [ 1. 60.]
 [ 1. 61.]
 [ 1. 62.]
 [ 1. 63.]
 [ 1. 64.]
 [ 1. 65.]
 [ 1. 66.]
 [ 1. 67.]
 [ 1. 68.]
 [ 1. 69.]
 [ 1. 70.]
 [ 1. 71.]
 [ 1. 72.]
 [ 1. 73.]
 [ 1. 74.]
 [ 1. 75.]
 [ 1. 76.]
 [ 1. 77.]
 [ 1. 78.]
 [ 1. 79.]
 [ 1. 80.]
 [ 1. 81.]
 [ 1. 82.]
 [ 1. 83.]
 [ 1. 84.]
 [ 1. 85.]
 [ 1. 86.]
 [ 1. 87.]
 [ 1. 88.]
 [ 1. 89.]
 [ 1. 90.]
 [ 1. 91.]
 [ 1. 92.]
 [ 1. 93.]
 [ 1. 94.]
 [ 1. 95.]
 [ 1. 96.]
 [ 1. 97.]
 [ 1. 98.]
 [ 1. 99.]]
y:
[ 26.27815269  32.66768058  28.22594145  30.77125223  29.22859695
  38.02617578  38.77723704  38.75941693  37.51914005  34.70311263
  39.38349805  36.82172645  37.53424558  43.89335788  39.86619043
  42.77143872  46.97544428  50.24971924  45.30721118  44.55195142
  51.70691022  46.56863106  52.32805153  52.84954093  54.55242641
  52.14422122  56.27667761  56.98691298  53.56176317  63.44462043
  60.08578544  65.41098273  65.92701345  64.24412903  67.53920778
  63.35080039  64.43398594  63.34590094  63.11265328  67.07000322
  69.69430602  67.07964006  71.26126237  72.33061819  78.99023496
  78.62644886  75.33387876  74.23899871  80.06708854  81.03063236
  82.09372834  76.65280126  86.38648144  86.39932245  79.56509259
  86.62380336  88.00737772  91.95667651  86.30124993  91.39647352
  87.791776    88.80877001  96.1679461   94.8139934   98.44559598
  98.55320134  99.85464471  96.56094905  97.31222944  94.19160055
  98.14492827  99.80317251 101.65405055 102.29465893 100.20862392
 106.37400148 108.33447212 110.41768632 105.49789886 104.64961868
 109.37812661 107.69358766 109.10927721 109.93432977 109.13875359
 116.8197377  111.26240862 120.88567915 117.93786525 122.85693307
 120.26210017 116.99993199 124.74461618 124.99292528 122.0402555
 123.9124033  122.28379028 127.65976993 132.21417455 125.08727085]

m:
100
n:
2

theta:
[29.86975704  1.00423275]
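
The gradientDescent function above minimizes the squared-error cost of a linear hypothesis. For logistic regression, the same update rule is applied to the sigmoid hypothesis and the cross-entropy cost from section 2.3. A minimal sketch in the same style (logisticGradientDescent and the toy data are illustrative, not from the original post):

import numpy as np

def logisticGradientDescent(x, y, theta, alpha, numIterations):
    m = len(y)
    for i in range(numIterations):
        # h(x) = 1 / (1 + e^(-x.theta))
        h = 1.0 / (1.0 + np.exp(-np.dot(x, theta)))
        # cross-entropy cost J(theta)
        cost = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
        if i % 1000 == 0:
            print("Iteration %d | Cost: %f" % (i, cost))
        # gradient of J(theta): (1/m) * x^T (h - y), same shape as theta
        gradient = np.dot(x.transpose(), h - y) / m
        # simultaneous update of all theta_j
        theta = theta - alpha * gradient
    return theta

# toy example: two columns (bias + one feature), linearly separable labels
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = logisticGradientDescent(x, y, np.zeros(2), alpha=0.1, numIterations=5000)
print("theta:")
print(theta)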
