Task 01- LR-Softmax-MLP


1 Linear Regression

1.1 What is it

Linear regression is a single-layer neural network and one of the simplest machine learning models:

$$y = x_1 w_1 + x_2 w_2 + \dots + x_n w_n + b = \mathbf{x} \mathbf{w} + b$$

where $\mathbf{x}$ is the input feature vector, $\mathbf{w}$ is the weight vector, and $b$ is the bias.

1.2 Model training

  • Loss: mean squared error

$$L(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} \left( \mathbf{w}^{\top} \mathbf{x}^{(i)} + b - y^{(i)} \right)^2$$

  • Optimization: mini-batch stochastic gradient descent (SGD); a minimal sketch of this training loop follows below
    $$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\alpha}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} \, l^{(i)}(\mathbf{w}, b)$$
    or, writing all parameters as $\boldsymbol{\theta}$,
    $$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \frac{\alpha}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_{\boldsymbol{\theta}} \, l^{(i)}(\boldsymbol{\theta}), \qquad \nabla_{\boldsymbol{\theta}} \, l^{(i)}(\boldsymbol{\theta}) = \begin{bmatrix} x_1^{(i)} \\ x_2^{(i)} \\ 1 \end{bmatrix} \left( \hat{y}^{(i)} - y^{(i)} \right)$$
    where $\alpha$ is the learning rate, $\mathcal{B}$ is the mini-batch, and $|\mathcal{B}|$ is the batch size.
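
A minimal NumPy sketch of this training loop, assuming synthetic data and illustrative hyperparameter values (`true_w`, `lr`, `batch_size`, etc. are hypothetical names, not from the course code):

import numpy as np

# synthetic data: y = x . true_w + true_b + noise (illustrative values)
np.random.seed(0)
true_w, true_b = np.array([2.0, -3.4]), 4.2
X = np.random.randn(1000, 2)
y = X @ true_w + true_b + 0.01 * np.random.randn(1000)

# parameters to learn
w, b = np.zeros(2), 0.0
lr, batch_size, num_epochs = 0.03, 10, 3

for epoch in range(num_epochs):
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        y_hat = Xb @ w + b                     # forward pass
        err = y_hat - yb                       # (y_hat - y)
        # gradients of the averaged squared loss over the mini-batch
        grad_w = Xb.T @ err / len(batch)
        grad_b = err.mean()
        # mini-batch SGD update: theta <- theta - lr * grad
        w -= lr * grad_w
        b -= lr * grad_b
    loss = 0.5 * np.mean((X @ w + b - y) ** 2)
    print(f"epoch {epoch + 1}, loss {loss:.6f}")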

1.3 Representation

From Dive into Deep Learning (《动手学深度学习》):

[Figure 1: linear regression represented as a single-layer neural network]

1.4 Example code

import time

# define a timer class to record time
class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and record time into a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # calculate the average and return
        return sum(self.times)/len(self.times)

    def sum(self):
        # return the sum of recorded time
        return sum(self.times)
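
A quick usage sketch of the Timer class above, e.g. comparing a Python loop against vectorized NumPy addition (the array size is illustrative):

import numpy as np

timer = Timer()
a = np.ones(100000)
b = np.ones(100000)

# element-wise addition in a Python loop
timer.start()
c = np.zeros(100000)
for i in range(100000):
    c[i] = a[i] + b[i]
print(f"loop:       {timer.stop():.5f} sec")

# vectorized addition
timer.start()
d = a + b
print(f"vectorized: {timer.stop():.5f} sec")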

2 Softmax

2.0 Logistic regression

  • Information entropy

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$$

  • Relative entropy (Kullback-Leibler divergence): measures how much two probability distributions differ
    $$KL(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$$

  • Cross entropy:
    $$H(p, q) = -\sum_i p(x_i) \log q(x_i)$$

Cross entropy = information entropy + relative entropy, i.e. $H(p, q) = H(p) + KL(p \,\|\, q)$; this is verified numerically in the sketch after this list.

  • The logistic distribution, $F(x) = \frac{1}{1 + e^{-(x - \mu)/\gamma}}$, has an S-shaped graph (sigmoid curve)

[Figure 2: the S-shaped logistic (sigmoid) curve]

  • Odds: the ratio of the probability that an event occurs to the probability that it does not occur

  • Log-odds (the logit function):
    $$\mathrm{logit}(p) = \log \frac{p}{1 - p}$$

  • Logistic regression (for classification) models the log-odds as a linear function of the input:
    $$\mathrm{logit}(p) = \mathbf{w} \mathbf{x}, \quad \text{equivalently} \quad p = \frac{1}{1 + e^{-\mathbf{w} \mathbf{x}}}$$
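
A small NumPy sketch, with two hypothetical discrete distributions `p` and `q`, that computes the three quantities above and checks the relationship H(p, q) = H(p) + KL(p‖q):

import numpy as np

p = np.array([0.1, 0.6, 0.3])   # "true" distribution (illustrative)
q = np.array([0.2, 0.5, 0.3])   # model distribution (illustrative)

entropy = -np.sum(p * np.log(p))          # H(p)
cross_entropy = -np.sum(p * np.log(q))    # H(p, q)
kl = np.sum(p * np.log(p / q))            # KL(p || q)

print(entropy, cross_entropy, kl)
# cross entropy = entropy + KL divergence
assert np.isclose(cross_entropy, entropy + kl)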

2.1 What is it

Softmax regression is also a single-layer neural network. Linear regression is used for continuous outputs, while softmax regression is used for discrete multi-class outputs (classification tasks).

2.2 Model computation

$$\begin{aligned}
o_1 &= x_1 w_{11} + x_2 w_{21} + x_3 w_{31} + x_4 w_{41} + b_1 \\
o_2 &= x_1 w_{12} + x_2 w_{22} + x_3 w_{32} + x_4 w_{42} + b_2 \\
o_3 &= x_1 w_{13} + x_2 w_{23} + x_3 w_{33} + x_4 w_{43} + b_3
\end{aligned}$$

$$\hat{y}_1, \hat{y}_2, \hat{y}_3 = \text{softmax}(o_1, o_2, o_3)$$

$$\hat{y}_1 = \frac{\exp(o_1)}{\sum_{i=1}^{3} \exp(o_i)}, \quad \hat{y}_2 = \frac{\exp(o_2)}{\sum_{i=1}^{3} \exp(o_i)}, \quad \hat{y}_3 = \frac{\exp(o_3)}{\sum_{i=1}^{3} \exp(o_i)}.$$

or, in matrix form,

$$\boldsymbol{O} = \boldsymbol{X} \boldsymbol{W} + \boldsymbol{b}, \qquad \hat{\boldsymbol{Y}} = \text{softmax}(\boldsymbol{O})$$

  • Loss function: cross entropy (a minimal softmax + cross-entropy sketch follows below)
    $$H\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right) = -\sum_{j=1}^{q} y_j^{(i)} \log \hat{y}_j^{(i)}, \qquad \ell(\boldsymbol{\Theta}) = \frac{1}{n} \sum_{i=1}^{n} H\left(\boldsymbol{y}^{(i)}, \hat{\boldsymbol{y}}^{(i)}\right)$$
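
A minimal NumPy sketch of the softmax computation and cross-entropy loss above (the batch size, class count, and variable names are illustrative; the row-wise max is subtracted before exponentiation for numerical stability):

import numpy as np

def softmax(O):
    # subtract the row-wise max for numerical stability (does not change the result)
    O = O - O.max(axis=1, keepdims=True)
    expO = np.exp(O)
    return expO / expO.sum(axis=1, keepdims=True)

def cross_entropy(Y_hat, y):
    # y holds integer class labels; pick the predicted probability of the true class
    n = len(y)
    return -np.log(Y_hat[np.arange(n), y]).mean()

# toy mini-batch: 2 examples, 4 features, 3 classes
X = np.random.randn(2, 4)
W = np.random.randn(4, 3) * 0.01
b = np.zeros(3)

O = X @ W + b              # O = XW + b
Y_hat = softmax(O)         # Y_hat = softmax(O)
y = np.array([0, 2])       # true labels
print(Y_hat.sum(axis=1))   # each row sums to 1
print(cross_entropy(Y_hat, y))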

3 MLP (Multi-Layer Perceptron)

3.1 What is it

The multilayer perceptron adds one or more hidden layers on top of the single-layer neural network. The hidden layers sit between the input layer and the output layer.

3.2 Model computation

$$\begin{aligned}
\boldsymbol{H} &= \boldsymbol{X} \boldsymbol{W}_h + \boldsymbol{b}_h, \\
\boldsymbol{O} &= \boldsymbol{H} \boldsymbol{W}_o + \boldsymbol{b}_o,
\end{aligned}$$

Note that if the hidden layer stays purely affine as above, the two layers collapse into a single affine transform; in practice a nonlinear activation function $\phi$ (e.g. ReLU or sigmoid) is applied to the hidden layer, $\boldsymbol{H} = \phi(\boldsymbol{X} \boldsymbol{W}_h + \boldsymbol{b}_h)$.
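
A minimal NumPy sketch of this forward computation, with a ReLU activation on the hidden layer (layer sizes and names are illustrative):

import numpy as np

def relu(x):
    return np.maximum(x, 0)

# illustrative sizes: 2 examples, 4 inputs, 8 hidden units, 3 outputs
n, d_in, d_hidden, d_out = 2, 4, 8, 3
X = np.random.randn(n, d_in)
W_h, b_h = np.random.randn(d_in, d_hidden) * 0.01, np.zeros(d_hidden)
W_o, b_o = np.random.randn(d_hidden, d_out) * 0.01, np.zeros(d_out)

H = relu(X @ W_h + b_h)   # hidden layer with nonlinear activation
O = H @ W_o + b_o         # output layer
print(O.shape)            # (2, 3)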

3.3 Model representation

[Figure 3: diagram of a multilayer perceptron]
