FTRL: the derivative of the cross-entropy loss

Notation:

  • $\mathbf x_t \in \mathbb R^d$: a sample instance in $d$-dimensional space, with label $y_t \in \{0, 1\}$
  • $p_t$: the prediction for sample $\mathbf x_t$; in the logistic-regression setting,
    • $p_t = \sigma(\mathbf w_t \cdot \mathbf x_t)$, where $\sigma(a) = 1/(1+\exp(-a))$
    • $\sigma'(a) = \sigma(a)\,(1-\sigma(a))$
    • $p'_t = p_t(1-p_t)\cdot \mathbf x_t$
  • The loss function is the logistic loss (LogLoss):
    • $\ell_t(\mathbf w_t) = -y_t \log p_t - (1-y_t)\log(1-p_t)$
    • Since $y_t \in \{0, 1\}$, this reduces to either $\ell_t(\mathbf w_t) = -\log p_t$ or $\ell_t(\mathbf w_t) = -\log(1-p_t)$
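The definitions above can be sketched in a few lines of Python; this is a minimal illustration, and the sample values `w`, `x` are hypothetical:

```python
import math

def sigmoid(a):
    # sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + math.exp(-a))

def logloss(y, p):
    # LogLoss: -y*log(p) - (1-y)*log(1-p)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

# Hypothetical sample: weights w_t and instance x_t
w = [0.5, -0.25]
x = [1.0, 2.0]
p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))  # p_t = sigma(w_t . x_t)

# For y_t in {0, 1}, LogLoss collapses to -log(p_t) or -log(1 - p_t)
assert abs(logloss(1, p) - (-math.log(p))) < 1e-12
assert abs(logloss(0, p) - (-math.log(1 - p))) < 1e-12
```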

1. Computing $\nabla\ell_t(\mathbf w)$

$$
\begin{aligned}
\frac{\partial\ell_t(\mathbf w)}{\partial \mathbf w}
&= -\frac{y_t}{p_t}\,p'_t + \frac{1-y_t}{1-p_t}\,p'_t \\
&= \left(-\frac{y_t}{p_t} + \frac{1-y_t}{1-p_t}\right)p'_t \\
&= \frac{p_t - y_t}{p_t(1-p_t)}\cdot p_t(1-p_t)\cdot \mathbf x_t \\
&= (p_t - y_t)\,\mathbf x_t
\end{aligned}
$$
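The closed form $(p_t - y_t)\,\mathbf x_t$ can be checked numerically against central finite differences of the loss; the sample values below are hypothetical:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def logloss_at(w, x, y):
    # ell_t(w) evaluated at weight vector w
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return -y * math.log(p) - (1 - y) * math.log(1 - p)

def grad(w, x, y):
    # Closed-form gradient: (p_t - y_t) * x_t
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]

# Hypothetical sample (w_t, x_t, y_t)
w, x, y = [0.3, -0.7, 0.1], [1.0, 2.0, -1.5], 1
g = grad(w, x, y)

# Compare each coordinate against a central finite difference
eps = 1e-6
for i in range(len(w)):
    wp = list(w); wp[i] += eps
    wm = list(w); wm[i] -= eps
    numeric = (logloss_at(wp, x, y) - logloss_at(wm, x, y)) / (2 * eps)
    assert abs(numeric - g[i]) < 1e-5
```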
