$$P(X=1)=\theta,\qquad P(X=0)=1-\theta$$
Suppose that in $n$ independent trials the random variable takes the value 1 $k$ times and the value 0 $n-k$ times. The likelihood function is:

$$L(\theta)=\prod_{i=1}^n P(x_i;\theta)=\theta^k(1-\theta)^{n-k}$$
Taking the logarithm:

$$\log L(\theta)=k\log\theta+(n-k)\log(1-\theta)$$
Differentiating with respect to $\theta$:

$$\frac{\partial\log L(\theta)}{\partial\theta}=\frac{k}{\theta}-\frac{n-k}{1-\theta}$$
The derivative vanishes at $\theta=k/n$, so the maximum likelihood estimate of $\theta$ is $k/n$.
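A minimal sketch of this result, assuming an illustrative sample: the closed-form estimate $k/n$ matches a brute-force grid search over the log-likelihood derived above.

```python
import math

def bernoulli_mle(samples):
    """Closed-form MLE for the Bernoulli parameter: k/n."""
    return sum(samples) / len(samples)

def log_likelihood(theta, samples):
    """log L(theta) = k*log(theta) + (n-k)*log(1-theta)."""
    k, n = sum(samples), len(samples)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

samples = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # k = 6, n = 10 (illustrative data)
theta_hat = bernoulli_mle(samples)        # 6/10 = 0.6

# Numerical check: no grid point beats the closed-form estimate.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, samples))
```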
By Bayes' theorem:

$$P(\theta\mid A_1,A_2,\cdots,A_n)=\frac{P(A_1,A_2,\cdots,A_n\mid\theta)\,P(\theta)}{P(A_1,A_2,\cdots,A_n)}$$
The Bayesian (maximum a posteriori) estimate of $\theta$ is then (the denominator does not depend on $\theta$ and can be dropped):

$$\begin{aligned}\hat{\theta}&=\arg\max_{\theta} P(\theta\mid A_1,A_2,\cdots,A_n)\\&=\arg\max_{\theta} P(A_1,A_2,\cdots,A_n\mid\theta)\,P(\theta)\\&=\arg\max_{\theta}\,\theta^{k}(1-\theta)^{n-k}\,\theta^{\alpha-1}(1-\theta)^{\beta-1}\end{aligned}$$
Taking the logarithm, differentiating, and setting the derivative to zero gives $\frac{k+\alpha-1}{\theta}-\frac{n-k+\beta-1}{1-\theta}=0$, hence

$$\hat\theta=\frac{k+(\alpha-1)}{n+(\alpha-1)+(\beta-1)}$$
where $\alpha$ and $\beta$ are the parameters of the Beta-distribution prior $P(\theta)\propto\theta^{\alpha-1}(1-\theta)^{\beta-1}$.
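The closed form above can be sketched directly; this assumes the same illustrative sample as before, and the function name is mine. Note that with $\alpha=\beta=1$ (a uniform prior) the estimate reduces to the MLE $k/n$.

```python
def bernoulli_map(samples, alpha, beta):
    """MAP estimate with a Beta(alpha, beta) prior:
    (k + alpha - 1) / (n + (alpha - 1) + (beta - 1))."""
    k, n = sum(samples), len(samples)
    return (k + alpha - 1) / (n + (alpha - 1) + (beta - 1))

samples = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]      # k = 6, n = 10 (illustrative)
map_est = bernoulli_map(samples, 2.0, 2.0)    # (6+1)/(10+2) = 7/12
uniform = bernoulli_map(samples, 1.0, 1.0)    # uniform prior -> k/n = 0.6
```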
Empirical risk minimization solves the optimization problem:

$$\min_{f\in\mathcal{F}}\frac{1}{N}\sum_{i=1}^{N}L(y_i,f(x_i))$$
When the model is a conditional probability distribution and the loss function is the log loss, this becomes:

$$\min_{\theta\in\Theta}-\frac{1}{N}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta)$$
which is equivalent to maximum likelihood estimation:

$$\max_{\theta\in\Theta}\frac{1}{N}\sum_{i=1}^{N}\log P(y_i\mid x_i;\theta)$$
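The equivalence can be checked numerically for the Bernoulli case from the first part: minimizing the empirical log-loss risk over a parameter grid picks out the same $\theta$ as maximizing the average log-likelihood. A minimal sketch, with illustrative data:

```python
import math

def avg_log_loss(theta, ys):
    """Empirical risk with log loss: -(1/N) * sum_i log P(y_i; theta)."""
    return -sum(math.log(theta if y == 1 else 1 - theta) for y in ys) / len(ys)

ys = [1, 0, 1, 1, 0]  # k = 3, n = 5 (illustrative)
grid = [i / 1000 for i in range(1, 1000)]

erm = min(grid, key=lambda t: avg_log_loss(t, ys))   # risk minimizer
mle = max(grid, key=lambda t: -avg_log_loss(t, ys))  # likelihood maximizer
# erm == mle: minimizing the log-loss risk is maximizing the average
# log-likelihood, since the two objectives differ only by a sign.
```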