Generative Learning Algorithms

These are notes I took while following the Stanford open course on machine learning (taught by Andrew Ng), written up for future reference. If you spot any mistakes, please let me know.
Other notes in this series:
Linear Regression
Classification and Logistic Regression
Generalized Linear Models
Generative Learning Algorithms

Generative Learning Algorithms

The algorithms we studied previously model $p(y|x;\theta)$, the conditional distribution of y given x; these are called discriminative learning algorithms. We now turn to algorithms that instead model $p(x|y)$ (together with the class prior $p(y)$); these are called generative learning algorithms.

Using Bayes' rule, we can then recover the distribution of y given x:

$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}, \qquad p(x) = p(x|y=1)\,p(y=1) + p(x|y=0)\,p(y=0)$$
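To make the formula concrete, here is a minimal numerical sketch in Python; the values of $p(x|y)$ and the prior below are made-up for illustration:

```python
# Hypothetical illustration of Bayes' rule for a binary label y.
p_x_given_y1 = 0.30   # p(x | y = 1), made-up density of x under the positive class
p_x_given_y0 = 0.05   # p(x | y = 0), made-up density of x under the negative class
p_y1 = 0.20           # made-up prior p(y = 1)

# p(x) = p(x|y=1)p(y=1) + p(x|y=0)p(y=0)
p_x = p_x_given_y1 * p_y1 + p_x_given_y0 * (1 - p_y1)

# Posterior p(y=1|x) via Bayes' rule
p_y1_given_x = p_x_given_y1 * p_y1 / p_x
print(p_y1_given_x)  # 0.06 / 0.10 = 0.6
```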

1 Gaussian Discriminant Analysis

1.1 The Multivariate Gaussian (Multivariate Normal) Distribution

Assume the input features $x \in \mathbb{R}^n$ are continuous, and model $p(x|y)$ as a Gaussian distribution.

Suppose z follows a multivariate Gaussian distribution, $z \sim \mathcal{N}(\vec\mu,\Sigma)$; its density is

$$p(z) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(z-\mu)^T\Sigma^{-1}(z-\mu)\right)$$
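As a sketch, the density can be evaluated directly with NumPy; the `mu` and `sigma` values below are arbitrary examples:

```python
import numpy as np

def multivariate_gaussian_pdf(z, mu, sigma):
    """Density of N(mu, sigma) at z; sigma must be positive definite."""
    n = mu.shape[0]
    diff = z - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

# Arbitrary example values
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(multivariate_gaussian_pdf(np.array([0.0, 0.0]), mu, sigma))
# For reference, scipy.stats.multivariate_normal(mu, sigma).pdf gives the same density.
```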

1.2 The Gaussian Discriminant Analysis Model

$$\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x\,|\,y=0 &\sim \mathcal{N}(\mu_0,\Sigma) \\
x\,|\,y=1 &\sim \mathcal{N}(\mu_1,\Sigma) \\
p(y) &= \phi^{y}(1-\phi)^{1-y} \\
p(x|y=0) &= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_0)^T\Sigma^{-1}(x-\mu_0)\right) \\
p(x|y=1) &= \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)\right)
\end{aligned}$$

$$\begin{aligned}
\ell(\phi,\mu_0,\mu_1,\Sigma) &= \log\prod_{i=1}^{m}p(x^{(i)},y^{(i)};\phi,\mu_0,\mu_1,\Sigma) \\
&= \log\prod_{i=1}^{m}p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)\cdot p(y^{(i)};\phi) \quad\text{(joint likelihood)} \\
&= \sum_{i=1}^{m}\left(\log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\log p(y^{(i)};\phi)\right)
\end{aligned}$$

$$\begin{aligned}
\frac{\partial}{\partial\phi}\ell(\phi,\mu_0,\mu_1,\Sigma) &= \frac{\partial}{\partial\phi}\sum_{i=1}^{m}\left(\log p(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)+\log p(y^{(i)};\phi)\right) \\
&= \sum_{i=1}^{m}\frac{\partial}{\partial\phi}\log p(y^{(i)};\phi) \\
&= \sum_{i=1}^{m}\frac{\partial}{\partial\phi}\left(y^{(i)}\log\phi + (1-y^{(i)})\log(1-\phi)\right) \\
&= \sum_{i=1}^{m}\frac{y^{(i)}-\phi}{\phi(1-\phi)}
\end{aligned}$$

Setting $\frac{\partial}{\partial\phi}\ell(\phi,\mu_0,\mu_1,\Sigma) = 0$ gives

$$\phi = \frac{\sum_{i=1}^{m}y^{(i)}}{m} = \frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\}$$

By contrast, logistic regression maximizes the conditional likelihood:

$$\ell(\theta) = \log\prod_{i=1}^{m}p(y^{(i)}|x^{(i)};\theta) \quad\text{(conditional likelihood)}$$
Maximizing $\ell$ yields the following results:

$$\phi = \frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\}$$

$$\mu_0 = \frac{\sum_{i=1}^{m}1\{y^{(i)}=0\}\,x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}} \quad\text{(sum of the } x^{(i)} \text{ with label 0, divided by the number of label-0 examples)}$$

$$\mu_1 = \frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}\,x^{(i)}}{\sum_{i=1}^{m}1\{y^{(i)}=1\}}$$

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$$
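These closed-form estimates translate directly into code. A minimal NumPy sketch; `X` (an m×n design matrix) and `y` (a length-m 0/1 label vector) are assumed names for illustration:

```python
import numpy as np

def fit_gda(X, y):
    """Closed-form MLE for GDA: X is (m, n), y is (m,) with values in {0, 1}."""
    m = X.shape[0]
    phi = np.mean(y == 1)                 # fraction of label-1 examples
    mu0 = X[y == 0].mean(axis=0)          # mean of the label-0 examples
    mu1 = X[y == 1].mean(axis=0)          # mean of the label-1 examples
    # x(i) - mu_{y(i)}: subtract the class mean matching each example's label
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    sigma = centered.T @ centered / m     # shared covariance matrix
    return phi, mu0, mu1, sigma
```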
Having estimated $\phi,\mu_0,\mu_1,\Sigma$, we predict the most likely y for a given x:

$$\arg\max_{y} p(y|x) = \arg\max_{y}\frac{p(x|y)\,p(y)}{p(x)} = \arg\max_{y} p(x|y)\,p(y)$$

Since $p(x)$ does not depend on y, it is a constant in the maximization and can be dropped.

A note on notation: $\arg\max_{y} f(y)$ denotes the value of y at which the expression f is maximized, not the maximum itself. For example, $\min_x (x-5)^2 = 0$, whereas $\arg\min_x (x-5)^2 = 5$.
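Prediction then reduces to comparing $\log p(x|y) + \log p(y)$ for the two classes. A sketch continuing the hypothetical `fit_gda` example above; terms shared by both classes (the $(2\pi)^{n/2}|\Sigma|^{1/2}$ normalizer) cancel in the comparison:

```python
import numpy as np

def predict_gda(x, phi, mu0, mu1, sigma):
    """Return the y in {0, 1} maximizing p(x|y)p(y)."""
    sigma_inv = np.linalg.inv(sigma)

    def log_score(mu, prior):
        diff = x - mu
        # log p(x|y) + log p(y), dropping constants shared by both classes
        return -0.5 * diff @ sigma_inv @ diff + np.log(prior)

    return int(log_score(mu1, phi) > log_score(mu0, 1 - phi))
```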

2 Naive Bayes

In the spam-filtering example, an email is represented by a binary feature vector $x \in \{0,1\}^{50000}$, where $x_j = 1$ if the j-th word of the dictionary appears in the email. We assume that, given y, the $x_j$ are conditionally independent:

$$\begin{aligned}
p(x_1,x_2,\dots,x_{50000}|y) &= p(x_1|y)\,p(x_2|y,x_1)\,p(x_3|y,x_1,x_2)\cdots p(x_{50000}|y,x_1,x_2,\dots,x_{49999}) \\
&= p(x_1|y)\,p(x_2|y)\,p(x_3|y)\cdots p(x_{50000}|y) \\
&= \prod_{j=1}^{50000}p(x_j|y)
\end{aligned}$$

The first equality holds for any distribution (chain rule); the second uses the conditional-independence assumption.
The model parameters are:

$$\phi_{j|y=1} = p(x_j=1|y=1), \qquad \phi_{j|y=0} = p(x_j=1|y=0), \qquad \phi_y = p(y=1)$$
The joint likelihood of the data is:

$$\mathcal{L}(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) = \prod_{i=1}^{m}p(x^{(i)},y^{(i)})$$
Derivation of the parameter estimates:
①、
$$\begin{aligned}
\mathcal{L}(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) &= \prod_{i=1}^{m}p(x^{(i)},y^{(i)}) \\
&= \prod_{i=1}^{m}\Big\{\prod_{j=1}^{n}p(x_j^{(i)}|y^{(i)};\phi_{j|y=0},\phi_{j|y=1})\Big\}\,p(y^{(i)};\phi_y)
\end{aligned}$$
②、
$$\begin{aligned}
\ell(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) &= \log\mathcal{L}(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) \\
&= \log\prod_{i=1}^{m}\Big\{\prod_{j=1}^{n}p(x_j^{(i)}|y^{(i)};\phi_{j|y=0},\phi_{j|y=1})\Big\}\,p(y^{(i)};\phi_y) \\
&= \sum_{i=1}^{m}\Big\{\log\Big(\prod_{j=1}^{n}p(x_j^{(i)}|y^{(i)};\phi_{j|y=0},\phi_{j|y=1})\Big) + \log p(y^{(i)};\phi_y)\Big\}
\end{aligned}$$
③、
$$\begin{aligned}
\frac{\partial}{\partial\phi_y}\ell(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) &= \sum_{i=1}^{m}\frac{\partial}{\partial\phi_y}\log p(y^{(i)};\phi_y) \\
&= \sum_{i=1}^{m}\frac{\partial}{\partial\phi_y}\log\Big(\phi_y^{1\{y^{(i)}=1\}}(1-\phi_y)^{1-1\{y^{(i)}=1\}}\Big) \\
&= \sum_{i=1}^{m}\frac{\partial}{\partial\phi_y}\Big(1\{y^{(i)}=1\}\log\phi_y + (1-1\{y^{(i)}=1\})\log(1-\phi_y)\Big) \\
&= \sum_{i=1}^{m}\frac{1\{y^{(i)}=1\}-\phi_y}{\phi_y(1-\phi_y)}
\end{aligned}$$

Setting $\frac{\partial}{\partial\phi_y}\ell(\phi_y,\phi_{j|y=0},\phi_{j|y=1}) = 0$ gives

$$\phi_y = \frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}}{m}$$
④、The remaining parameters $\phi_{j|y=0}$ and $\phi_{j|y=1}$ are handled the same way: differentiate $\ell$ with respect to each and set the derivative to zero.

Maximizing $\ell$ yields the following results:

$$\phi_{j|y=1} = \frac{\sum_{i=1}^{m}1\{x_j^{(i)}=1 \wedge y^{(i)}=1\}}{\sum_{i=1}^{m}1\{y^{(i)}=1\}}$$

$$\phi_{j|y=0} = \frac{\sum_{i=1}^{m}1\{x_j^{(i)}=1 \wedge y^{(i)}=0\}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}}$$

$$\phi_y = \frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}}{m}$$

These estimates have a natural interpretation: $\phi_{j|y=1}$ is simply the fraction of the spam ($y=1$) emails in which word j appears.
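A minimal sketch of these counting estimates in NumPy; `X` (an m×n 0/1 feature matrix) and `y` (a length-m 0/1 label vector) are assumed names for illustration:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """MLE for Bernoulli Naive Bayes: X is (m, n) with 0/1 entries, y is (m,) 0/1."""
    phi_y = np.mean(y == 1)
    # fraction of label-1 (resp. label-0) examples in which each word j appears
    phi_j_y1 = X[y == 1].mean(axis=0)
    phi_j_y0 = X[y == 0].mean(axis=0)
    return phi_y, phi_j_y0, phi_j_y1
```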

2.1 Laplace Smoothing

With the maximum-likelihood estimates above, a word that never appears in the training set gets $\phi_{j|y=1} = \phi_{j|y=0} = 0$, and the posterior for an email containing that word becomes $\frac{0}{0}$, which is undefined. Laplace smoothing fixes this by adding 1 to each count in the numerator and the number of possible outcomes to the denominator. For a multinomial random variable z taking values in $\{1,\dots,k\}$:

$$\phi_j = \frac{\sum_{i=1}^{m}1\{z^{(i)}=j\} + 1}{m + k}$$

Applied to the Naive Bayes parameters (each $x_j$ is binary, so k = 2):

$$\phi_{j|y=1} = \frac{\sum_{i=1}^{m}1\{x_j^{(i)}=1 \wedge y^{(i)}=1\} + 1}{\sum_{i=1}^{m}1\{y^{(i)}=1\} + 2}, \qquad \phi_{j|y=0} = \frac{\sum_{i=1}^{m}1\{x_j^{(i)}=1 \wedge y^{(i)}=0\} + 1}{\sum_{i=1}^{m}1\{y^{(i)}=0\} + 2}$$
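In code, the smoothing is a small change to the counts; a sketch continuing the hypothetical `fit_naive_bayes` above:

```python
import numpy as np

def fit_naive_bayes_smoothed(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing: add 1 to counts, 2 to totals."""
    phi_y = np.mean(y == 1)
    phi_j_y1 = (X[y == 1].sum(axis=0) + 1) / ((y == 1).sum() + 2)
    phi_j_y0 = (X[y == 0].sum(axis=0) + 1) / ((y == 0).sum() + 2)
    return phi_y, phi_j_y0, phi_j_y1
```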

2.2 Event Models for Text Classification

The Multinomial Event Model

For the i-th training email, the feature vector is $x^{(i)} = (x_1^{(i)},x_2^{(i)},\dots,x_{n_i}^{(i)})$, where $n_i$ is the number of words in the email. Each element $x_j \in \{1,2,\dots,50000\}$ is an index into the dictionary (the identity of the j-th word).

The generative model is

$$p(x,y) = \Big\{\prod_{j=1}^{n}p(x_j|y)\Big\}\,p(y)$$

where n is the number of words in the email.

The model parameters are:

$$\phi_{k|y=1} = p(x_j=k|y=1), \qquad \phi_{k|y=0} = p(x_j=k|y=0), \qquad \phi_y = p(y=1)$$

where we assume $p(x_j=k|y)$ does not depend on the position j.
Derivation of the maximum-likelihood estimates:
①、
$$\begin{aligned}
\mathcal{L}(\phi_y,\phi_{k|y=0},\phi_{k|y=1}) &= \prod_{i=1}^{m}p(x^{(i)},y^{(i)}) \\
&= \prod_{i=1}^{m}\Big\{\prod_{j=1}^{n_i}p(x_j^{(i)}|y^{(i)};\phi_{k|y=0},\phi_{k|y=1})\Big\}\,p(y^{(i)};\phi_y)
\end{aligned}$$
②、
$$\begin{aligned}
\ell(\phi_y,\phi_{k|y=0},\phi_{k|y=1}) &= \log\mathcal{L}(\phi_y,\phi_{k|y=0},\phi_{k|y=1}) \\
&= \log\prod_{i=1}^{m}\Big\{\prod_{j=1}^{n_i}p(x_j^{(i)}|y^{(i)};\phi_{k|y=0},\phi_{k|y=1})\Big\}\,p(y^{(i)};\phi_y) \\
&= \sum_{i=1}^{m}\Big\{\log\Big(\prod_{j=1}^{n_i}p(x_j^{(i)}|y^{(i)};\phi_{k|y=0},\phi_{k|y=1})\Big) + \log p(y^{(i)};\phi_y)\Big\}
\end{aligned}$$
③、
The derivative with respect to $\phi_y$ only involves the $\log p(y^{(i)};\phi_y)$ terms, so the computation is identical to the Bernoulli case in Section 2, and setting it to zero gives

$$\phi_y = \frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}}{m}$$
④、Maximizing with respect to the remaining parameters gives:

$$\phi_{k|y=1} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}1\{x_j^{(i)}=k \wedge y^{(i)}=1\}}{\sum_{i=1}^{m}1\{y^{(i)}=1\}\,n_i}, \qquad \phi_{k|y=0} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}1\{x_j^{(i)}=k \wedge y^{(i)}=0\}}{\sum_{i=1}^{m}1\{y^{(i)}=0\}\,n_i}$$

With Laplace smoothing (add 1 to the numerators and the dictionary size, here 50000, to the denominators):

$$\phi_{k|y=1} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}1\{x_j^{(i)}=k \wedge y^{(i)}=1\} + 1}{\sum_{i=1}^{m}1\{y^{(i)}=1\}\,n_i + 50000}$$
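A sketch of fitting the multinomial event model with Laplace smoothing; `emails` (a list of word-index arrays, 0-based here for convenience) and `y` are assumed names for illustration:

```python
import numpy as np

def fit_multinomial_nb(emails, y, vocab_size=50000):
    """Multinomial event model with Laplace smoothing.

    emails: list of m integer arrays, each holding word indices in {0, ..., vocab_size-1}
    y:      length-m array of 0/1 labels
    """
    counts = np.ones((2, vocab_size))        # Laplace: start every word count at 1
    totals = np.full(2, float(vocab_size))   # ... and every class total at vocab_size
    for words, label in zip(emails, y):
        np.add.at(counts[label], words, 1)   # add each word occurrence to its class
        totals[label] += len(words)
    phi_k = counts / totals[:, None]         # row 0: phi_{k|y=0}; row 1: phi_{k|y=1}
    phi_y = np.mean(np.asarray(y) == 1)
    return phi_k, phi_y

# Hypothetical toy usage with a 6-word dictionary:
emails = [np.array([0, 2, 2, 5]), np.array([1, 3])]
labels = np.array([1, 0])
phi_k, phi_y = fit_multinomial_nb(emails, labels, vocab_size=6)
```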
