This article records some machine learning concepts and formulas, along with how to write them in LaTeX.
$$\LARGE {y=wx+b}$$
$$\LARGE {\sigma(x) = {1 \above{1pt} 1+e^{-x}}}$$
$$\LARGE {\sigma(wx+b) = {1 \above{1pt} 1+e^{-(wx+b)}}}$$
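A minimal Python sketch of the linear function and the sigmoid above, assuming scalar inputs; the names `linear`, `sigmoid` and `logistic` are illustrative, not from any library. The sigmoid is split into two branches so that `math.exp` only ever receives a non-positive argument, which keeps it from overflowing for large negative inputs.

```python
import math


def linear(x: float, w: float, b: float) -> float:
    """Linear function y = wx + b."""
    return w * x + b


def sigmoid(z: float) -> float:
    """Sigmoid 1 / (1 + e^(-z)), computed without overflowing math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, multiply numerator and denominator by e^z: e^z / (1 + e^z)
    e = math.exp(z)
    return e / (1.0 + e)


def logistic(x: float, w: float, b: float) -> float:
    """Sigmoid of the linear output: 1 / (1 + e^(-(wx + b)))."""
    return sigmoid(linear(x, w, b))


print(sigmoid(0.0))                  # 0.5
print(logistic(2.0, w=0.5, b=-1.0))  # wx + b = 0, so also 0.5
```

The sigmoid squashes any real-valued output of $wx+b$ into $(0, 1)$, which is why it is commonly read as a probability.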
$$\LARGE Gini\_index(D, a) = \displaystyle \sum_{v=1}^V{|D^v|\above{1pt}|D|}Gini(D^v)$$
where $D^v$ is the subset of samples in $D$ that take the $v$-th value of attribute $a$ (which has $V$ possible values), and $|\cdot|$ denotes the number of samples in a set.
$$\LARGE Gini(D)=1-\displaystyle \sum_{k=1}^{|y|}P_k^2$$
where $P_k$ is the proportion of samples in $D$ that belong to class $k$, and $|y|$ is the number of classes.
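A small Python sketch of the two Gini formulas, assuming the class labels and the values of attribute $a$ are given as parallel lists; `gini` and `gini_index` are hypothetical helper names. When building a decision tree in the CART style, the attribute whose split yields the smallest `Gini_index` is the one chosen.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini(D) = 1 - sum_k P_k^2, with P_k the share of class k in D."""
    n = len(labels)
    if n == 0:
        raise ValueError("Gini is undefined for an empty set")
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_index(labels: Sequence[Hashable], attribute: Sequence[Hashable]) -> float:
    """Gini_index(D, a) = sum_v |D^v| / |D| * Gini(D^v).

    attribute[j] is the value of attribute a for sample j; D^v collects
    the labels of the samples whose attribute value equals v.
    """
    n = len(labels)
    subsets: dict = defaultdict(list)
    for y, v in zip(labels, attribute):
        subsets[v].append(y)
    return sum(len(d_v) / n * gini(d_v) for d_v in subsets.values())


labels = ["yes", "yes", "no", "no"]
print(gini(labels))                              # 0.5
print(gini_index(labels, ["a", "a", "b", "b"]))  # 0.0, a pure split
print(gini_index(labels, ["a", "b", "a", "b"]))  # 0.5, an uninformative split
```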
$$\LARGE P(A_iB) = P(B|A_i)*P(A_i)$$
$$\LARGE P(B) = \displaystyle \sum_{k=1}^{n}P(B|A_k)*P(A_k)$$
where the events $A_1, \dots, A_n$ form a partition of the sample space.
$$\LARGE P(A_i|B) = {P(A_iB) \above{1pt} P(B)} = {P(B|A_i)*P(A_i) \above{1pt} \displaystyle \sum_{k=1}^{n}P(B|A_k)*P(A_k)}$$
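To make the multiplication rule, the law of total probability and Bayes' formula concrete, here is a small Python sketch. The two-factory scenario and its numbers are hypothetical, chosen only so the arithmetic is easy to follow: $A_1$ and $A_2$ are the two factories (they partition the sample space) and $B$ is the event "the part is defective".

```python
from typing import Sequence


def total_probability(likelihoods: Sequence[float], priors: Sequence[float]) -> float:
    """P(B) = sum_k P(B|A_k) * P(A_k), for events A_1..A_n partitioning the sample space."""
    return sum(lik * prior for lik, prior in zip(likelihoods, priors))


def posterior(i: int, likelihoods: Sequence[float], priors: Sequence[float]) -> float:
    """Bayes' formula: P(A_i|B) = P(B|A_i) * P(A_i) / P(B)."""
    joint = likelihoods[i] * priors[i]  # multiplication rule: P(A_i B)
    return joint / total_probability(likelihoods, priors)


# Hypothetical data: factory A_1 makes 60% of the parts, A_2 makes 40%;
# their defect rates P(B|A_k) are 2% and 5%.
priors = [0.6, 0.4]
likelihoods = [0.02, 0.05]
print(total_probability(likelihoods, priors))  # about 0.032
print(posterior(0, likelihoods, priors))       # about 0.375
```

Although factory $A_1$ produces most of the parts, a defective part has only about a 37.5% chance of coming from it, because its defect rate is lower.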
Let: $\LARGE A=[a_1,a_2,...,a_n]$
Then: $\LARGE |A| = \sqrt{\smash[]{a_1^2+a_2^2+...+a_n^2}} = \sqrt{\smash[]{ \displaystyle \sum_{i=1}^{n}a_i^2}}$
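A direct Python translation of the modulus (L2 norm) formula, assuming the vector is a plain list of numbers; `l2_norm` is an illustrative name. On Python 3.8+, the standard library's `math.hypot(*a)` computes the same quantity.

```python
import math
from typing import Sequence


def l2_norm(a: Sequence[float]) -> float:
    """|A| = sqrt(a_1^2 + a_2^2 + ... + a_n^2)."""
    return math.sqrt(sum(x * x for x in a))


print(l2_norm([3.0, 4.0]))       # 5.0
print(l2_norm([1.0, 2.0, 2.0]))  # 3.0
```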
Let: $\Large A=[a_1,a_2,...,a_n],B=[b_1,b_2,...,b_n]$
Then: $\Large A \cdot B = |A||B|\cos\theta = a_1*b_1+a_2*b_2+...+a_n*b_n = \displaystyle \sum_{i=1}^{n}a_i*b_i$
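A minimal Python sketch of the component form of the dot product, assuming two equal-length lists; `dot` is an illustrative name. Because $|A||B| > 0$ for non-zero vectors, the sign of the result matches the sign of $\cos\theta$: positive for an acute angle, zero for a right angle, negative for an obtuse one.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """A . B = a_1*b_1 + a_2*b_2 + ... + a_n*b_n."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(dot([1, 0], [0, 1]))        # 0: perpendicular vectors, cos 90° = 0
```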
Let: $\Large A=[a_1,a_2,...,a_n],B=[b_1,b_2,...,b_n]$
Then:
$$\begin{align} \Large \text{similarity} & \Large = \cos(\theta) = {\text{inner product of the vectors} \above{1pt} \text{product of the vector moduli}} = {\text{inner product of the vectors} \above{1pt} \text{product of the vectors' L2 norms}} \nonumber\\ & \Large = {A \cdot B \above{1pt} |A||B|} = {\displaystyle \sum_{i=1}^{n}a_i*b_i \above{1pt} \sqrt{\smash[]{ \displaystyle \sum_{i=1}^{n}a_i^2}} * \sqrt{\smash[]{ \displaystyle \sum_{i=1}^{n}b_i^2}}} \nonumber\\ \end{align}$$
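Putting the norm and the dot product together gives cosine similarity. This Python sketch assumes plain equal-length lists and raises an error for a zero vector, where $\cos\theta$ is undefined; `cosine_similarity` is an illustrative name, not a library call.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (|A| |B|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    inner = sum(x * y for x, y in zip(a, b))   # numerator: inner product
    norm_a = math.sqrt(sum(x * x for x in a))  # |A|
    norm_b = math.sqrt(sum(y * y for y in b))  # |B|
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return inner / (norm_a * norm_b)


print(cosine_similarity([1, 2], [2, 4]))   # about 1.0: same direction
print(cosine_similarity([1, 0], [0, 1]))   # 0.0: perpendicular
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0: opposite directions
```

Note that `[1, 2]` and `[2, 4]` have different lengths but the same direction, so their similarity is 1: cosine similarity ignores magnitude.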
$$\LARGE P(x_1,x_2,x_3...x_n|\theta) = \displaystyle \prod_{i=1}^{n}P(x_i|\theta)$$
where the samples $x_1, \dots, x_n$ are assumed to be independent and identically distributed given $\theta$.
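A small Python sketch of the likelihood as a product of per-sample probabilities. As an assumption for illustration only, each sample is a coin flip ($x_i \in \{0,1\}$) with $P(x_i=1|\theta)=\theta$; `coin_prob` is a hypothetical helper. In practice the log-likelihood (a sum of logs) is usually preferred, because a product of many probabilities can underflow to 0. `math.prod` needs Python 3.8+.

```python
import math
from typing import Sequence


def coin_prob(x: int, theta: float) -> float:
    """Assumed per-sample model: P(x=1|theta) = theta, P(x=0|theta) = 1 - theta."""
    return theta if x == 1 else 1.0 - theta


def likelihood(samples: Sequence[int], theta: float) -> float:
    """P(x_1, ..., x_n | theta) = prod_i P(x_i | theta), assuming i.i.d. samples."""
    return math.prod(coin_prob(x, theta) for x in samples)


def log_likelihood(samples: Sequence[int], theta: float) -> float:
    """log P(x_1, ..., x_n | theta) = sum_i log P(x_i | theta)."""
    return sum(math.log(coin_prob(x, theta)) for x in samples)


samples = [1, 1, 0, 1]
print(likelihood(samples, 0.75))  # 0.75**3 * 0.25 = 0.10546875
print(likelihood(samples, 0.5))   # 0.5**4 = 0.0625
```

With three heads in four flips, $\theta = 0.75$ explains the data better than $\theta = 0.5$, which is the idea behind maximum likelihood estimation.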
If a random variable $X$ takes only the two values 0 and 1, with corresponding probabilities
$$\LARGE Pr(X=1)=p,Pr(X=0)=1-p,0 \lt p \lt 1$$
then $X$ is said to follow a Bernoulli distribution with parameter $p$, and the probability function of $X$ can be written as:
$$\LARGE f(x|p) = p^x(1-p)^{1-x}= \begin{cases} p & x=1 \\ 1-p & x=0 \\ 0 & x \neq 0,1 \end{cases}$$
Letting $q=1-p$, it can also be written as:
$$\LARGE f(x|p) = \begin{cases} p^xq^{1-x} & x=0,1 \\ 0 & x \neq 0,1 \end{cases}$$
ps:
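Definition: the Bernoulli distribution describes a random variable $X$ with parameter $p$ ($0 \lt p \lt 1$) that takes the value 1 with probability $p$ and the value 0 with probability $1-p$.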
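Which events follow a Bernoulli distribution: any event consisting of a single trial with exactly two possible outcomes (e.g., a coin flip, or cat-vs-dog classification).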
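A minimal Python sketch of the Bernoulli probability function $f(x|p) = p^x q^{1-x}$, evaluated for a hypothetical coin with $p = 0.3$; `bernoulli_pmf` is an illustrative name. It also confirms the case split: $x=1$ gives $p$ and $x=0$ gives $1-p$.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """f(x|p) = p^x * q^(1-x) with q = 1 - p for x in {0, 1}, and 0 otherwise."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    if x in (0, 1):
        q = 1 - p
        return p ** x * q ** (1 - x)
    return 0.0


print(bernoulli_pmf(1, 0.3))  # 0.3, i.e. Pr(X=1) = p
print(bernoulli_pmf(0, 0.3))  # about 0.7, i.e. Pr(X=0) = 1 - p
print(bernoulli_pmf(2, 0.3))  # 0.0
```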
The information content of an event that occurs with probability $p$ can be defined as follows:
$$\LARGE f(p) = -\log_2p$$
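A one-function Python sketch of the information content, assuming $0 < p \le 1$; `self_information` is an illustrative name. With base-2 logarithms the unit is bits, and the logarithm makes information additive for independent events: $f(p_1 p_2) = f(p_1) + f(p_2)$.

```python
import math


def self_information(p: float) -> float:
    """f(p) = -log2(p), measured in bits."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log2(p)


print(self_information(0.5))    # 1.0: a fair coin landing heads
print(self_information(0.125))  # 3.0: rarer events carry more information
```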
The entropy $H$ of a probability system $P$ can be defined as the expectation of the information content $f$ over the system $P$:
$$\begin{align} \LARGE H(P):& \LARGE =E(P_f) \nonumber\\ & \LARGE = \displaystyle \sum_{i=1}^{m} p_i*f(p_i) \nonumber\\ & \LARGE = \displaystyle \sum_{i=1}^{m} p_i(-\log_2p_i) \nonumber\\ & \LARGE = - \displaystyle \sum_{i=1}^{m} p_i*\log_2p_i \nonumber\\ \end{align}$$
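In short, the entropy of a system is computed by finding the information content $-\log_2p_i$ of every event that can occur in the system, multiplying it by that event's probability $p_i$, and adding up all of the results $-\log_2p_i*p_i$; the sum is the entropy of the system.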
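The procedure just described can be sketched in Python as follows, assuming the distribution is given as a list of probabilities summing to 1; `entropy` is an illustrative name. Events with $p_i = 0$ are skipped, following the usual convention that $0 \cdot \log_2 0 = 0$.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """H(P) = -sum_i p_i * log2(p_i), in bits; terms with p_i = 0 contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Each term is p_i * f(p_i), where f(p_i) = -log2(p_i) is the information content
    return sum(p * -math.log2(p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # 1.0: a fair coin
print(entropy([0.25] * 4))  # 2.0: four equally likely outcomes
print(entropy([0.9, 0.1]))  # about 0.469: a biased coin is more predictable
```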
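Relative entropy (KL divergence) measures the gap between two systems $P$ and $Q$: the extra information incurred, on average, when the events of $P$ are scored with the information content of $Q$ instead of their own. It is not symmetric, so in general $D_{KL}(P||Q) \neq D_{KL}(Q||P)$. The formula is as follows: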
$$\begin{align} \LARGE D_{KL} (P||Q):& \LARGE = \displaystyle \sum_{i=1}^{m} p_i*(f_Q(q_i) - f_P(p_i)) \nonumber\\ & \LARGE = \displaystyle \sum_{i=1}^{m} p_i*((-\log_2q_i) - (-\log_2p_i)) \nonumber\\ & \LARGE = \displaystyle \sum_{i=1}^{m} p_i*(-\log_2q_i) - \displaystyle \sum_{i=1}^{m} p_i*(-\log_2p_i) \nonumber\\ & \LARGE = H(P,Q) - H(P) \nonumber\\ \end{align}$$
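A Python sketch that computes the KL divergence term by term, following the second line of the derivation, and checks it against the last line $H(P,Q) - H(P)$. The distributions `P` and `Q` are made-up examples, and the helper names are illustrative. If some $q_i = 0$ while $p_i > 0$, the divergence is infinite, so the sketch returns `math.inf` in that case.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """H(P) = sum_i p_i * (-log2 p_i)."""
    return sum(pi * -math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """H(P, Q) = sum_i p_i * (-log2 q_i); infinite if q_i = 0 where p_i > 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * -math.log2(qi)
    return total


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P||Q) = sum_i p_i * ((-log2 q_i) - (-log2 p_i))."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * (-math.log2(qi) + math.log2(pi))
    return total


P = [0.5, 0.5]
Q = [0.75, 0.25]
print(kl_divergence(P, Q))               # about 0.2075
print(cross_entropy(P, Q) - entropy(P))  # the same value: H(P,Q) - H(P)
print(kl_divergence(Q, P))               # about 0.1887: not symmetric
print(kl_divergence(P, P))               # 0.0
```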
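The basic formula for cross entropy is as follows, where $x_i$ is the true probability (playing the role of $p_i$ above) and $y_i$ is the predicted probability (playing the role of $q_i$):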
$$\LARGE H(P,Q)=\displaystyle \sum_{i=1}^{m} x_i*(-\log_2y_i)$$
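For a binary outcome, taking both the positive and the negative case into account, it can be written as follows (here $x_i$ is the true label of sample $i$ and $y_i$ is the predicted probability that sample $i$ is positive):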
$$\Large H(P,Q)=-( \displaystyle \sum_{i=1}^{n} (x_i*\log_2 y_i + (1-x_i)*\log_2(1-y_i)))$$
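A Python sketch of the binary cross entropy above, summing over samples with base-2 logarithms as in the formula; `binary_cross_entropy` and the `eps` clipping parameter are illustrative assumptions, added so that a prediction of exactly 0 or 1 does not make `math.log2` fail. Many deep-learning libraries (for example PyTorch's `torch.nn.BCELoss`) use the natural logarithm and average over samples by default, so their values differ from this one by the factor $\ln 2$ and by the averaging.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[float], preds: Sequence[float], eps: float = 1e-12) -> float:
    """H = -sum_i (x_i * log2 y_i + (1 - x_i) * log2(1 - y_i)).

    labels: true labels x_i (1 for the positive class, 0 for the negative class)
    preds:  predicted probabilities y_i that each sample is positive
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    total = 0.0
    for x, y in zip(labels, preds):
        y = min(max(y, eps), 1 - eps)  # clip so log2 never receives exactly 0
        total += x * math.log2(y) + (1 - x) * math.log2(1 - y)
    return -total


labels = [1, 0]
print(binary_cross_entropy(labels, [0.5, 0.5]))  # 2.0: uninformative predictions
print(binary_cross_entropy(labels, [0.9, 0.1]))  # about 0.304: confident and correct
print(binary_cross_entropy(labels, [0.1, 0.9]))  # about 6.644: confident and wrong
```

The loss grows sharply when a confident prediction is wrong, which is why cross entropy is a common training objective for binary classifiers.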
To be continued…
Reference video
https://www.bilibili.com/video/BV1WX4y1g7bx