Machine Learning: Introductory Basics (Bayesian Classifiers)

Contents

    • Bayes' Theorem
    • Revisiting Maximum Likelihood Estimation
    • Naive Bayes
    • Semi-Naive Bayes Classifiers
    • EM Algorithm

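Bayes' Theorem

Introduction

Reference (a clear explanation): https://www.matongxue.com/madocs/279

Consider the following situation:

Events A and B overlap (think of them as sets with a non-empty intersection).

There is then a formula for the conditional probability of B given that A has occurred: P(B|A) = P(A∩B) / P(A). This is easy to see in terms of areas: the area representing A is the denominator, the area where A and B overlap is the numerator, and their ratio is the probability of B given A.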

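Bayes' theorem is then easy to derive. From P(B|A) = P(A∩B) / P(A) and P(A|B) = P(A∩B) / P(B), both P(B|A) P(A) and P(A|B) P(B) equal P(A∩B), so:

$$ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} $$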
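
As a quick sanity check of the two conditional-probability formulas and of Bayes' theorem, here is a minimal sketch using one roll of a fair die. The events A and B are hypothetical and chosen only for illustration; they do not come from the original article.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {3, 6}      # event B: the roll is divisible by 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

# Conditional probabilities straight from the definition.
p_b_given_a = prob(A & B) / prob(A)
p_a_given_b = prob(A & B) / prob(B)

# Bayes' theorem: recover P(A|B) from P(B|A), P(A) and P(B).
bayes = p_b_given_a * prob(A) / prob(B)

print(p_b_given_a, p_a_given_b, bayes)  # 1/3 1/2 1/2
```

Exact fractions are used so that the value from the definition and the value from Bayes' theorem can be compared without rounding error.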

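The core of the naive Bayes classification algorithm is the formula above, rewritten for a class C and features x_1, ..., x_n. Note that it assumes the features are mutually independent given the class, so that no feature influences another:

$$ P(C \mid x_1, \dots, x_n) = \frac{P(C) \prod_{i=1}^{n} P(x_i \mid C)}{P(x_1, \dots, x_n)} $$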

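Revisiting Maximum Likelihood Estimation

Reference (a clear explanation): https://www.matongxue.com/madocs/447

First, we need to understand the relationship between likelihood and probability.

When we know the parameter and ask how likely a particular outcome is, we are talking about probability. For example, we know that for a fair coin the parameter is 0.5 for each side, so the probability of getting exactly 5 heads in 10 tosses follows the binomial distribution and is computed as: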

$$ {10 \choose 5}\, 0.5^{5}\, (1-0.5)^{5} \approx 0.25 $$

The binomial coefficient is computed as:

$$ {n \choose m} = \frac{n!}{m!\,(n-m)!} $$
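
The binomial probability above can be reproduced with Python's standard library. This is a small sketch; the helper name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# 10 tosses of a fair coin, exactly 5 heads.
print(round(binomial_pmf(5, 10, 0.5), 4))  # 0.2461
```

The exact value is 252/1024, which rounds to the 0.25 quoted in the text.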

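When we do not know the parameter and have to infer it from experiments, we are talking about likelihood. It amounts to inductive reasoning from repeated trials, and the most plausible parameter obtained this way is the maximum likelihood estimate.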

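Take a single experiment (10 tosses, 6 heads). We do not know the parameter, so we can hypothesize values such as 0.5, 0.6, and so on. Computing the likelihood for 0.5 and for 0.6 and comparing them shows that 0.6 is about 1.2 times as likely as 0.5, so of these two we pick the more plausible one, 0.6.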
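
The comparison between the two candidate parameters can be checked numerically. The sketch below treats the binomial probability as a function of the parameter; the function name `likelihood` is an assumption made for illustration.

```python
from math import comb

def likelihood(theta, k=6, n=10):
    """Likelihood of parameter theta given k heads in n tosses."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

ratio = likelihood(0.6) / likelihood(0.5)
print(round(ratio, 2))  # 1.22
```

The data (6 heads in 10 tosses) stay fixed here; only the hypothesized parameter changes, which is what distinguishes a likelihood from a probability.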

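Next, consider maximum likelihood estimation with several experiments. Each experiment consists of 10 tosses and we run 6 experiments, with results {4, 5, 5, 2, 7, 4}, the number of heads in each. The experiments are independent of one another. In general, write the results as x_1, x_2, ..., x_n. Under one shared parameter θ (which, note, we do not know), the joint probability of these independent outcomes is:

$$ L(\theta) = f(x_1, x_2, \dots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta) $$

Here f(x_1, ..., x_n | θ) is the probability of the observed results under the same parameter; it can also be read as a conditional probability.
What we want is the θ that maximizes L(θ), that is, $\hat{\theta} = \arg\max_{\theta} L(\theta)$.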
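
For the six experiments in the text, the maximizing parameter can be found with a simple grid search over the log-likelihood. This is a sketch under the assumption that each experiment is binomial with 10 tosses; the grid resolution and the variable names are illustrative choices.

```python
from math import comb, log

heads = [4, 5, 5, 2, 7, 4]  # heads per experiment, from the text
n = 10                      # tosses per experiment

def log_likelihood(theta):
    """Log of the product of binomial probabilities over all experiments."""
    return sum(
        log(comb(n, x)) + x * log(theta) + (n - x) * log(1 - theta)
        for x in heads
    )

# Grid search over theta in (0, 1); the endpoints are excluded to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

# For the binomial model the maximizer is also total heads / total tosses.
closed_form = sum(heads) / (n * len(heads))

print(theta_hat, closed_form)  # 0.45 0.45
```

Working with the logarithm turns the product into a sum and does not change where the maximum is.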

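Naive Bayes

Assume that the attributes of a sample are conditionally independent given the class and do not affect one another: this is the idea behind naive Bayes. The formula is a direct application of Bayes' theorem. P(C) is the prior probability of the class (in the watermelon example, the probability that a sample is a good or a bad melon), and P(X|C) is the product of the per-attribute conditional probabilities (in the watermelon example, the probability of each attribute value among good or among bad melons):

$$ P(C \mid X) = \frac{P(C)\,P(X \mid C)}{P(X)} = \frac{P(C)}{P(X)} \prod_{i=1}^{d} P(x_i \mid C) $$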

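P(X) is the same for every class, so it is a constant and can be dropped when comparing classes. Since we want the class with the largest posterior, the formula becomes:

$$ h_{nb}(X) = \arg\max_{C} \; P(C) \prod_{i=1}^{d} P(x_i \mid C) $$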
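
The decision rule can be sketched for discrete attributes with plain frequency counts. The tiny data set below is hypothetical (it is not the watermelon data set from the book), and the names `score` and `predict` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (color, sound) -> label
data = [
    (("green", "dull"), "good"),
    (("green", "crisp"), "bad"),
    (("black", "dull"), "good"),
    (("black", "dull"), "good"),
    (("white", "crisp"), "bad"),
    (("green", "dull"), "bad"),
]

class_counts = Counter(label for _, label in data)

# value_counts[(i, label)][v]: samples of class `label` whose attribute i equals v
value_counts = defaultdict(Counter)
for x, label in data:
    for i, v in enumerate(x):
        value_counts[(i, label)][v] += 1

def score(x, label):
    """P(C) * prod_i P(x_i | C), estimated by frequency counts (no smoothing)."""
    s = class_counts[label] / len(data)
    for i, v in enumerate(x):
        s *= value_counts[(i, label)][v] / class_counts[label]
    return s

def predict(x):
    """Return the class with the largest score; P(X) is dropped as a constant."""
    return max(class_counts, key=lambda c: score(x, c))

print(predict(("black", "dull")))   # good
print(predict(("green", "crisp")))  # bad
```

In this toy data no "bad" sample is black, so the score of "bad" for a black melon is exactly 0 no matter what the other attributes say.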

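Discrete and continuous attributes are handled differently. Discrete values are easy: count the samples of class C that take the value x_i and divide by the number of samples of class C. For a continuous attribute, compute the mean and variance of that attribute within each class and use a normal density, as described in the watermelon book:

$$ P(C) = \frac{|D_C|}{|D|}, \qquad P(x_i \mid C) = \frac{|D_{C,x_i}|}{|D_C|} $$

$$ p(x_i \mid C) = \frac{1}{\sqrt{2\pi}\,\sigma_{C,i}} \exp\!\left( -\frac{(x_i - \mu_{C,i})^2}{2\sigma_{C,i}^2} \right) $$

Here D is the training set, D_C the samples of class C, D_{C,x_i} the samples in D_C that take the value x_i on attribute i, and μ_{C,i} and σ_{C,i}^2 the mean and variance of attribute i within class C.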
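
For a continuous attribute, the class-conditional density can be sketched as follows. The density values are hypothetical numbers chosen for illustration, and using the sample standard deviation is an assumption.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev

# Hypothetical values of a continuous attribute for the samples of one class.
densities_good = [0.70, 0.77, 0.63, 0.61, 0.56]

mu = mean(densities_good)
sigma = stdev(densities_good)  # sample standard deviation (an assumption)

def gaussian_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

# Density of a new value under this class; it replaces P(x_i | C) in the product.
print(gaussian_pdf(0.65, mu, sigma))
```

The result is a density rather than a probability, so it can exceed 1; only its relative size across classes matters when comparing scores.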

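Example: suppose a new sample arrives and we must decide whether it is a good melon or a bad one.

[Figure 5: the test sample]

The calculation proceeds as follows:

[Figure 6: the worked calculation]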

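But consider this case: what if the estimated probability of some attribute value within a class is 0 (that is, the value never occurs with that class in the training data)? How should it be handled?

The estimate has to be smoothed, since a single zero factor wipes out the whole product. The usual method is the Laplacian correction: add 1 to the numerator and add the number of possible values to the denominator, where N is the number of classes and N_i is the number of possible values of attribute i:

$$ \hat{P}(C) = \frac{|D_C| + 1}{|D| + N}, \qquad \hat{P}(x_i \mid C) = \frac{|D_{C,x_i}| + 1}{|D_C| + N_i} $$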
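
The Laplacian correction can be sketched as two small helpers. The function names and the counts in the usage line are hypothetical, chosen only to illustrate the formulas.

```python
def laplace_prior(n_class, n_total, n_classes):
    """Smoothed P(C): (|D_C| + 1) / (|D| + N)."""
    return (n_class + 1) / (n_total + n_classes)

def laplace_conditional(n_class_value, n_class, n_values):
    """Smoothed P(x_i | C): (|D_{C,x_i}| + 1) / (|D_C| + N_i)."""
    return (n_class_value + 1) / (n_class + n_values)

# Hypothetical counts: 3 "bad" samples, none of them black,
# and the color attribute has 3 possible values.
print(laplace_conditional(0, 3, 3))  # 1/6 instead of 0
```

With the correction, an unseen attribute value lowers the score of a class without forcing it to zero.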

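Semi-Naive Bayes Classifiers

Assume that, besides the class, each attribute depends on at most one other attribute. This is ODE (One-Dependent Estimator): in effect, the parent attribute is conditioned on together with the class. Here pa_i is the parent attribute of x_i:

$$ P(C \mid X) \propto P(C) \prod_{i=1}^{d} P(x_i \mid C, pa_i) $$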

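Assume that all attributes depend on one and the same "super-parent" attribute. This is SPODE (Super-Parent ODE), with x_i as the super-parent attribute:

$$ P(C \mid X) \propto P(C, x_i) \prod_{j=1}^{d} P(x_j \mid C, x_i) $$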
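
A SPODE score can be sketched with frequency counts once a super-parent has been chosen. The data set, the choice of attribute 0 as super-parent, and the name `spode_score` are all assumptions made for illustration; no smoothing is applied.

```python
# Hypothetical toy data: three discrete attributes and a label.
data = [
    (("green", "dull", "clear"), "good"),
    (("green", "dull", "clear"), "good"),
    (("black", "dull", "clear"), "good"),
    (("green", "crisp", "blurry"), "bad"),
    (("black", "crisp", "blurry"), "bad"),
    (("green", "dull", "blurry"), "bad"),
]
SUPER = 0  # index of the attribute used as super-parent (an assumption)

def spode_score(x, label):
    """P(C, x_sp) * prod_{j != sp} P(x_j | C, x_sp), by frequency counts."""
    # Samples that match both the class and the super-parent value.
    matched = [xs for xs, c in data if c == label and xs[SUPER] == x[SUPER]]
    if not matched:
        return 0.0
    s = len(matched) / len(data)  # estimate of P(C, x_sp)
    for j, v in enumerate(x):
        if j == SUPER:
            continue  # P(x_sp | C, x_sp) = 1, so this factor is skipped
        s *= sum(1 for xs in matched if xs[j] == v) / len(matched)
    return s

x = ("green", "dull", "clear")
print(spode_score(x, "good"), spode_score(x, "bad"))
```

Every conditional probability is estimated only from samples that share both the class and the super-parent value, so the estimates rest on fewer samples than in naive Bayes.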

EM Algorithm

Omitted for now.

The idea is similar to that of the K-Means clustering algorithm.

Reference: https://www.bilibili.com/video/BV1i4411G7Xv?p=9&share_source=copy_web
