Title: Local Learning in Probabilistic Networks with Hidden Variables



Neural networks, which represent complex input/output relations using combinations of simple nonlinear processing elements, are a familiar tool in AI and computational neuroscience.

Probabilistic networks (also called belief networks or Bayesian networks) are a more explicit representation of the joint probability distribution characterizing a problem domain, providing a topological description of the causal relationships among variables.


Computational models in AI are judged by two main criteria: ease of creation and effectiveness in decision making. This sentence states how the quality of a computational model is assessed in AI.

Probability theory views the world as a set of random variables X1, ..., Xn, each of which has a domain of possible values. The key concept in probability theory is the joint probability distribution, which specifies a probability for each possible combination of values for all the random variables. This passage describes how the variables relate to one another; the Bayesian belief network shown below is a visual representation of these relationships:

[Image 1 from the reading notes]

Formally, a probabilistic network is defined by a directed acyclic graph together with a conditional probability table (CPT) associated with each node. The CPT associated with variable X specifies the conditional distribution P(X | Parents(X)).
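The definition above can be made concrete with a minimal sketch. The three-node network, its variable names, and every probability below are hypothetical and chosen only for illustration; they do not come from the paper. The sketch stores one CPT per node and computes the joint probability of a full assignment as the product of the entries P(X | Parents(X)).

```python
from itertools import product

# Hypothetical three-node network: Rain -> WetGrass <- Sprinkler.
# parents[X] lists Parents(X); the graph must be acyclic.
parents = {"Rain": (), "Sprinkler": (), "WetGrass": ("Rain", "Sprinkler")}

# cpt[X][parent_values] = P(X = True | Parents(X) = parent_values).
# All numbers are made up for illustration.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.05,
    },
}

def joint_probability(assignment):
    """P(assignment) as the product of P(X | Parents(X)) over all nodes."""
    prob = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob

def all_assignments():
    """Enumerate every True/False assignment to the network's variables."""
    names = list(parents)
    for values in product([True, False], repeat=len(names)):
        yield dict(zip(names, values))

# The joint distribution must sum to 1 over all assignments.
total = sum(joint_probability(a) for a in all_assignments())
```

Because each CPT row is a proper distribution, the joint built this way sums to 1 without any extra normalization.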


[Image 2 from the reading notes]


2. Learning probabilistic networks


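•  Known structure, all variables observable: only the values in the CPTs need to be learned, which is relatively simple.

•  Unknown structure, all variables observable: the topology of the network has to be constructed. The algorithms used are usually greedy, and this case is more complex.

•  Known structure, but some variables are hidden: this is the problem the paper sets out to solve. Hidden variables usually cannot be ignored, because removing them makes the network more complex, as the figure below shows: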
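For the first case in the list above (known structure, fully observed data), one common approach is to estimate each CPT entry by relative frequency. The sketch below assumes that approach; the function name `estimate_cpt` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def estimate_cpt(cases, child, parent_names):
    """Relative-frequency estimate of P(child | parents) from complete data.

    Returns a dict keyed by (child_value, parent_values).
    """
    joint_counts = Counter()
    parent_counts = Counter()
    for case in cases:
        pv = tuple(case[p] for p in parent_names)
        joint_counts[(case[child], pv)] += 1
        parent_counts[pv] += 1
    return {key: n / parent_counts[key[1]] for key, n in joint_counts.items()}

# Toy, made-up data: every case observes every variable.
cases = [
    {"Rain": True, "WetGrass": True},
    {"Rain": True, "WetGrass": True},
    {"Rain": True, "WetGrass": False},
    {"Rain": False, "WetGrass": False},
]
cpt = estimate_cpt(cases, "WetGrass", ("Rain",))
```

This counting scheme stops working once a variable is hidden, since the counts for its values are no longer available in the data.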


The algorithm is provided with a network structure and initial (randomly generated) values for the CPTs. It is presented with a set D of data cases D1, ..., Dm. The object is to find the CPT parameters w that best model the data. We adopt a Bayesian notion of “best.” More specifically, we assume that each possible setting of w is equally likely a priori, so that the maximum likelihood model is appropriate. This means that the aim is to maximize P_w(D), the probability assigned by the network to the observed data when the CPT parameters are set to w.
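To illustrate the objective P_w(D), here is a minimal sketch for a hypothetical network with one hidden binary variable H and two observed children A and B. The network, the parameter values, and the data are all made up, and the sketch assumes the data cases are independent given w, so that ln P_w(D) is a sum over cases. Each case probability is obtained by summing the joint over the values of the hidden variable.

```python
import math

# Hypothetical network: hidden H -> A and H -> B, all variables binary.
# w holds the CPT parameters; each entry is P(variable = True | parent value).
w = {
    "H": 0.3,                      # P(H = True)
    "A": {True: 0.9, False: 0.2},  # P(A = True | H)
    "B": {True: 0.7, False: 0.1},  # P(B = True | H)
}

def bernoulli(p_true, value):
    """Probability of a binary value given P(value = True)."""
    return p_true if value else 1.0 - p_true

def case_probability(w, case):
    """P_w(D_l): sum the joint over both values of the hidden variable H."""
    total = 0.0
    for h in (True, False):
        total += (bernoulli(w["H"], h)
                  * bernoulli(w["A"][h], case["A"])
                  * bernoulli(w["B"][h], case["B"]))
    return total

def log_likelihood(w, data):
    """ln P_w(D), assuming the cases are independent given w."""
    return sum(math.log(case_probability(w, case)) for case in data)

# Made-up data: H is never observed.
data = [
    {"A": True, "B": True},
    {"A": False, "B": False},
    {"A": True, "B": False},
]
ll = log_likelihood(w, data)
```

A learning procedure would adjust the entries of `w` to increase `ll`; how the paper does this is not shown in this sketch.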


In our derivation, we will use the standard notation w_ijk to denote a specific CPT entry, the probability that variable X_i takes on its j-th possible value assignment given that its parents U_i take on their k-th possible value assignment:
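Written out from the sentence above, the definition reads as follows. The symbols x_ij (the j-th value of X_i) and u_ik (the k-th value assignment of the parents U_i) are notation assumed here for illustration; the normalization constraint follows because each CPT row is a probability distribution.

```latex
w_{ijk} = P(X_i = x_{ij} \mid U_i = u_{ik}),
\qquad
\sum_{j} w_{ijk} = 1 \quad \text{for every } i, k .
```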

[Image 3 from the reading notes]


[Image 4 from the reading notes]




