Probabilistic Graphical Models - Markov Network (Pairwise MN | General Gibbs Distribution | CRF)

Pairwise Markov Network

The constant that normalizes the unnormalized measure into a probability distribution is called the partition function (or simply the normalizing constant).

[Figure 1: pairwise Markov network]

Consider the pairwise factor ϕ1(A, B). That potential is proportional to none of the following (the numeric sketch after this list illustrates why):

  • the marginal probability P(A, B)
  • the conditional probability P(A | B)
  • the conditional probability P(A, B | C, D)
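To see this concretely, here is a minimal brute-force sketch (not the lecture's example; all factor values are made-up numbers) of a pairwise Markov network over four binary variables arranged in a loop A - B - C - D - A. The marginal P(A, B), obtained by summing the normalized joint over C and D, is not a rescaled copy of ϕ1(A, B), because the rest of the network feeds influence back into A and B:

```python
import itertools

# Made-up pairwise factors over binary variables, purely for illustration.
phi1 = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}    # phi1(A, B)
phi2 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi2(B, C)
phi3 = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}  # phi3(C, D)
phi4 = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}  # phi4(D, A)

def unnormalized(a, b, c, d):
    """The unnormalized measure: the product of all four factors."""
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the unnormalized measure over all assignments.
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=4))

# Marginal P(A, B): sum the normalized joint over C and D.
pAB = {(a, b): sum(unnormalized(a, b, c, d) for c in (0, 1) for d in (0, 1)) / Z
       for a in (0, 1) for b in (0, 1)}

# If phi1 were proportional to P(A, B), these ratios would all be equal.
for ab in sorted(phi1):
    print(ab, pAB[ab] / phi1[ab])
```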

General Gibbs Distribution

Even a fully connected pairwise Markov network is not fully expressive (i.e., able to represent an arbitrary probability distribution over its random variables): not every distribution can be represented as a pairwise Markov network.
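A quick counting argument (my own illustration, for binary variables) makes the gap concrete. A fully connected pairwise network has one factor per pair of variables, each with four entries, while an arbitrary joint distribution over n binary variables needs exponentially many parameters:

$$\binom{n}{2}\cdot 4 \;=\; 2n(n-1) \;=\; O(n^2) \qquad \text{vs.} \qquad 2^n - 1$$

Already at n = 20 that is 760 pairwise parameters against roughly a million for a general distribution, so the pairwise family cannot cover every distribution.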


[Figure 2: general Gibbs distribution]

An edge in the network between two nodes means that those two nodes can influence each other directly. If X and Y occur in the scope of the same factor, they can influence each other directly, so the induced graph connects them with an edge.

To parameterize what we call a general Gibbs distribution, we use general factors, each of which has a scope that may contain more than two variables.
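For reference, a sketch of the standard definition (consistent with the prose above; $D_i$ denotes the scope of factor $\phi_i$):

$$P_\Phi(X_1,\dots,X_n) \;=\; \frac{1}{Z_\Phi}\,\tilde P_\Phi(X_1,\dots,X_n), \qquad \tilde P_\Phi \;=\; \prod_{i=1}^{k}\phi_i(D_i), \qquad Z_\Phi \;=\; \sum_{X_1,\dots,X_n}\tilde P_\Phi(X_1,\dots,X_n)$$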

We cannot read the factorization from the graph. That is, factorizations that differ considerably in their expressive power can all induce the exact same graph. For example, over three variables the single factor ϕ(A, B, C) and the three pairwise factors ϕ1(A, B), ϕ2(B, C), ϕ3(A, C) both induce the fully connected triangle, yet the former is strictly more expressive. A small sketch of this appears below.
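Here is a minimal sketch (a hypothetical helper, not from the lecture) that builds the induced graph from a list of factor scopes by connecting every pair of variables that appear together in some scope:

```python
from itertools import combinations

def induced_graph(scopes):
    """Edges of the Markov network induced by a set of factor scopes."""
    edges = set()
    for scope in scopes:
        edges |= {frozenset(pair) for pair in combinations(scope, 2)}
    return edges

# Two factorizations with very different expressive power ...
g1 = induced_graph([("A", "B", "C")])                     # one factor phi(A, B, C)
g2 = induced_graph([("A", "B"), ("B", "C"), ("A", "C")])  # three pairwise factors
print(g1 == g2)  # True: both induce the same fully connected triangle
```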


Q: What is the difference between an active trail in a Markov network and an active trail in a Bayesian network?

A: They differ in the case where Z is a descendant of a v-structure.

Consider a v-structure with Z a descendant of it. If Z is observed in a Bayesian network, this creates an active trail, allowing influence to flow between the v-structure's parents. If Z is observed in a Markov network, influence stops at Z and any trail through it is inactive. The numeric sketch below demonstrates the Bayesian-network side.
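A minimal numeric sketch (all numbers made up) of the v-structure X → C ← Y, where Z is a noisy observation of C and hence a descendant of the v-structure; X and Y are independent fair coins and C is the OR of its parents:

```python
import itertools

pX = {0: 0.5, 1: 0.5}
pY = {0: 0.5, 1: 0.5}

def pC(c, x, y):   # C is the deterministic OR of its parents
    return 1.0 if c == (x | y) else 0.0

def pZ(z, c):      # Z: a noisy copy of C (its descendant)
    return 0.9 if z == c else 0.1

def joint(x, y, c, z):
    return pX[x] * pY[y] * pC(c, x, y) * pZ(z, c)

def p_x1_given(evidence):
    """P(X = 1 | evidence) by brute-force enumeration over all assignments."""
    num = den = 0.0
    for x, y, c, z in itertools.product((0, 1), repeat=4):
        vals = {"y": y, "c": c, "z": z}
        if any(vals[k] != v for k, v in evidence.items()):
            continue
        p = joint(x, y, c, z)
        den += p
        if x == 1:
            num += p
    return num / den

# Observing the descendant Z activates the trail X -> C <- Y in the
# Bayesian network: learning Y now changes the belief about X.
print(p_x1_given({"z": 1}))          # ~0.643
print(p_x1_given({"z": 1, "y": 1}))  # 0.5 ("explaining away")
```

In a Markov network, by contrast, observing a variable blocks every trail through it, so an observation can never switch a trail on the way it does here.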


Conditional Random Field

Conditional random fields (CRFs) are widely used for task-specific prediction. Suppose we have a set of input (observed) variables and a set of target variables that we want to predict. Unlike in a Markov random field (where the input X and the target Y are variables of the same kind), here the two play different roles. Two examples:

  • Image segmentation: input X = pixel values (which we can process to produce more expressive features); target Y = the label of each superpixel (grass, sky, cow)
  • Text processing: input X = the words in a sentence; target Y = the label of each word (person, location, organization)


The processed features (non-trivial features, i.e., the input variables above) are highly correlated with one another, so they carry a lot of redundant information. Suppose we use a Bayes model and build a structure like this:

[Figure 3: Bayes model over the features]

If we assume the features are independent, that unreasonable assumption yields a skewed probability distribution (the numeric sketch below shows the effect). And what would the correct independence assumptions even be? Adding edges to capture these correlations is not only hard, it also produces densely connected models.
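A minimal numeric sketch (made-up numbers) of how a naive-Bayes-style model with an independence assumption "double counts" redundant features: the second feature below is an exact duplicate of the first, so it adds no information, yet the model grows more confident:

```python
pY = {0: 0.5, 1: 0.5}
pF1_given_Y = {0: 0.2, 1: 0.8}   # P(F = 1 | Y = y), the same for every copy

def posterior_y1(n_copies):
    """P(Y = 1 | F = 1 observed n_copies times), assuming independent features."""
    num = pY[1] * pF1_given_Y[1] ** n_copies
    den = num + pY[0] * pF1_given_Y[0] ** n_copies
    return num / den

print(posterior_y1(1))  # 0.80  -- one copy of the feature
print(posterior_y1(2))  # ~0.94 -- the duplicate carries no new information,
                        #          yet the posterior is skewed toward certainty
```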

“So a completely different solution to this problem, basically says, well, I don't care about the image features. I don't want to predict the probability distributions over pixels, you know, I am not trying to do image synthesis.”

In fact, we are not interested in the distribution of X, much less in the correlations among its components; what we care about is the distribution of Y given X. So instead of modeling the joint probability P(X, Y), we model the conditional probability P(Y | X) directly. This is exactly what a CRF does; let's look at the definition first.

[Figure 4: definition of a conditional random field]

The definition of a CRF looks very much like that of a Gibbs distribution: there is again a set of factors, each with its own scope, and as in a Gibbs distribution we multiply the factors to obtain an unnormalized measure. The difference is that we now model a conditional distribution, so the normalization constant (partition function) is a function of X: an X-specific partition function that, for every given X, sums the unnormalized measure over Y.
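Written out (a sketch in the usual notation; each scope $D_i$ may contain variables from both X and Y):

$$P(Y \mid X) \;=\; \frac{1}{Z(X)}\,\tilde P(X, Y), \qquad \tilde P(X, Y) \;=\; \prod_{i=1}^{k}\phi_i(D_i), \qquad Z(X) \;=\; \sum_{Y}\tilde P(X, Y)$$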
