Pointwise Mutual Information

Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.

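The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence under the joint distribution and the probability expected from their individual distributions if they were independent. Mathematically:

$$\operatorname{pmi}(x;y) \equiv \log\frac{p(x,y)}{p(x)\,p(y)} = \log\frac{p(x \mid y)}{p(x)} = \log\frac{p(y \mid x)}{p(y)}$$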

   

Source: <http://en.wikipedia.org/wiki/Pointwise_mutual_information>

   

The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (with respect to the joint distribution $p(x,y)$).

   


http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/philip-pmi.pdf

Information-theory approach to finding collocations

– Measure of how much one word tells us about the other, i.e. how much information we gain

– Can be negative or positive


Problems with PMI

• Bad with sparse data

– Suppose some words occur only once, but appear together

– They get a very high PMI score

– Consider our word clouds. A high PMI score might not necessarily indicate the importance of a bigram
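
The sparse-data problem can be reproduced on a toy corpus. The sketch below is only illustrative: the corpus is made up, and `bigram_pmi` is a hypothetical helper that estimates probabilities by plain relative frequency with no smoothing.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for adjacent word pairs in tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi            # relative frequency of the bigram
        p_x = unigrams[w1] / n_uni     # relative frequency of the first word
        p_y = unigrams[w2] / n_uni     # relative frequency of the second word
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Made-up corpus: "new york" is frequent, "zyzzyva quokka" appears exactly once.
tokens = ("new york is big . " * 20 + "zyzzyva quokka .").split()
scores = bigram_pmi(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)

print("top bigram:", ranked[0], scores[ranked[0]])
print("new york:  ", scores[("new", "york")])
```

The pair that occurs only once ranks above the frequent collocation, which is the behavior the slide warns about.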

   


Pointwise mutual information is derived from mutual information.

   


Finally, $\operatorname{pmi}(x;y)$ will increase if $p(x \mid y)$ is fixed but $p(x)$ decreases.

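This is a weakness. If x and y are tightly associated and always occur together, p(x|y) is fixed, so the score depends only on the size of p(x): the rarer x is, the larger the value. Suppose p(y|x) = 1 (the two always co-occur); the score is then determined entirely by how frequent the variable is, and a word that occurs only once gets the highest score. PMI favors rare, low-frequency events.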

• Bad with word dependence

– Suppose two words are perfectly dependent on each other

– Whenever one occurs, the other occurs

– I(x, y) = log (1 / P(y))

– So the rarer the word is, the higher the PMI is

– A high PMI score doesn't mean high word dependence (it could just mean rarer words)

– Remedy: threshold on word frequencies
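
A small numeric sketch of the points above, assuming perfect dependence so that p(x, y) = p(x) = p(y). The event total and the `min_count` cutoff are arbitrary illustrative values, and both function names are hypothetical.

```python
import math

def pmi_bits(p_xy, p_x, p_y):
    """PMI in bits from a joint probability and its two marginals."""
    return math.log2(p_xy / (p_x * p_y))

total = 1000  # hypothetical number of observed events
for count in (100, 10, 1):
    p = count / total
    # Perfect dependence: p(x, y) = p(x) = p(y) = p, so PMI = log2(1 / p).
    print(count, pmi_bits(p, p, p))

def passes_threshold(count_x, count_y, min_count=5):
    """Keep a pair only if both words occur at least min_count times."""
    return count_x >= min_count and count_y >= min_count
```

The association is equally perfect in all three cases, yet the score grows as the count shrinks; filtering with something like `passes_threshold` before ranking is one way to apply the frequency threshold the slide suggests.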

   

   


   

   

PMI can be viewed as the mutual information at a single point, that is, for one particular pair of outcomes.

   

Consider mutual information:

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

   

Source: <http://en.wikipedia.org/wiki/Mutual_information>

   

   


It can take positive or negative values, but is zero if X and Y are independent. PMI is maximized when X and Y are perfectly associated, yielding the following bounds:

$$-\infty \le \operatorname{pmi}(x;y) \le \min\left[-\log p(x),\; -\log p(y)\right]$$

   


   

Example

x    y    p(x, y)
0    0    0.1
0    1    0.7
1    0    0.15
1    1    0.05

Using this table we can marginalize to get the following additional table for the individual distributions:

  

value    p(x)    p(y)
0        0.8     0.25
1        0.2     0.75

With this example, we can compute four values for $\operatorname{pmi}(x;y)$. Using base-2 logarithms:

pmi(x=0;y=0)    -1
pmi(x=0;y=1)    0.222392421
pmi(x=1;y=0)    1.584962501
pmi(x=1;y=1)    -1.584962501

(For reference, the mutual information $I(X;Y)$ would then be 0.214170945.)
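
The numbers in this example can be recomputed directly from the joint table. The following is a minimal sketch; the variable and function names are chosen here for illustration.

```python
import math

# Joint distribution p(x, y) from the example table.
joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginalize to get p(x) and p(y).
p_x = {}
p_y = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

def pmi(x, y):
    """PMI in bits for the outcome pair (x, y)."""
    return math.log2(joint[(x, y)] / (p_x[x] * p_y[y]))

# Mutual information is the expectation of PMI under the joint distribution.
mi = sum(p * pmi(x, y) for (x, y), p in joint.items())

for (x, y) in joint:
    print(f"pmi(x={x};y={y}) = {pmi(x, y):.9f}")
print(f"I(X;Y) = {mi:.9f}")
```

Cells whose joint probability is below p(x)p(y), here (0, 0) and (1, 1), come out negative, while the weighted average over all four cells is positive.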

   


   

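Similarities to mutual information

PMI satisfies many of the same relationships as mutual information:

$$\operatorname{pmi}(x;y) = h(x) + h(y) - h(x,y) = h(x) - h(x \mid y) = h(y) - h(y \mid x)$$

where $h(x)$ is the self-information, or $-\log_2 p(X = x)$.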
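
As a quick sanity check, the self-information form can be compared numerically with the ratio form of PMI. The probabilities below are those of the x = 1, y = 0 cell of the example table; the helper name `h` simply mirrors the notation in the text.

```python
import math

def h(p):
    """Self-information in bits of an outcome with probability p."""
    return -math.log2(p)

p_x, p_y, p_xy = 0.2, 0.25, 0.15   # p(x=1), p(y=0), p(x=1, y=0)

pmi_ratio = math.log2(p_xy / (p_x * p_y))
pmi_joint = h(p_x) + h(p_y) - h(p_xy)      # h(x) + h(y) - h(x, y)
pmi_conditional = h(p_x) - h(p_xy / p_y)   # h(x) - h(x|y), with p(x|y) = p(x, y) / p(y)

print(pmi_ratio, pmi_joint, pmi_conditional)
```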

   


   

Normalized PMI (NPMI)

Pointwise mutual information can be normalized between [-1,+1] resulting in -1 (in the limit) for never occurring together, 0 for independence, and +1 for complete co-occurrence.

   

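Under complete co-occurrence we can take $p(x,y) = p(x) = p(y)$. Combining this with the definition $\operatorname{npmi}(x;y) = \operatorname{pmi}(x;y) / h(x,y)$, where $h(x,y) = -\log p(x,y)$, the numerator becomes $-\log p(x,y)$ as well, so the ratio is $+1$.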
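
The three reference values of NPMI can be checked with a short sketch. It assumes the usual definition npmi(x;y) = pmi(x;y) / (-log p(x,y)); returning -1 for p(x,y) = 0 is a convention adopted here for the limiting case, and p(x,y) = 1 is rejected because the ratio is 0/0 there.

```python
import math

def npmi(p_xy, p_x, p_y):
    """Normalized PMI in [-1, +1], assuming 0 <= p_xy < 1 and positive marginals."""
    if p_xy == 0.0:
        return -1.0  # limiting value: the pair never occurs together
    if p_xy >= 1.0:
        raise ValueError("npmi is undefined when p(x, y) = 1")
    pmi = math.log2(p_xy / (p_x * p_y))
    return pmi / -math.log2(p_xy)

print(npmi(0.0, 0.2, 0.3))    # never occur together
print(npmi(0.06, 0.2, 0.3))   # independent: p(x, y) = p(x) p(y)
print(npmi(0.1, 0.1, 0.1))    # complete co-occurrence
```

Because the ratio of two logarithms does not depend on the base, using base 2 here is only a matter of convenience.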

   


   

Chain rule for PMI

$$\operatorname{pmi}(x;yz) = \operatorname{pmi}(x;y) + \operatorname{pmi}(x;z \mid y)$$

   


TODO: I do not fully understand this part yet.

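This is easily proven by:

$$
\begin{aligned}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z \mid y)
&= \log\frac{p(x,y)}{p(x)\,p(y)} + \log\frac{p(x,z \mid y)}{p(x \mid y)\,p(z \mid y)} \\
&= \log\left[\frac{p(x,y)}{p(x)\,p(y)} \cdot \frac{p(x,z \mid y)}{p(x \mid y)\,p(z \mid y)}\right] \\
&= \log\frac{p(x \mid y)\,p(y)\,p(x,z \mid y)}{p(x)\,p(y)\,p(x \mid y)\,p(z \mid y)} \\
&= \log\frac{p(x,yz)}{p(x)\,p(yz)} \\
&= \operatorname{pmi}(x;yz)
\end{aligned}
$$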
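
For readers who, like the note above, find the chain rule opaque, a numeric check may help. The sketch below builds an arbitrary joint distribution over three binary variables (the random weights are purely illustrative) and compares pmi(x;yz) with pmi(x;y) + pmi(x;z|y) for every outcome, where pmi(x;z|y) is taken to be log p(x,z|y) / (p(x|y) p(z|y)).

```python
import itertools
import math
import random

random.seed(0)
outcomes = list(itertools.product([0, 1], repeat=3))
weights = [random.random() + 0.1 for _ in outcomes]   # strictly positive
total = sum(weights)
p = {o: w / total for o, w in zip(outcomes, weights)}  # joint p(x, y, z)

def marg(x=None, y=None, z=None):
    """Probability of the event fixing the given coordinates (None = summed out)."""
    return sum(v for (a, b, c), v in p.items()
               if (x is None or a == x)
               and (y is None or b == y)
               and (z is None or c == z))

def chain_rule_sides(x, y, z):
    """Return (pmi(x;yz), pmi(x;y) + pmi(x;z|y)) in bits."""
    pmi_x_yz = math.log2(marg(x, y, z) / (marg(x=x) * marg(y=y, z=z)))
    pmi_x_y = math.log2(marg(x=x, y=y) / (marg(x=x) * marg(y=y)))
    p_xz_given_y = marg(x, y, z) / marg(y=y)
    p_x_given_y = marg(x=x, y=y) / marg(y=y)
    p_z_given_y = marg(y=y, z=z) / marg(y=y)
    pmi_x_z_given_y = math.log2(p_xz_given_y / (p_x_given_y * p_z_given_y))
    return pmi_x_yz, pmi_x_y + pmi_x_z_given_y

for o in outcomes:
    lhs, rhs = chain_rule_sides(*o)
    print(o, lhs, rhs)
```

The two columns agree up to floating-point error for every outcome, which is what the algebraic proof predicts.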

   


   

   

   

   

   

   
