Image Captioning Metrics: Computing CIDEr

Overview

  1. TF-IDF
  2. Cosine similarity
  3. The CIDEr algorithm
    CIDEr: Consensus-based Image Description Evaluation
  4. Examples
  5. Code
    Code released on my GitHub: CaptionMetrics

TF-IDF

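TF-IDF stands for Term Frequency–Inverse Document Frequency: TF is the term frequency and IDF is the inverse document frequency.
Blog post:
文本挖掘预处理之TF-IDF (Text Mining Preprocessing: TF-IDF)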

Cosine similarity

Cosine similarity (Wikipedia)
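
For reference, this is the standard definition of cosine similarity between two vectors $\mathbf{a}$ and $\mathbf{b}$ of the same dimension $d$ (the symbols $\mathbf{a}$, $\mathbf{b}$ and $d$ are introduced here only for illustration):

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} = \frac{\sum_{k=1}^{d} a_{k} b_{k}}{\sqrt{\sum_{k=1}^{d} a_{k}^{2}} \, \sqrt{\sum_{k=1}^{d} b_{k}^{2}}}
$$

The value depends only on the angle between the vectors, so rescaling either vector by a positive constant leaves it unchanged.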

The CIDEr algorithm

  1. Definitions
    The dataset under evaluation contains $N$ images.
    Candidates: $C = \{c_{1}, c_{2}, \ldots, c_{N}\}$, where $c_{i}$ is the candidate caption for image $i$.
    References: $S_{i} = \{s_{i1}, s_{i2}, \ldots, s_{iM}\}$, where $M$ is the number of reference sentences per image and $i$ indexes the $i$-th image.

  2. TF-IDF
    Next, compute the TF-IDF weight of every n-gram in candidate $c_{i}$.
    $g_{k}(c_{i}) = TF(k) \cdot IDF(k)$
    $TF(k) = \frac{h_{k}(c_{i})}{\sum_{l} h_{l}(c_{i})}$
    $IDF(k) = \log\left(\frac{N}{\sum_{p=1}^{N} \min\left(1, \sum_{q=1}^{M} h_{k}(s_{pq})\right)}\right)$
    where
    $g_{k}(c_{i})$ is the TF-IDF weight of n-gram $\omega_{k}$ in sentence $c_{i}$.
    $h_{k}(c_{i})$ is the number of times n-gram $\omega_{k}$ occurs in sentence $c_{i}$.
    $\sum_{l} h_{l}(c_{i})$ is the sum, over all n-grams $\omega_{l}$ in the dataset vocabulary, of their occurrence counts in sentence $c_{i}$, i.e. the total number of n-grams in $c_{i}$.
    $\sum_{p=1}^{N} \min\left(1, \sum_{q=1}^{M} h_{k}(s_{pq})\right)$ is the number of images whose reference sentences contain n-gram $\omega_{k}$ at least once (its document frequency), not its total number of occurrences. For an n-gram that appears somewhere in the references, this value is at least 1 and at most $N$.
    The weights $g_{k}(s_{ij})$ of a reference sentence are computed the same way, with $s_{ij}$ in place of $c_{i}$.

  3. CIDEr
    $n = 1, 2, 3, 4$ is the order of the n-gram: 1-gram, 2-gram, 3-gram, 4-gram.
    $CIDEr_{n}(c_{i}, S_{i}) = \frac{1}{M} \sum_{j=1}^{M} \frac{g^{n}(c_{i}) \cdot g^{n}(s_{ij})}{\|g^{n}(c_{i})\| \, \|g^{n}(s_{ij})\|}$
    where
    $g^{n}(c_{i})$ is the vector of TF-IDF weights $g_{k}(c_{i})$ over all n-grams $\omega_{k}$ of order $n$ for sentence $c_{i}$, and $g^{n}(s_{ij})$ is the corresponding vector for reference $s_{ij}$.
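
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CaptionMetrics or COCO code: all function names and the toy data are hypothetical, sentences are assumed to be pre-tokenized lists, and an n-gram that appears in no reference is given a document frequency of 1 to avoid dividing by zero.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """h_k: occurrence count of every n-gram (a tuple of tokens) in a sentence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def document_frequency(references, n):
    """For each n-gram, the number of images whose references contain it."""
    df = Counter()
    for refs in references:              # refs: the M references of one image
        seen = set()
        for sent in refs:
            seen.update(ngram_counts(sent, n))
        df.update(seen)                  # one count per image: min(1, ...)
    return df

def tfidf_vector(tokens, n, df, num_images):
    """g^n: sparse TF-IDF vector {ngram: weight} of one sentence."""
    counts = ngram_counts(tokens, n)
    total = sum(counts.values())
    vec = {}
    for gram, h in counts.items():
        # Assumption: unseen n-grams are treated as having df = 1.
        idf = math.log(num_images / max(1, df.get(gram, 0)))
        vec[gram] = (h / total) * idf
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0                       # assumption: all-zero vectors score 0
    return dot / (norm_u * norm_v)

def cider_n(candidate, refs, n, df, num_images):
    """CIDEr_n for one image: mean cosine similarity to its M references."""
    g_c = tfidf_vector(candidate, n, df, num_images)
    sims = [cosine(g_c, tfidf_vector(r, n, df, num_images)) for r in refs]
    return sum(sims) / len(sims)

# Hypothetical toy data: N = 2 images, M = 2 references per image.
references = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sits"], ["the", "cat", "sits"]],
]
candidates = [["a", "dog", "runs"], ["a", "dog", "sits"]]
df = document_frequency(references, 1)
scores = [cider_n(c, refs, 1, df, len(references))
          for c, refs in zip(candidates, references)]
```

On this toy data the first candidate scores about 1.0 and the second about 0.5: "a" and "the" occur in the references of both images, so their IDF is log(2/2) = 0 and only the content words contribute.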

Examples

Candidate $c_{1}$ = {我 吃 饭 了 吗} and reference $s_{11}$ = {他 早 上 吃 饭 了}. Taking 1-grams as the example, the dataset has N = 1 image and the reference set has M = 1 sentence.

n-gram $\omega_{k}$    $g_{k}(c_{1})$    $g_{k}(s_{11})$
吃                     0.2               0.16
饭                     0.2               0.16
了                     0.2               0.16
我                     0.2               0
吗                     0.2               0
他                     0                 0.16
早                     0                 0.16
上                     0                 0.16

The score is then:
$CIDEr_{1} = \frac{[0.2, 0.2, 0.2, 0.2, 0.2, 0, 0, 0] \cdot [0.16, 0.16, 0.16, 0, 0, 0.16, 0.16, 0.16]}{\sqrt{0.2^2+0.2^2+0.2^2+0.2^2+0.2^2+0^2+0^2+0^2} \cdot \sqrt{0.16^2+0.16^2+0.16^2+0^2+0^2+0.16^2+0.16^2+0.16^2}} = 0.5477$
Note: because $N = 1$, in theory $IDF(k) = \log(1/1) = 0$ for every n-gram, which would make every weight zero. To avoid this, $IDF(k) = 1$ is used here, so each weight equals its term frequency (1/5 = 0.2 for the candidate; 1/6, written as 0.16, for the reference).
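
The arithmetic can be checked with a few lines of Python. This is only a sketch: the helper `tf` is a hypothetical name, IDF is fixed to 1 as in the note above, and the exact value 1/6 is used instead of the rounded 0.16. Because cosine similarity is unchanged when a vector is rescaled, the rounding does not affect the result.

```python
import math
from collections import Counter

cand = ["我", "吃", "饭", "了", "吗"]
ref = ["他", "早", "上", "吃", "饭", "了"]

def tf(tokens):
    """Term frequency of each 1-gram; with IDF(k) = 1 this is g_k."""
    counts = Counter(tokens)
    return {tok: c / len(tokens) for tok, c in counts.items()}

g_c, g_s = tf(cand), tf(ref)

# Dot product over the shared 1-grams, then divide by both vector norms.
dot = sum(w * g_s.get(tok, 0.0) for tok, w in g_c.items())
norm_c = math.sqrt(sum(w * w for w in g_c.values()))
norm_s = math.sqrt(sum(w * w for w in g_s.values()))
cider_1 = dot / (norm_c * norm_s)

print(round(cider_1, 4))  # 0.5477
```

In closed form the result is $3 / \sqrt{30}$: three shared 1-grams, divided by $\sqrt{5} \cdot \sqrt{6}$.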

Code

The code is on my GitHub as CaptionMetrics. It follows the metrics code released with COCO.
