PyTorch BiLSTM-CRF: A Detailed Look at the CRF Score Implementation

This article is aimed at readers who already understand the basics of CRFs and want to dig into how the model is implemented at the code level.

1. The _forward_alg code

    def _forward_alg(self, feats):
        # Do the forward algorithm to compute the partition function
        init_alphas = torch.full((1, self.tagset_size), -10000.)
        # START_TAG has all of the score.
        init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

        # Wrap in a variable so that we will get automatic backprop
        forward_var = init_alphas

        # Iterate through the sentence
        for feat in feats:
            alphas_t = []  # The forward tensors at this timestep
            for next_tag in range(self.tagset_size):
                # broadcast the emission score: it is the same regardless of
                # the previous tag
                emit_score = feat[next_tag].view(
                    1, -1).expand(1, self.tagset_size)
                # the ith entry of trans_score is the score of transitioning to
                # next_tag from i
                trans_score = self.transitions[next_tag].view(1, -1)
                # The ith entry of next_tag_var is the value for the
                # edge (i -> next_tag) before we do log-sum-exp
                next_tag_var = forward_var + trans_score + emit_score
                # The forward variable for this tag is log-sum-exp of all the
                # scores.
                alphas_t.append(log_sum_exp(next_tag_var).view(1))
            forward_var = torch.cat(alphas_t).view(1, -1)
        terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
        alpha = log_sum_exp(terminal_var)
        return alpha
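The code relies on a log_sum_exp helper that the tutorial defines separately. For reference, a minimal numerically stable version is sketched below; it is equivalent to the tutorial's helper in that it shifts by the max before exponentiating to avoid overflow:

    import torch

    def log_sum_exp(vec):
        # vec: shape (1, tagset_size). Shift by the max so that exp() cannot
        # overflow, then add the max back after taking the log.
        max_score = vec.max()
        return max_score + torch.log(torch.sum(torch.exp(vec - max_score)))

In modern PyTorch, the built-in torch.logsumexp(vec, dim=1) computes the same quantity.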

This is the official implementation from the PyTorch tutorial. Recall that CRF training maximizes the log-likelihood of the gold path:

                \log P(y|X) = Score(X, y) - \log\sum_{\tilde{y}} e^{Score(X, \tilde{y})}

Here y is the gold (annotated) tag path, and \tilde{y} ranges over all possible paths. When computing the sum over all paths (the partition function), why can we, as the code above does, compute a log-sum-exp independently at each timestep, pass the result on to the next timestep, and end up with exactly the quantity we want at the last timestep? A proof is given below.
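Before the proof, the claim can be sanity-checked numerically. The sketch below uses a simplified setup without START/STOP tags; feats, trans, and the sizes T and K are made-up toy values. It compares a brute-force log-sum-exp over all K**T complete paths against the per-timestep recursion that _forward_alg performs:

    import itertools
    import torch

    torch.manual_seed(0)
    T, K = 4, 3                       # sequence length and tagset size (toy values)
    feats = torch.randn(T, K)         # emission scores, e.g. from a BiLSTM
    trans = torch.randn(K, K)         # trans[i, j] = score of transitioning j -> i

    # Brute force: enumerate all K**T paths and log-sum-exp their total scores.
    scores = []
    for path in itertools.product(range(K), repeat=T):
        s = feats[0, path[0]]
        for t in range(1, T):
            s = s + trans[path[t], path[t - 1]] + feats[t, path[t]]
        scores.append(s)
    brute = torch.logsumexp(torch.stack(scores), dim=0)

    # Forward algorithm: one log-sum-exp per timestep, as in _forward_alg.
    forward_var = feats[0]            # scores after the first timestep, shape (K,)
    for t in range(1, T):
        # next_tag_var[i, j] = forward_var[j] + trans[i, j] + feats[t, i]
        next_tag_var = forward_var.unsqueeze(0) + trans + feats[t].unsqueeze(1)
        forward_var = torch.logsumexp(next_tag_var, dim=1)
    alpha = torch.logsumexp(forward_var, dim=0)

    print(torch.allclose(brute, alpha))   # True

The two values agree, even though the brute-force version touches K**T paths while the recursion does only one (K, K) computation per timestep.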

2. Proof

2.1 The simplest case

Consider one step A -> B, where A has only one state a and B has three possible states [b1, b2, b3].

Suppose the forward score accumulated at A is S, and the scores of transitioning to b1, b2, and b3 are t1, t2, and t3 respectively.

Then:

                \log\big(e^{S+t_1} + e^{S+t_2} + e^{S+t_3}\big) = \log\big(e^{S}e^{t_1} + e^{S}e^{t_2} + e^{S}e^{t_3}\big) = \log\big(e^{S}(e^{t_1} + e^{t_2} + e^{t_3})\big) = \log e^{S} + \log\big(e^{t_1} + e^{t_2} + e^{t_3}\big)

So the final log-sum-exp over the paths equals computing a log-sum-exp within each layer first and then adding the results.
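This identity is easy to verify numerically; the values of S and t in the following sketch are arbitrary:

    import torch

    S = torch.tensor(0.7)                  # score accumulated at A
    t = torch.tensor([0.1, -0.4, 1.2])     # t1, t2, t3
    lhs = torch.logsumexp(S + t, dim=0)    # log(e^{S+t1} + e^{S+t2} + e^{S+t3})
    rhs = S + torch.logsumexp(t, dim=0)    # log e^S + log(e^{t1} + e^{t2} + e^{t3})
    print(torch.allclose(lhs, rhs))        # True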

2.2 Generalizing to sequence labeling

Because the transition computation at every timestep has the same output dimension (one score per tag), if B has three states then A must likewise carry three scores S1, S2, S3. The derivation above generalizes to:

                \log\big(e^{S_1+t_1} + e^{S_1+t_2} + e^{S_1+t_3} + e^{S_2+t_1} + e^{S_2+t_2} + e^{S_2+t_3} + e^{S_3+t_1} + e^{S_3+t_2} + e^{S_3+t_3}\big)
                = \log\big(e^{S_1}(e^{t_1} + e^{t_2} + e^{t_3}) + e^{S_2}(e^{t_1} + e^{t_2} + e^{t_3}) + e^{S_3}(e^{t_1} + e^{t_2} + e^{t_3})\big)
                = \log\big((e^{S_1} + e^{S_2} + e^{S_3})(e^{t_1} + e^{t_2} + e^{t_3})\big)
                = \log\big(e^{S_1} + e^{S_2} + e^{S_3}\big) + \log\big(e^{t_1} + e^{t_2} + e^{t_3}\big)

This is the elegance of the code above: a sum over exponentially many complete paths (tagset_size raised to the sequence length) is decomposed into cheap per-layer score computations.
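The generalized identity can be checked the same way; the values chosen for S1..S3 and t1..t3 below are arbitrary:

    import torch

    S = torch.tensor([0.5, -1.0, 2.0])       # S1, S2, S3: scores carried into the layer
    t = torch.tensor([0.1, -0.4, 1.2])       # t1, t2, t3: transition scores into the layer
    pairs = S.unsqueeze(1) + t.unsqueeze(0)  # pairs[i, j] = S_i + t_j, all 9 combinations
    lhs = torch.logsumexp(pairs.flatten(), dim=0)
    rhs = torch.logsumexp(S, dim=0) + torch.logsumexp(t, dim=0)
    print(torch.allclose(lhs, rhs))          # True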

 
