BLEU Evaluation Metric


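BLEU is typically used to measure how similar a set of machine-generated translations (candidates) is to a set of human translations (references). It is built on n-gram precision: the fraction of the candidate's n-grams that also occur in the reference. Each n-gram is credited at most as many times as it occurs in the reference, a step called clipping.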
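The per-order scores in the example below are the clipped (modified) n-gram precisions from the original BLEU definition. The full BLEU score combines these orders through a weighted geometric mean and multiplies the result by a brevity penalty. In the formulas, C is the candidate, Count_clip(g) is the count of n-gram g in C capped at its count in the reference, c and r are the candidate and reference lengths, and w_n are the weights, usually uniform with N = 4. At corpus level, the sums run over all candidate sentences.

```latex
p_n = \frac{\sum_{g \in \text{n-grams}(C)} \mathrm{Count}_{\mathrm{clip}}(g)}
           {\sum_{g \in \text{n-grams}(C)} \mathrm{Count}(g)},
\qquad
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & c > r \\
e^{\,1 - r/c} & c \le r
\end{cases}
```

In the example, the candidate and reference both have 6 tokens, so BP = e^0 = 1.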


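Example. Below, BLEU-n denotes the precision of n-grams of length n alone. Tokens are lowercased and the final period is ignored.

candidate: The cat sat on the mat.
reference: The cat is on the mat.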
  • BLEU-1

Of the 6 candidate unigrams {the, cat, sat, on, the, mat}, 5 appear in the reference, so BLEU-1 = 5/6 ≈ 0.83.

  • BLEU-2

Of the 5 candidate bigrams {the cat, cat sat, sat on, on the, the mat}, 3 appear in the reference, so BLEU-2 = 3/5 = 0.6.

  • BLEU-3

Of the 4 candidate trigrams {the cat sat, cat sat on, sat on the, on the mat}, 1 appears in the reference, so BLEU-3 = 1/4 = 0.25.

  • BLEU-4

None of the 3 candidate 4-grams {the cat sat on, cat sat on the, sat on the mat} appear in the reference, so BLEU-4 = 0/3 = 0.
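The four calculations above can be reproduced with a short script. This is a minimal sketch that assumes a single reference and whitespace tokenization of the lowercased sentences with the period removed. The helper names `ngrams` and `modified_precision` are illustrative and do not come from any library. The function applies the clipping rule, so a repeated candidate n-gram is credited only as many times as it appears in the reference.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision for one candidate vs. one reference.

    Returns (matched, total) so the fraction can be shown unreduced.
    Each candidate n-gram counts at most as often as it occurs in the
    reference (the clipping step).
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))  # missing keys count as 0
    matched = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched, total


# Assumed preprocessing: lowercase, drop the final period, split on spaces.
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()

for n in range(1, 5):
    matched, total = modified_precision(candidate, reference, n)
    print(f"BLEU-{n}: {matched}/{total} = {matched / total:.2f}")
# BLEU-1: 5/6 = 0.83
# BLEU-2: 3/5 = 0.60
# BLEU-3: 1/4 = 0.25
# BLEU-4: 0/3 = 0.00
```

Returning the raw counts instead of a reduced fraction keeps 0/3 visible as written above. For scores with multiple references, corpus-level aggregation, or smoothing, an established implementation such as NLTK's `nltk.translate.bleu_score` or sacreBLEU is a safer choice.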
