Chatbot source code: GitHub link
BLEU is computed with the sentence_bleu function from the nltk library.
from nltk.translate.bleu_score import sentence_bleu
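1. First, extract a question and its reference answer from the dataset.
with open('train_data/xiaohuangji50w_nofenci.conv', 'r') as f:
    # Read one record from the dataset
    f.readline()  # skip the 'E' line, which carries no useful information
    question = f.readline()  # read the question
    question = question[2:].strip()  # drop the two-character prefix and the trailing newline
    _answer = f.readline()  # read the reference answer
    _answer = _answer[2:].strip()  # drop the prefix and the trailing newline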
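The three-line pattern above can be wrapped in a small generator so that the reading logic is written once. This is only a minimal sketch: it assumes every record consists of an `E` line followed by two lines that each start with a two-character prefix such as `M `, and `read_pairs` is a hypothetical helper name, not part of the chatbot source.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_pairs(f: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (question, reference_answer) pairs from an E/M/M formatted stream."""
    while True:
        separator = f.readline()  # the 'E' line carries no information
        if not separator:  # an empty string means end of file
            return
        question = f.readline()[2:].strip()  # drop the prefix and the newline
        answer = f.readline()[2:].strip()
        if question and answer:  # skip incomplete records
            yield question, answer


# A tiny in-memory sample standing in for the real .conv file (assumed layout)
sample = io.StringIO("E\nM 你好\nM 你好呀\nE\nM 在吗\nM 在的\n")
pairs = list(read_pairs(sample))
print(pairs)
```

With the real dataset, the same function could be fed the file object returned by `open(...)` instead of the `io.StringIO` sample.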
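2. Next, send the question to the chatbot and collect its answer. The chatbot is given the segmented question (question_fenci), which is produced in step 3.1.
answer = execute.predict(question_fenci)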
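3. With the chatbot's answer in hand, compute BLEU between the reference answer and the chatbot's answer. The steps are as follows:
3.1 Before scoring, segment both the reference answer and the chatbot's answer into words.
The jieba library is used for word segmentation here.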
import jieba
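# Word segmentation
question_fenci = ' '.join(jieba.cut(question))
_answer_fenci = ' '.join(jieba.cut(_answer))
answer_fenci = ' '.join(jieba.cut(answer))  # segment the chatbot's answer once it is available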
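3.2 Build the reference and candidate arguments for sentence_bleu.
reference holds the gold answers. It is a list that may contain several reference answers, each of which is a sub-list of tokens obtained by calling split() on the segmented text.
# Shape of reference, for illustration only:
# reference = [['this', 'is', 'a', 'duck']]
reference = [_answer_fenci.split()]
candidate is the list of tokens obtained by segmenting the chatbot's answer and calling split() on the result.
candidate = answer_fenci.split()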
3.3 Now the BLEU scores can be computed.
score1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0))
score2 = sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0))
score3 = sentence_bleu(reference, candidate, weights=(0.33, 0.33, 0.33, 0))
score4 = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
weights sets the relative weight of the 1-gram, 2-gram, 3-gram and 4-gram precisions; by default each gets 1/4. That completes the BLEU calculation.
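To make the role of weights concrete, here is a small pure-Python sketch of the standard BLEU definition: clipped n-gram precisions combined by a weighted geometric mean, multiplied by a brevity penalty. It is not nltk's implementation; `simple_bleu` and its helpers are hypothetical names used only for illustration. It simply skips orders whose weight is 0 and does no smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(references, candidate, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, c in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], c)
    clipped = sum(min(c, max_ref[gram]) for gram, c in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(references, candidate, weights):
    """Brevity penalty times the weighted geometric mean of n-gram precisions."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Reference length closest to the candidate length (shorter one on ties)
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    log_sum = 0.0
    for n, w in enumerate(weights, start=1):
        if w == 0:
            continue  # this order does not contribute
        p = modified_precision(references, candidate, n)
        if p == 0:
            return 0.0  # a zero precision collapses the geometric mean
        log_sum += w * math.log(p)
    return bp * math.exp(log_sum)


reference = [['the', 'cat', 'is', 'on', 'the', 'mat']]
candidate = ['the', 'cat', 'sat', 'on', 'the', 'mat']
bleu1 = simple_bleu(reference, candidate, (1, 0, 0, 0))
bleu2 = simple_bleu(reference, candidate, (0.5, 0.5, 0, 0))
print(f'{bleu1:.4f}')  # 0.8333 (5 of 6 unigrams match)
print(f'{bleu2:.4f}')  # 0.7071 (sqrt(5/6 * 3/5))
```

In this example 5 of the 6 candidate unigrams and 3 of the 5 candidate bigrams appear in the reference, and the lengths are equal, so the brevity penalty is 1.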
For details on how BLEU is computed, this blogger explains it very clearly: BLEU algorithm
When calling sentence_bleu you may see the following warning:
The hypothesis contains 0 counts of 2-gram overlaps.
Therefore the BLEU score evaluates to 0, independently of
how many N-gram overlaps of lower order it contains.
Consider using lower n-gram order or use SmoothingFunction()
warnings.warn(_msg)
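This happens because the chatbot's answer shares no 2-gram with the reference answer, so the 2-gram precision is zero. The cumulative score then collapses to zero, or to a vanishingly small value depending on the nltk version, no matter how many 1-grams match. As the message suggests, either use a lower n-gram order or pass a SmoothingFunction.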
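One way to follow the warning's advice is to pass a smoothing function to sentence_bleu. The sketch below uses `SmoothingFunction().method1` from nltk, which adds a small epsilon to n-gram precisions whose match count is zero. The toy sentences are made up for illustration, and which smoothing method suits this dataset best is left open.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Toy data: the candidate shares one unigram and no bigram with the reference
reference = [['今天', '天气', '很', '好']]
candidate = ['天气', '不错']

# Option 1: keep the 2-gram order but smooth the zero counts
smooth = SmoothingFunction().method1
smoothed = sentence_bleu(reference, candidate, weights=(0.5, 0.5),
                         smoothing_function=smooth)

# Option 2: lower the n-gram order and score unigrams only
unigram_only = sentence_bleu(reference, candidate, weights=(1,))

print('smoothed 2-gram BLEU:', smoothed)
print('1-gram BLEU:', unigram_only)
```

Scores computed with and without smoothing are not directly comparable, so whichever option is chosen should be applied to every sample in the evaluation.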
Finally, here is the complete code:
# Compute BLEU over the first 100, 1,000 or 10,000 samples
import execute
import jieba
from nltk.translate.bleu_score import sentence_bleu
import time
count = 0
question = ''  # question
_answer = ''  # reference answer
answer = ''  # the chatbot's answer
reference = []  # BLEU references
candidate = []  # tokens returned by the chatbot
# Running totals for the cumulative 1-gram to 4-gram scores
score_total1 = 0
score_total2 = 0
score_total3 = 0
score_total4 = 0
i = 0
with open('train_data/xiaohuangji50w_nofenci.conv', 'r') as f:
    # Uncomment to skip the first 1000 question-answer pairs and start testing from pair 1001
    # while i < 1000:
    #     f.readline()
    #     f.readline()
    #     f.readline()
    #     i += 1
    #     print('i: ' + str(i))
    start_time = time.time()
    # Change this condition to choose how many samples to test
    while count < 1000:
        # Read one record from the dataset
        f.readline()  # skip the 'E' line, which carries no useful information
        question = f.readline()  # read the question
        question = question[2:]  # drop the question's prefix
        _answer = f.readline()  # read the reference answer
        _answer = _answer[2:]  # drop the prefix
        # Word segmentation
        question_fenci = ' '.join(jieba.cut(question))
        _answer_fenci = ' '.join(jieba.cut(_answer))
        # Chat with the bot, using the Chinese chatbot source code from the GitHub link above
        answer = execute.predict(question_fenci)
        # Segment the chatbot's answer
        answer_fenci = ' '.join(jieba.cut(answer))
        print('---------------divider-----------------')
        print('question_fenci: ' + str(question_fenci))
        print('_answer_fenci: ' + str(_answer_fenci))
        print('answer_fenci: ' + str(answer_fenci))
        # Compute BLEU against a single reference answer
        reference = [_answer_fenci.split()]
        candidate = answer_fenci.split()
        score1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0))
        score2 = sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0))
        score3 = sentence_bleu(reference, candidate, weights=(0.33, 0.33, 0.33, 0))
        score4 = sentence_bleu(reference, candidate, weights=(0.25, 0.25, 0.25, 0.25))
        print('Cumulative 1-gram: %f' % score1)
        print('Cumulative 2-gram: %f' % score2)
        print('Cumulative 3-gram: %f' % score3)
        print('Cumulative 4-gram: %f' % score4)
        score_total1 += score1
        score_total2 += score2
        score_total3 += score3
        score_total4 += score4
        count += 1
        print('count:' + str(count) + ' score: ' + str(score1))
        print('count:' + str(count) + ' score: ' + str(score2))
        print('count:' + str(count) + ' score: ' + str(score3))
        print('count:' + str(count) + ' score: ' + str(score4))
print('---------------divider-----------------')
print('Final results')
print('Elapsed time (s): ' + str(time.time() - start_time))
print('count: ' + str(count))
print('score_total1: ' + str(score_total1))
print('BLEU 1-gram: ' + str(score_total1 / count))
print('---------------divider-----------------')
print('score_total2: ' + str(score_total2))
print('BLEU 2-gram: ' + str(score_total2 / count))
print('---------------divider-----------------')
print('score_total3: ' + str(score_total3))
print('BLEU 3-gram: ' + str(score_total3 / count))
print('---------------divider-----------------')
print('score_total4: ' + str(score_total4))
print('BLEU 4-gram: ' + str(score_total4 / count))
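The four parallel running totals in the script can also be kept in a dictionary keyed by weight setting. The following is a rough sketch of that idea, not the author's code: `average_scores`, `WEIGHTS` and `toy_scorer` are hypothetical names, and the scorer is passed in as a parameter so the example runs without nltk or the chatbot.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Tokens = List[str]
Scorer = Callable[[List[Tokens], Tokens, Sequence[float]], float]

# Same weight settings as in the script above
WEIGHTS: Dict[str, Tuple[float, ...]] = {
    'BLEU 1-gram': (1, 0, 0, 0),
    'BLEU 2-gram': (0.5, 0.5, 0, 0),
    'BLEU 3-gram': (0.33, 0.33, 0.33, 0),
    'BLEU 4-gram': (0.25, 0.25, 0.25, 0.25),
}


def average_scores(samples: Iterable[Tuple[List[Tokens], Tokens]],
                   scorer: Scorer) -> Dict[str, float]:
    """Average the scorer's output over all samples, once per weight setting."""
    totals = {name: 0.0 for name in WEIGHTS}
    count = 0
    for references, candidate in samples:
        for name, weights in WEIGHTS.items():
            totals[name] += scorer(references, candidate, weights)
        count += 1
    return {name: (total / count if count else 0.0)
            for name, total in totals.items()}


def toy_scorer(references, candidate, weights):
    """Stand-in scorer: fraction of candidate tokens found in any reference."""
    vocab = {tok for ref in references for tok in ref}
    if not candidate:
        return 0.0
    return sum(tok in vocab for tok in candidate) / len(candidate)


samples = [([['a', 'b']], ['a', 'b']), ([['a', 'b']], ['a', 'c'])]
result = average_scores(samples, toy_scorer)
print(result)
```

In the real evaluation the scorer could be a thin wrapper such as `lambda refs, cand, w: sentence_bleu(refs, cand, weights=w)`.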