This section uses ChatterBot 1.0.5 and Python 3.6.
1. Prepare the data and the training code
(1) Prepare a YAML data file named data_yaml.yaml:
categories:
- hadoop
conversations:
- - Hi, can I help you?
  - Sure, I'd like to book a flight to Iceland.
- - Your flight has been booked.
  - good
  - are you ok?
(2) In the same directory as the YAML file, create a Python file named mongodb_yaml_example.py with the following code:
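from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

# Create a chat bot that stores its statements in MongoDB.
chatbot = ChatBot(
    'blogs',
    # For a MongoDB instance without authentication, use:
    # mongodb://localhost:27017/chatterbot-blogs
    database_uri="mongodb://test:123456@localhost:27017/chatterbot-blogs",
    storage_adapter="chatterbot.storage.MongoDatabaseAdapter",
    logic_adapters=["chatterbot.logic.BestMatch"],
    preprocessors=['chatterbot.preprocessors.clean_whitespace'],
    # read_only=True stops the bot from learning from the input it receives
    read_only=True
)

# Train the bot on the YAML corpus file in the current directory.
trainer = ChatterBotCorpusTrainer(chatbot)
trainer.train("./data_yaml.yaml")

# Ask the bot for a reply to the first sentence of the training data.
response = chatbot.get_response('Hi, can I help you?')
print(response)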
The script can be run directly.
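The script shows two forms of database_uri, one with a user name and password and one without. As a minimal sketch, both forms can be built by a small helper. The function name build_mongo_uri and its parameters are hypothetical and are not part of ChatterBot; only the standard library is used.

```python
from urllib.parse import quote_plus


def build_mongo_uri(database, host="localhost", port=27017, user=None, password=None):
    """Return a MongoDB connection URI, adding credentials only if a user is given."""
    if user is None:
        # Form for a MongoDB instance without authentication.
        return f"mongodb://{host}:{port}/{database}"
    # Percent-encode the credentials so characters such as '@' or ':' do not break the URI.
    credentials = f"{quote_plus(user)}:{quote_plus(password or '')}"
    return f"mongodb://{credentials}@{host}:{port}/{database}"


print(build_mongo_uri("chatterbot-blogs"))
print(build_mongo_uri("chatterbot-blogs", user="test", password="123456"))
```

The returned string can be passed as the database_uri argument of ChatBot. Encoding the credentials only matters when they contain reserved characters; for test and 123456 the result is the same URI as in the script.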
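2. Interpreting the training results
After training, a collection named statements is created in the chatterbot-blogs MongoDB database. The following compares our training data with the documents in the statements collection.
Here is one document from the collection: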
{
"_id" : ObjectId("5d2fe85f1b0f32431a4b771e"),
"id" : null,
"text" : "Sure, I'd like to book a flight to Iceland.",
"search_text" : "NNP:i'd NNP:kind TO:publication DT:formation TO:iceland",
"conversation" : "training",
"persona" : "",
"in_response_to" : "Hi, can I help you?",
"search_in_response_to" : "PRP:support",
"created_at" : ISODate("2019-07-18T11:32:47.533Z"),
"tags" : [
"hadoop"
]
}
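The document contains the following fields:
text: one sentence from the training data we supplied
search_text: a search form of text, produced by tokenizing the sentence into TAG:word pairs as in the example above; it does not appear to support Chinese at present
in_response_to: the previous sentence in the current training conversation
tags: the values of the categories field set in the training file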
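These fields can be used as query filters when inspecting the statements collection. The following is a rough sketch, not something ChatterBot provides: build_statement_filter is a hypothetical helper that only builds the filter dictionary, and the pymongo usage in the comments assumes the pymongo package and a MongoDB instance running without authentication.

```python
def build_statement_filter(in_response_to=None, tag=None):
    """Build a MongoDB filter dictionary for the statements collection."""
    query = {}
    if in_response_to is not None:
        # Select statements that were stored as a reply to this sentence.
        query["in_response_to"] = in_response_to
    if tag is not None:
        # A scalar matched against an array field selects documents whose array contains it.
        query["tags"] = tag
    return query


print(build_statement_filter(in_response_to="Hi, can I help you?", tag="hadoop"))

# Usage with pymongo (assumption: pymongo installed, MongoDB reachable on localhost):
#   from pymongo import MongoClient
#   statements = MongoClient("mongodb://localhost:27017/")["chatterbot-blogs"]["statements"]
#   for doc in statements.find(build_statement_filter(tag="hadoop")):
#       print(doc["text"])
```

Keeping the filter construction separate from the database call makes it easy to check the query without a running MongoDB server.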
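Training uses all the data under the conversations field. Each entry that starts with - - is one complete conversation, and the resulting documents keep the context: every sentence records the sentence that came before it. The training data in data_yaml.yaml is an example.
Below are the training results for the last conversation in data_yaml.yaml.
The conversation used for training: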
- - Your flight has been booked.
  - good
  - are you ok?
1. First sentence
{
"_id" : ObjectId("5d2feb229b054df231a15c72"),
"id" : null,
"text" : "Your flight has been booked.",
"search_text" : "PRP$:formation VBN:schedule",
"conversation" : "training",
"persona" : "",
"in_response_to" : null,
"search_in_response_to" : "",
"created_at" : ISODate("2019-07-18T11:44:34.479Z"),
"tags" : [
"hadoop"
]
}
2. Second sentence
{
"_id" : ObjectId("5d2feb229b054df231a15c73"),
"id" : null,
"text" : "good",
"search_text" : "good",
"conversation" : "training",
"persona" : "",
"in_response_to" : "Your flight has been booked.",
"search_in_response_to" : "PRP$:formation VBN:schedule",
"created_at" : ISODate("2019-07-18T11:44:34.483Z"),
"tags" : [
"hadoop"
]
}
3. Third sentence
{
"_id" : ObjectId("5d2feb229b054df231a15c74"),
"id" : null,
"text" : "are you ok?",
"search_text" : "PRP:ok",
"conversation" : "training",
"persona" : "",
"in_response_to" : "good",
"search_in_response_to" : "good",
"created_at" : ISODate("2019-07-18T11:44:34.487Z"),
"tags" : [
"hadoop"
]
}
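The three documents above follow a simple pattern: each sentence stores the previous sentence of the same conversation in in_response_to, and the first sentence stores null. The sketch below reproduces that pattern in plain Python. It is an illustration only, not ChatterBot's actual implementation: conversation_to_statements is a hypothetical function, and it leaves out fields such as search_text and created_at.

```python
def conversation_to_statements(conversation, tags):
    """Turn one conversation (a list of sentences) into statement-like dictionaries."""
    statements = []
    previous_text = None  # the first sentence has no predecessor
    for text in conversation:
        statements.append({
            "text": text,
            "in_response_to": previous_text,
            "conversation": "training",
            "tags": list(tags),
        })
        # The current sentence becomes the context of the next one.
        previous_text = text
    return statements


conversation = ["Your flight has been booked.", "good", "are you ok?"]
for statement in conversation_to_statements(conversation, ["hadoop"]):
    print(statement["text"], "<-", statement["in_response_to"])
```

Running it on the last conversation of data_yaml.yaml gives the same text and in_response_to pairs as the three documents shown above.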