This article is translated from: https://rasa.com/docs/action-server/knowledge-bases
For study and reference only.
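1. Why introduce a knowledge base?
Answer: In a conversation, users do not always refer to an object by its name; they often use referring expressions such as "the first one" or "it". We therefore need to keep track of the objects that have been presented, so that such mentions can be resolved to the object the user actually means.
Users may also want detailed information about an object during the conversation, for example: who stars in The Matrix? Because this kind of information changes often, hard-coding it would take too much engineering effort, so Rasa provides a knowledge base integration to address this challenge.
To use this integration, create a custom action that inherits from ActionQueryKnowledgeBase.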
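2. Creating a knowledge base
To get familiar with the feature, I start with InMemoryKnowledgeBase. As the name suggests, the knowledge base is held in memory rather than stored in a database; a database-backed knowledge base only becomes necessary when the amount of data is very large (to be looked into later).
To initialize an InMemoryKnowledgeBase we need a JSON file. The example below contains information about three restaurants and three hotels. Each top-level key (such as restaurant) is an object type, and its value is a list of objects made up of key-value pairs. Objects of the same type should share the same keys. The id and name fields are required by the default implementation; if your objects do not have them, you have to customize InMemoryKnowledgeBase.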
{
    "restaurant": [
        {
            "id": 0,
            "name": "Donath",
            "cuisine": "Italian",
            "outside-seating": true,
            "price-range": "mid-range"
        },
        {
            "id": 1,
            "name": "Berlin Burrito Company",
            "cuisine": "Mexican",
            "outside-seating": false,
            "price-range": "cheap"
        },
        {
            "id": 2,
            "name": "I due forni",
            "cuisine": "Italian",
            "outside-seating": true,
            "price-range": "mid-range"
        }
    ],
    "hotel": [
        {
            "id": 0,
            "name": "Hilton",
            "price-range": "expensive",
            "breakfast-included": true,
            "city": "Berlin",
            "free-wifi": true,
            "star-rating": 5,
            "swimming-pool": true
        },
        {
            "id": 1,
            "name": "Hilton",
            "price-range": "expensive",
            "breakfast-included": true,
            "city": "Frankfurt am Main",
            "free-wifi": true,
            "star-rating": 4,
            "swimming-pool": false
        },
        {
            "id": 2,
            "name": "B&B",
            "price-range": "mid-range",
            "breakfast-included": false,
            "city": "Berlin",
            "free-wifi": false,
            "star-rating": 1,
            "swimming-pool": false
        }
    ]
}
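The structural rules described above (every object has id and name, and objects of the same type share the same keys) can be checked before handing the file to the knowledge base. The following is a minimal sketch using only the standard library; validate_knowledge_base and REQUIRED_KEYS are hypothetical names made up for this illustration and are not part of rasa_sdk, and the sample data is a shortened, deliberately broken variant of the file above.

```python
import json

# Fields the default implementation is said to require (see the text above).
REQUIRED_KEYS = {"id", "name"}


def validate_knowledge_base(data):
    """Return a list of human-readable problems found in the data."""
    problems = []
    for object_type, objects in data.items():
        # The keys of the first object serve as the reference for its siblings.
        reference_keys = set(objects[0]) if objects else set()
        for obj in objects:
            missing = REQUIRED_KEYS - set(obj)
            if missing:
                problems.append(f"{object_type}: missing {sorted(missing)}")
            if set(obj) != reference_keys:
                problems.append(
                    f"{object_type} id={obj.get('id')}: keys differ from first object"
                )
    return problems


# Shortened sample; the second hotel intentionally lacks an "id".
sample = json.loads("""
{
  "restaurant": [
    {"id": 0, "name": "Donath", "cuisine": "Italian"},
    {"id": 1, "name": "Berlin Burrito Company", "cuisine": "Mexican"}
  ],
  "hotel": [
    {"id": 0, "name": "Hilton", "city": "Berlin"},
    {"name": "B&B", "city": "Berlin"}
  ]
}
""")

for problem in validate_knowledge_base(sample):
    print(problem)
```

Running such a check once at start-up is cheap and makes a malformed data file fail early rather than in the middle of a conversation.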
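3. Defining the NLU data
Add a new intent, query_knowledge_base, so that the bot knows the user wants to retrieve something from the knowledge base;
annotate the entities mentioned in the examples, so that the model can detect referring expressions such as "the first one";
use synonyms extensively.
ActionQueryKnowledgeBase can handle two kinds of requests:
querying a list of objects of a given type, e.g. asking which restaurants are available in the example above, which returns a list;
querying a specific attribute of an object, which is a finer-grained query, e.g. asking for the cuisine of the restaurant Donath.
The intent should therefore contain many variations of both kinds of request.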
- intent: query_knowledge_base
  examples: |
    - what [restaurants]{"entity": "object_type", "value": "restaurant"} can you recommend?
    - list some [restaurants]{"entity": "object_type", "value": "restaurant"}
    - can you name some [restaurants]{"entity": "object_type", "value": "restaurant"} please?
    - can you show me some [restaurant]{"entity": "object_type", "value": "restaurant"} options
    - list [German]{"entity": "cuisine"} [restaurants]{"entity": "object_type", "value": "restaurant"}
    - do you have any [mexican]{"entity": "cuisine"} [restaurants]{"entity": "object_type", "value": "restaurant"}?
    - do you know the [price range]{"entity": "attribute", "value": "price-range"} of [that one]{"entity": "mention"}?
    - what [cuisine]{"entity": "attribute"} is [it]{"entity": "mention"}?
    - do you know what [cuisine]{"entity": "attribute"} the [last one]{"entity": "mention", "value": "LAST"} has?
    - does [Donath]{"entity": "restaurant"} have [outside seating]{"entity": "attribute", "value": "outside-seating"}?
    - what is the [price range]{"entity": "attribute", "value": "price-range"} of [Berlin Burrito Company]{"entity": "restaurant"}?
    - what is with [I due forni]{"entity": "restaurant"}?
    - Do you also have any [Vietnamese]{"entity": "cuisine"} [restaurants]{"entity": "object_type", "value": "restaurant"}?
    - What about any [Mexican]{"entity": "cuisine", "value": "mexican"} [restaurants]{"entity": "object_type", "value": "restaurant"}?
    - Do you also know some [Italian]{"entity": "cuisine"} [restaurants]{"entity": "object_type", "value": "restaurant"}?
    - can you tell me the [price range]{"entity": "attribute", "value": "price-range"} of [that restaurant]{"entity": "mention"}?
    - what [cuisine]{"entity": "attribute"} do [they]{"entity": "mention"} have?
    - what [hotels]{"entity": "object_type", "value": "hotel"} can you recommend?
    - please list some [hotels]{"entity": "object_type", "value": "hotel"} in [Frankfurt am Main]{"entity": "city"} for me
    - what [hotels]{"entity": "object_type", "value": "hotel"} do you know in [Berlin]{"entity": "city"}?
    - name some [hotels]{"entity": "object_type", "value": "hotel"} in [Berlin]{"entity": "city"}
    - show me some [hotels]{"entity": "object_type", "value": "hotel"}
    - what are [hotels]{"entity": "object_type", "value": "hotel"} in [Berlin]{"entity": "city"}
    - does the [last]{"entity": "mention", "value": "LAST"} one offer [breakfast]{"entity": "attribute", "value": "breakfast-included"}?
    - does the [second one]{"entity": "mention", "value": "2"} [include breakfast]{"entity": "attribute", "value": "breakfast-included"}?
    - what is the [price range]{"entity": "attribute", "value": "price-range"} of the [second]{"entity": "mention", "value": "2"} hotel?
    - does the [first]{"entity": "mention", "value": "1"} one have [wifi]{"entity": "attribute", "value": "free-wifi"}?
    - does the [third]{"entity": "mention", "value": "3"} one have a [swimming pool]{"entity": "attribute", "value": "swimming-pool"}?
    - what is the [star rating]{"entity": "attribute", "value": "star-rating"} of [Berlin Wall Hostel]{"entity": "hotel"}?
    - Does the [Hilton]{"entity": "hotel"} have a [swimming pool]{"entity": "attribute", "value": "swimming-pool"}?
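To make the annotation syntax above more concrete, here is a small sketch that pulls the (entity, value) pairs out of one annotated example. It is not Rasa's own training data parser; extract_entities and the ANNOTATION pattern are hypothetical helpers written for this illustration, and they only handle the [text]{...} form used in the examples above.

```python
import json
import re

# Matches [surface text]{"entity": "...", ...} as used in the examples above.
ANNOTATION = re.compile(r"\[([^\]]+)\]\{([^}]*)\}")


def extract_entities(example):
    """Return (entity, value) pairs from one annotated training example."""
    pairs = []
    for text, body in ANNOTATION.findall(example):
        meta = json.loads("{" + body + "}")
        # Without an explicit "value", the surface text itself is the value.
        pairs.append((meta["entity"], meta.get("value", text)))
    return pairs


example = (
    'does the [second one]{"entity": "mention", "value": "2"} '
    '[include breakfast]{"entity": "attribute", "value": "breakfast-included"}?'
)
print(extract_entities(example))
```

The "value" field is what lets different surface forms ("second one", "include breakfast") end up as the keys the knowledge base actually uses ("2", "breakfast-included").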
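The following entities need to be specified and annotated in the NLU data:
object_type: whenever an NLU example refers to a specific object type from the knowledge base, that object type should be marked as an entity, e.g. restaurant, which is a key in the knowledge base;
mention: if the user refers to an object with expressions such as "the first one", "that one" or "it", these terms should be marked as mention;
attribute: every attribute name defined in the knowledge base should be marked as attribute in the NLU data; synonyms can be used to map variations of an attribute name to the name used in the knowledge base.
These entities also need to be added to the domain: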
entities:
  - object_type
  - mention
  - attribute
slots:
  object_type:
    type: unfeaturized
  mention:
    type: unfeaturized
  attribute:
    type: unfeaturized
4. Creating an action to query your knowledge base
action.py
from rasa_sdk.knowledge_base.storage import InMemoryKnowledgeBase
from rasa_sdk.knowledge_base.actions import ActionQueryKnowledgeBase


class MyKnowledgeBaseAction(ActionQueryKnowledgeBase):
    def __init__(self):
        # Load the knowledge base from the JSON file described in section 2.
        knowledge_base = InMemoryKnowledgeBase("data.json")
        # The base class expects the knowledge base in its constructor.
        super().__init__(knowledge_base)
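You must pass a knowledge base to the constructor of the parent class. Here it is an InMemoryKnowledgeBase, but it can also be your own knowledge base. Note that information can only be pulled from one knowledge base; using several knowledge bases at the same time is not supported for now.
Do not forget to add this action to the domain: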
domain.yml
actions:
- action_query_knowledge_base
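5. Querying the knowledge base for objects
To be able to query the knowledge base for any kind of object, the user's request needs to include the object type. Let's look at an example:
User: Can you please name some restaurants?
This question contains the object type "restaurants"; the bot needs to pick up this entity in order to form a query.
User: What Italian restaurant options in Berlin do I have?
Here the user wants a list of restaurants that (1) serve Italian cuisine and (2) are located in Berlin. If the NER detects these attributes in the request, the action uses them to filter the restaurants found in the knowledge base.
The NLU data needs matching training examples: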
nlu:
- intent: query_knowledge_base
  examples: |
    - What [Italian](cuisine) [restaurant](object_type) options in [Berlin](city) do I have?
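The attribute names cuisine and city used here should match those in the knowledge base, and they must also be added to the domain as entities and slots.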
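The filtering step described in this section can be sketched in a few lines. This is only an illustration of the idea, not the code of InMemoryKnowledgeBase; filter_objects is a hypothetical helper, and the city field on the restaurants is an assumption added for the example (the restaurant entries in the JSON file of section 2 do not have it).

```python
def filter_objects(objects, filters, limit=5):
    """Return up to `limit` objects whose attributes match every filter."""
    matches = [
        obj
        for obj in objects
        if all(obj.get(name) == value for name, value in filters.items())
    ]
    return matches[:limit]


# Assumed sample data: "city" is added here purely for illustration.
restaurants = [
    {"id": 0, "name": "Donath", "cuisine": "Italian", "city": "Berlin"},
    {"id": 1, "name": "Berlin Burrito Company", "cuisine": "Mexican", "city": "Berlin"},
    {"id": 2, "name": "I due forni", "cuisine": "Italian", "city": "Berlin"},
]

# Entities detected by the NER become the filter conditions.
italian_in_berlin = filter_objects(restaurants, {"cuisine": "Italian", "city": "Berlin"})
print([r["name"] for r in italian_in_berlin])
```

An exact-match filter like this only works if the entity values match the stored values, which is why the attribute names and values in the NLU data have to line up with the knowledge base.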
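6. Querying the knowledge base for an attribute of an object
This kind of query is used when the user wants specific information about an object. The request should contain both the object of interest and the attribute of interest, for example:
User: What is the cuisine of Berlin Burrito Company?
Here cuisine is the attribute of interest and Berlin Burrito Company is the object of interest.
The NLU data needs training examples for this as well: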
nlu:
- intent: query_knowledge_base
  examples: |
    - What is the [cuisine](attribute) of [Berlin Burrito Company](restaurant)?
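Conceptually, an attribute query is a lookup of one object followed by reading one of its fields. The sketch below shows that idea with the restaurant data from section 2; get_attribute is a hypothetical helper for this illustration, and matching the name case-insensitively is an assumption of the sketch rather than documented behaviour of rasa_sdk.

```python
def get_attribute(objects, name, attribute):
    """Return one attribute of the object with the given name, or None."""
    for obj in objects:
        # Assumption of this sketch: names are compared case-insensitively.
        if obj["name"].lower() == name.lower():
            return obj.get(attribute)
    return None  # no object with that name


restaurants = [
    {"id": 0, "name": "Donath", "cuisine": "Italian",
     "outside-seating": True, "price-range": "mid-range"},
    {"id": 1, "name": "Berlin Burrito Company", "cuisine": "Mexican",
     "outside-seating": False, "price-range": "cheap"},
    {"id": 2, "name": "I due forni", "cuisine": "Italian",
     "outside-seating": True, "price-range": "mid-range"},
]

# "What is the cuisine of Berlin Burrito Company?"
print(get_attribute(restaurants, "Berlin Burrito Company", "cuisine"))
```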
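7. Resolving mentions
Following the example above, users may not always refer to a restaurant by its name; they may instead refer to an object that was listed earlier, for example:
User: What is the cuisine of the second restaurant you mentioned?
Rasa can resolve two types of mentions: (1) ordinal mentions, such as "the first one", and (2) other mentions, such as "it" or "that one".
8. Ordinal mentions
When a user refers to an object by its position in a list, this is called an ordinal mention, for example: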
User: What restaurants in Berlin do you know?
Bot: Found the following objects of type ‘restaurant’: 1: I due forni 2: PastaBar 3: Berlin Burrito Company
User: Does the first one have outside seating?
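Ordinal mentions are typically used when a list of objects has been presented to the user. To resolve these mentions to actual objects, KnowledgeBase defines an ordinal mention mapping, see /rasa-sdk/knowledge_base/storage.py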
storage.py
import random


class KnowledgeBase:
    def __init__(self) -> None:
        self.ordinal_mention_mapping = {
            "1": lambda l: l[0],
            "2": lambda l: l[1],
            "3": lambda l: l[2],
            "4": lambda l: l[3],
            "5": lambda l: l[4],
            "6": lambda l: l[5],
            "7": lambda l: l[6],
            "8": lambda l: l[7],
            "9": lambda l: l[8],
            "10": lambda l: l[9],
            "ANY": lambda l: random.choice(l),
            "LAST": lambda l: l[-1],
        }
    ....
If the user says "the first one", a synonym in the NLU data can map it to the corresponding key of this mapping:
nlu:
- intent: query_knowledge_base
  examples: |
    - Does the [first one]{"entity": "mention", "value": "1"} have [outside seating]{"entity": "attribute", "value": "outside-seating"}
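How the mapping is applied can be sketched as follows: the value of the mention entity ("1", "LAST", ...) selects a function, which is then called on the list of objects last shown to the user. The mapping below is a shortened copy of the one in storage.py; resolve_ordinal and its None fallback are hypothetical and only illustrate the idea, they are not the rasa_sdk implementation.

```python
import random

# Shortened copy of the mapping shown in storage.py above.
ordinal_mention_mapping = {
    "1": lambda l: l[0],
    "2": lambda l: l[1],
    "3": lambda l: l[2],
    "ANY": lambda l: random.choice(l),
    "LAST": lambda l: l[-1],
}


def resolve_ordinal(mention, listed_objects):
    """Resolve a mention value against the objects last listed to the user."""
    resolver = ordinal_mention_mapping.get(mention)
    if resolver is None or not listed_objects:
        return None  # unknown mention value, or nothing was listed yet
    try:
        return resolver(listed_objects)
    except IndexError:
        return None  # e.g. "the third one" when only two were listed


listed = ["I due forni", "PastaBar", "Berlin Burrito Company"]
print(resolve_ordinal("1", listed))
print(resolve_ordinal("LAST", listed))
```

Note that the keys are strings, so the synonym value in the NLU data has to be "1" rather than the number 1 for the lookup to succeed.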
9. Other mentions
Consider the following conversation:
User: What is the cuisine of PastaBar?
Bot: PastaBar has an Italian cuisine.
User: Does it have wifi?
Bot: Yes.
User: Can you give me an address
If the NER detects "it" as a mention, the knowledge base action resolves it to the last object mentioned in the conversation, here "PastaBar".
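The behaviour described here amounts to remembering the most recently discussed object and substituting it whenever a non-ordinal mention appears. The sketch below illustrates that with a tiny helper class; MentionContext, its methods and the PRONOUNS set are hypothetical names for this illustration, and the real action keeps this state in the conversation rather than in a Python object like this one.

```python
# Assumed set of non-ordinal mentions for this sketch.
PRONOUNS = {"it", "that one", "they"}


class MentionContext:
    """Remembers the last object that was talked about."""

    def __init__(self):
        self.last_object = None

    def remember(self, name):
        # Called whenever an object is named explicitly.
        self.last_object = name

    def resolve(self, reference):
        # A pronoun resolves to the last object; a name resolves to itself.
        if reference.lower() in PRONOUNS:
            return self.last_object
        self.remember(reference)
        return reference


context = MentionContext()
print(context.resolve("PastaBar"))  # "What is the cuisine of PastaBar?"
print(context.resolve("it"))        # "Does it have wifi?"
```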