Running a Medical Knowledge Graph Question-Answering System: Steps and Notes

This project uses the source code published by Liu Huanyong (Chinese Academy of Sciences):

https://github.com/liuhuanyong/QASystemOnMedicalKG

How to run it after downloading:

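(1) Install the Neo4j database and the related packages. Neo4j needs the JDK (Java Development Kit), so install that first. Pay attention to versions: I used Neo4j version 4 and Java 1.8, and the project needs py2neo==4.3.0; newer py2neo versions will not run this code. (Neo4j 4.x is documented as requiring Java 11, so if Neo4j does not start under Java 1.8, check that pairing.)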

The following links cover installing Neo4j and some background:

https://blog.csdn.net/sinat_36226553/article/details/108541370?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522164862259616782094864946%2522%252C%2522scm%2522%253A%252220140713.130102334..%2522%257D&request_id=164862259616782094864946&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~top_click~default-3-108541370.142^v5^pc_search_result_control_group,143^v6^register&utm_term=neo4j&spm=1018.2226.3001.4187

https://so.csdn.net/so/search?q=neo4j&spm=1001.2101.3001.7020

Run steps and version-related notes

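(2) Install the py2neo and pyahocorasick packages for Python. Installing pyahocorasick failed for me with a message asking for Visual Studio Build Tools:
First install Microsoft Visual C++: download Build Tools from https://visualstudio.microsoft.com/downloads/ and, after installation, tick "C++ Build Tools" under Visual Studio Build Tools in the module selection.
Some people say that installing pyahocorasick through Anaconda does not need VC; I have not tried this myself.
(3) Then run the program:
1) First change user and password in build_medicalgraph and answer_search to your own Neo4j username and password.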


2) Then add these two lines at the end of build_medicalgraph:
handler.create_graphnodes()
handler.create_graphrels()
3) Run build_medicalgraph. On some machines it fails with:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 81: illegal multibyte sequence.
To fix it, add encoding='utf-8' to every open() call.

 

4) There is a lot of data, so the import runs for several hours. When it finishes, open the Neo4j browser and the nodes and graph are there.


 

5) Then run chatbot_graph.py, type the question you want to ask, and the answer is printed.
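The 'gbk' message in step 3) shows that open() fell back to the system's default codec, while the project's text files are UTF-8. A minimal sketch of the fix follows; the helper name read_lines_utf8 and the demo file are made up for illustration and are not part of the project:

```python
import os
import tempfile


def read_lines_utf8(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    # Passing encoding explicitly avoids depending on the OS default (gbk here)
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demo with a throwaway file; in the project the path would be a dictionary file
demo_path = os.path.join(tempfile.mkdtemp(), "demo_dict.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("感冒\n\n乙肝\n")

words = read_lines_utf8(demo_path)
print(words)
```

The same encoding="utf-8" argument is what the dictionary-loading lines in the classifier further down already pass.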


Walkthrough of the code:

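(1) Building the knowledge graph starts with data acquisition. The data was mostly collected by a web crawler and is already structured or semi-structured, so no knowledge extraction from sentences or articles is needed; the data is saved in JSON format and used from there. The graph-building code mainly creates the entity types, attributes, and relations. The source code is annotated, so I do not repeat those explanations here. The rest of the code covers question classification, question parsing, querying with the parsed result, and returning the answer. The comments in the code below include my own understanding and reflect only my personal reading; please point out any errors or other views.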
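As a rough sketch of how JSON data can be turned into entity sets and relation pairs: the layout below (one JSON object per line, with name and symptom fields) is an assumption made for illustration, and the two records are invented; check the project's data file for the real field names.

```python
import json

# Two invented records in the assumed one-object-per-line layout
sample_lines = [
    '{"name": "感冒", "symptom": ["咳嗽", "发热"]}',
    '{"name": "肺炎", "symptom": ["咳嗽"]}',
]


def collect_entities(lines):
    """Collect disease nodes, symptom nodes and (disease, symptom) relation pairs."""
    diseases, symptoms, rels = set(), set(), []
    for line in lines:
        record = json.loads(line)
        disease = record["name"]
        diseases.add(disease)
        for symptom in record.get("symptom", []):
            symptoms.add(symptom)
            rels.append((disease, symptom))
    return diseases, symptoms, rels


diseases, symptoms, rels = collect_entities(sample_lines)
print(len(diseases), len(symptoms), len(rels))
```

The sets become candidate nodes and the pairs become candidate edges, which corresponds to the two calls create_graphnodes() and create_graphrels() added in the run steps.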

(2) Selected code snippets

Question classification

import os
import ahocorasick
# Aho-Corasick automaton
# Matches many keywords in one pass, i.e. returns every keyword that occurs in a string at once

class QuestionClassifier:
    def __init__(self):
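        # cur_dir is the directory that contains this file:
        # os.path.abspath(__file__) is the file's absolute path, and dirname() drops the file name
        # (this also works with Windows path separators)
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        # Paths of the feature-word dictionaries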
        self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
        self.department_path = os.path.join(cur_dir, 'dict/department.txt')
        self.check_path = os.path.join(cur_dir, 'dict/check.txt')
        self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
        self.food_path = os.path.join(cur_dir, 'dict/food.txt')
        self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
        self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
        self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
        # Load the feature words: seven entity dictionaries, the combined domain vocabulary, and a list of negation words
        self.disease_wds= [i.strip() for i in open(self.disease_path,encoding='utf-8') if i.strip()]
        self.department_wds= [i.strip() for i in open(self.department_path,encoding='utf-8') if i.strip()]
        self.check_wds= [i.strip() for i in open(self.check_path,encoding='utf-8') if i.strip()]
        self.drug_wds= [i.strip() for i in open(self.drug_path,encoding='utf-8') if i.strip()]
        self.food_wds= [i.strip() for i in open(self.food_path,encoding='utf-8') if i.strip()]
        self.producer_wds= [i.strip() for i in open(self.producer_path,encoding='utf-8') if i.strip()]
        self.symptom_wds= [i.strip() for i in open(self.symptom_path,encoding='utf-8') if i.strip()]
        self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
        self.deny_words = [i.strip() for i in open(self.deny_path,encoding='utf-8') if i.strip()]
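        # Build the domain AC tree
        self.region_tree = self.build_actree(list(self.region_words))
        # Build the word-type dictionary, e.g. {'感冒': ['disease'], ...}
        self.wdtype_dict = self.build_wdtype_dict()
        # Question keywords, covering questions about disease attributes and about relations (edges)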
        self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
        self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
        self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
        self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
        self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
        self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
                             '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
                             '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
                             '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
                             '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
        self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
        self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
        self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
        self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
        self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
        self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
        self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                          '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']

        print('model init finished ......')

        return

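    '''Main classification function'''
    def classify(self, question):
        data = {}
        # check_medical is defined further down in this class
        # It returns the domain words found in the question, e.g. {'感冒': ['disease'], ...}
        medical_dict = self.check_medical(question)
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types mentioned in the question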
        types = []
        for type_ in medical_dict.values():
            types += type_
        question_type = 'others'

        question_types = []

        # Symptoms
        if self.check_words(self.symptom_qwds, question) and ('disease' in types):
            question_type = 'disease_symptom'
            question_types.append(question_type)

        if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
            question_type = 'symptom_disease'
            question_types.append(question_type)

        # Causes
        if self.check_words(self.cause_qwds, question) and ('disease' in types):
            question_type = 'disease_cause'
            question_types.append(question_type)
        # Complications
        if self.check_words(self.acompany_qwds, question) and ('disease' in types):
            question_type = 'disease_acompany'
            question_types.append(question_type)

        # Food to eat or to avoid for a disease
        if self.check_words(self.food_qwds, question) and 'disease' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'disease_not_food'
            else:
                question_type = 'disease_do_food'
            question_types.append(question_type)

        # Given a food, find diseases
        if self.check_words(self.food_qwds+self.cure_qwds, question) and 'food' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'food_not_disease'
            else:
                question_type = 'food_do_disease'
            question_types.append(question_type)

        # Recommended drugs
        if self.check_words(self.drug_qwds, question) and 'disease' in types:
            question_type = 'disease_drug'
            question_types.append(question_type)

        # Which diseases a drug treats
        if self.check_words(self.cure_qwds, question) and 'drug' in types:
            question_type = 'drug_disease'
            question_types.append(question_type)

        # Checks (examinations) needed for a disease
        if self.check_words(self.check_qwds, question) and 'disease' in types:
            question_type = 'disease_check'
            question_types.append(question_type)

        # Given a check, find the corresponding diseases
        if self.check_words(self.check_qwds+self.cure_qwds, question) and 'check' in types:
            question_type = 'check_disease'
            question_types.append(question_type)

        # Disease prevention
        if self.check_words(self.prevent_qwds, question) and 'disease' in types:
            question_type = 'disease_prevent'
            question_types.append(question_type)

        # Treatment duration of a disease
        if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
            question_type = 'disease_lasttime'
            question_types.append(question_type)

        # Treatment methods for a disease
        if self.check_words(self.cureway_qwds, question) and 'disease' in types:
            question_type = 'disease_cureway'
            question_types.append(question_type)

        # Cure probability of a disease
        if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
            question_type = 'disease_cureprob'
            question_types.append(question_type)

        # Population susceptible to a disease
        if self.check_words(self.easyget_qwds, question) and 'disease' in types :
            question_type = 'disease_easyget'
            question_types.append(question_type)

        # If no question type matched but a disease was found, fall back to returning the disease description
        if question_types == [] and 'disease' in types:
            question_types = ['disease_desc']

        # If no question type matched but a symptom was found, fall back to listing the diseases with that symptom
        if question_types == [] and 'symptom' in types:
            question_types = ['symptom_disease']

        # Store all matched question types in the result dictionary
        data['question_types'] = question_types

        return data

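    '''Build the word-to-type mapping.
    From the 7 entity categories, build a {feature word: [entity types]} dictionary
    that stores the type(s) (disease, department, ...) of every word in region_words.
    '''
    def build_wdtype_dict(self):
        wd_dict = dict()
        # region_words is the union of all seven entity dictionaries
        for wd in self.region_words:
            wd_dict[wd] = []
            # Append every entity type whose dictionary contains the word; a word can have several types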
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.department_wds:
                wd_dict[wd].append('department')
            if wd in self.check_wds:
                wd_dict[wd].append('check')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.food_wds:
                wd_dict[wd].append('food')
            if wd in self.symptom_wds:
                wd_dict[wd].append('symptom')
            if wd in self.producer_wds:
                wd_dict[wd].append('producer')
        return wd_dict

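# Build the AC tree to speed up filtering.
# This function builds the domain AC tree with Python's ahocorasick library.
# Aho-Corasick is a string-matching algorithm; the library provides two data structures: a trie and an Aho-Corasick automaton.
# A trie is a dictionary indexed by strings; looking up an entry takes time proportional to the string length.
# An AC automaton finds all strings of a given set in a single pass. It is essentially KMP on a trie,
# so it handles multi-pattern matching.
# Detailed ahocorasick usage is outside the scope of this post;
# see posts such as https://blog.csdn.net/pirage/article/details/51657178.
# Like KMP, it matches quickly.

    def build_actree(self, wordlist):
        actree = ahocorasick.Automaton()  # initialize the trie
        for index, word in enumerate(wordlist):
            actree.add_word(word, (index, word))  # add a word to the trie
        actree.make_automaton()  # convert the trie into an Aho-Corasick automaton
        return actree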


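    # Question filtering
    # Domain words are matched with the ahocorasick library's iter() function. When matches overlap,
    # the shorter word is dropped and the longest domain word is kept. The function filters out the
    # domain words in the question and returns {domain word in the question: its entity types}.


    # Approach
    # 1. Initialization
    # Dictionaries: diseases, departments, checks, drugs, foods, drug producers (brands), symptoms, negation words, plus region_words, which holds all entity words
    # All words in region_words go into an AC tree (to speed up later lookups): region_tree
    # A new dictionary wdtype_dict stores the type(s) (disease, department, ...) of each word in region_words
    # Synonym keyword lists are built so that different phrasings of the same intent are understood
    # 2. Analysing the user's question
    # Question filtering (find the domain information the user mentions): use region_tree to find every keyword from region_words in the question, drop keywords that are contained in a longer match, and look up each remaining keyword's types in wdtype_dict.
    # Question classification (work out what the user knows and what is being asked): decided from the synonym keyword lists and wdtype_dict
    # Source: https://blog.csdn.net/floracuu/article/details/113574130

    # Question filtering: find every region_words keyword in the question via region_tree,
    # drop keywords contained in a longer match, and return each remaining keyword's types from wdtype_dict.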
    def check_medical(self, question):
        region_wds = []
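        # region_tree is an AC tree built from region_words; it quickly finds the entities in the question
        # The raw matches are not always what we want: e.g. “瓜烧白菜” and “白菜” are different entities
        # Match domain words with the ahocorasick library's iter() function
        # iter() yields tuples; each i looks like (3, (23192, '乙肝'))
        for i in self.region_tree.iter(question):
            # i[0] is the end index of the match; i[1] is the (index, word) value stored in build_actree
            wd = i[1][1]  # the matched word
            region_wds.append(wd)
        # Collect the matched words that are contained in a longer match (stop_wds)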
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
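                # For each pair of distinct matched words, keep the longer, more specific one
                # e.g. “内科” in “消化内科” and the two words differ
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)  # record the shorter word
        # Keep only the words that were not recorded as shorter duplicates
        final_wds = [i for i in region_wds if i not in stop_wds]  # keep the longer words
        # Build the result dictionary, e.g. {'感冒': ['disease'], ...}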
        final_dict = {i:self.wdtype_dict.get(i) for i in final_wds}

        return final_dict


    # Classification based on feature words
    # This function checks whether the sentence contains any feature word from the given list.

    def check_words(self, wds, sent):
        for wd in wds:
            if wd in sent:
                return True
        return False


if __name__ == '__main__':
    handler = QuestionClassifier()
    # Read questions in a loop and print the classification result
    while True:
        question = input('input a question:')
        data = handler.classify(question)
        print(data)
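To see what check_medical and check_words do without installing pyahocorasick, here is a small stand-alone sketch. It swaps the Aho-Corasick automaton for a plain substring scan (slower on large dictionaries, same result on this toy input), and the three-word dictionary is invented for the demo:

```python
# Toy stand-in for wdtype_dict: word -> list of entity types (invented sample)
wdtype_dict = {
    "内科": ["department"],
    "消化内科": ["department"],
    "感冒": ["disease"],
}
symptom_qwds = ["症状", "表现"]


def check_medical(question):
    """Return {domain word: types} for the longest domain words in the question."""
    # Naive scan; the original code uses region_tree.iter(question) here
    hits = [wd for wd in wdtype_dict if wd in question]
    # A hit that is a proper substring of another hit is dropped
    shorter = {a for a in hits for b in hits if a in b and a != b}
    return {wd: wdtype_dict[wd] for wd in hits if wd not in shorter}


def check_words(wds, sent):
    """True if any keyword in wds occurs in sent."""
    return any(wd in sent for wd in wds)


print(check_medical("消化内科看什么病"))

question = "感冒有什么症状"
found = check_medical(question)
types = [t for ts in found.values() for t in ts]
is_symptom_question = check_words(symptom_qwds, question) and "disease" in types
print(is_symptom_question)
```

In the first call both “内科” and “消化内科” match, and only the longer word survives the filter. The second part mirrors the disease_symptom rule in classify: a symptom keyword plus a disease entity.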

Question parsing

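# Convert the user's question into Neo4j query statements

# 1. Merge the extracted keywords by entity type
# 2. Loop over the question types and translate each one into Neo4j queries
"""

parser_main function
This is the main question-parsing function.
It first takes the classification result and reads the domain words and their entity types.
It then calls build_entitydict, which returns an entity_dict of the form {'entity type': ['domain word'], ...}.
Next, for each question_type in ['question_types'] of the classification result,
it calls sql_transfer to translate it into Neo4j's Cypher language.
Finally it collects the Cypher queries generated for each question_type.
Source: https://blog.csdn.net/vivian_ll/article/details/89840281
"""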
class QuestionPaser:
    # e.g. args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
    # after merging: entity_dict = {'disease': ['青光眼', '肺气肿'], 'department': ['消化内科']}
    # Source: https://blog.csdn.net/floracuu/article/details/113828998
    '''Group the entities by type'''
    def build_entitydict(self, args):
        # args is the dict from classify()['args']: each domain word mapped to its list of entity types
        entity_dict = {}
        # arg is the domain word, types is its list of entity types
        for arg, types in args.items():
            for type in types:
                if type not in entity_dict:
                    entity_dict[type] = [arg]
                else:
                    entity_dict[type].append(arg)

        return entity_dict

    '''Main parsing function'''
    def parser_main(self, res_classify):
        # Get the keywords
        args = res_classify['args']
        # Merge the words of the same entity type
        entity_dict = self.build_entitydict(args)
        question_types = res_classify['question_types']
        sqls = []

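        # For every question type, build the matching Cypher queries; each result is a dict sql_ that is appended to sqls
        # sql_ has two fields: question_type and sql

        for question_type in question_types:
            sql_ = {}  # the trailing underscore keeps this dict distinct from the list sql below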
            sql_['question_type'] = question_type
            sql = []
            if question_type == 'disease_symptom':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'symptom_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('symptom'))

            elif question_type == 'disease_cause':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_acompany':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_not_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_do_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'food_not_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))

            elif question_type == 'food_do_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))

            elif question_type == 'disease_drug':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'drug_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('drug'))

            elif question_type == 'disease_check':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'check_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('check'))

            elif question_type == 'disease_prevent':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_lasttime':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_cureway':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_cureprob':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_easyget':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            elif question_type == 'disease_desc':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))

            if sql:
                sql_['sql'] = sql

                sqls.append(sql_)

        return sqls

    '''Translate each question type into Neo4j Cypher queries'''
    def sql_transfer(self, question_type, entities):
        if not entities:
            return []

        # Query statements
        sql = []
        # Causes of a disease
        if question_type == 'disease_cause':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(i) for i in entities]

        # Prevention measures for a disease
        elif question_type == 'disease_prevent':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.prevent".format(i) for i in entities]

        # Treatment duration of a disease
        elif question_type == 'disease_lasttime':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_lasttime".format(i) for i in entities]

        # Cure probability of a disease
        elif question_type == 'disease_cureprob':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cured_prob".format(i) for i in entities]

        # Treatment methods for a disease
        elif question_type == 'disease_cureway':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_way".format(i) for i in entities]

        # Population susceptible to a disease
        elif question_type == 'disease_easyget':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.easy_get".format(i) for i in entities]

        # Description of a disease
        elif question_type == 'disease_desc':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.desc".format(i) for i in entities]

        # Symptoms of a disease
        elif question_type == 'disease_symptom':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        # Diseases that a symptom may point to
        elif question_type == 'symptom_disease':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        # Complications of a disease
        elif question_type == 'disease_acompany':
            sql1 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Foods to avoid for a disease
        elif question_type == 'disease_not_food':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        # Foods recommended for a disease
        elif question_type == 'disease_do_food':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2

        # Given a food to avoid, find diseases
        elif question_type == 'food_not_disease':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        # Given a recommended food, find diseases
        elif question_type == 'food_do_disease':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2

        # Common drugs for a disease (remember to extend the drug aliases)
        elif question_type == 'disease_drug':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2

        # Given a drug, find the diseases it can treat
        elif question_type == 'drug_disease':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Checks that a disease requires
        elif question_type == 'disease_check':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        # Given a check, find diseases
        elif question_type == 'check_disease':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]

        return sql


# The Cypher queries built here are run against Neo4j later, and the results are assembled into the answer in Python.
if __name__ == '__main__':
    handler = QuestionPaser()
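The two parser steps can be tried in isolation. The sketch below reuses the args example from the comments above. The helper entities_for is not in the original code; it relies on the observation that every question type listed in parser_main begins with the entity type it is looked up by (disease_cause uses 'disease', food_not_disease uses 'food', and so on), which is an assumption that holds only for the types shown here.

```python
def build_entitydict(args):
    """Group domain words by entity type: {type: [words]}."""
    entity_dict = {}
    for word, types in args.items():
        for type_ in types:
            entity_dict.setdefault(type_, []).append(word)
    return entity_dict


def entities_for(question_type, entity_dict):
    """Hypothetical shortcut for the if/elif chain: the prefix names the entity type."""
    return entity_dict.get(question_type.split("_")[0])


args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
entity_dict = build_entitydict(args)

# Same query template as the disease_cause branch of sql_transfer
queries = [
    "MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(name)
    for name in entities_for("disease_cause", entity_dict)
]
for query in queries:
    print(query)
```

One query is produced per disease in the question. A question type whose entity type is absent from entity_dict gets None, which matches the `if not entities: return []` guard at the top of sql_transfer.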

Querying with the parsed result

"""
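After the question is parsed, the parsed result has to be queried.
This script defines an AnswerSearcher class. As in build_medicalgraph.py,
the class has a Graph member g, plus num_limit, the maximum number of items listed in an answer.
It has two methods: a main query function and a reply module.

search_main function
Takes the parsing result sqls and reads ['question_type'] and ['sql'] from each entry.
It first runs every query in ['sql'] with self.g.run(query).data() to get the results,
then calls answer_prettify, which combines the results with the reply template chosen by ['question_type'].
Finally it returns the answers.

answer_prettify function
Picks the reply template that matches the question_type.

Source: https://blog.csdn.net/vivian_ll/article/details/89840281

"""
"""
Run the Neo4j queries and assemble the results into natural language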
"""
from py2neo import Graph

class AnswerSearcher:
    # Connect to the database; replace user and password with your own Neo4j credentials
    def __init__(self):
        self.g = Graph(
            host="127.0.0.1",
            http_port=7474,
            user="neo4j",
            password="101827bdx")
        self.num_limit = 20

    '''Run the Cypher queries and return the results'''
    def search_main(self, sqls):
        final_answers = []
        for sql_ in sqls:
            question_type = sql_['question_type']
            queries = sql_['sql']
            answers = []
            for query in queries:
                # run the Cypher query
                ress = self.g.run(query).data()
                answers += ress
            # pass the question type and all of its query results to the formatter
            final_answer = self.answer_prettify(question_type, answers)
            if final_answer:
                final_answers.append(final_answer)
        return final_answers

    '''Pick the reply template that matches the question_type'''
    def answer_prettify(self, question_type, answers):
        final_answer = []
        if not answers:
            return ''
        if question_type == 'disease_symptom':
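            # As in the queries above, m is the disease and n is the node at the other end, here the symptom
            desc = [i['n.name'] for i in answers]
            # {0} and {1} are the positions of the format() arguments
            # set() removes duplicates and yields a set (not a dict); list() turns it back into a list
            # The deduplicated symptoms are then joined with semicolons into one string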
            subject = answers[0]['m.name']
            final_answer = '{0}的症状包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'symptom_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '症状{0}可能染上的疾病有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_cause':
            desc = [i['m.cause'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可能的成因有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_prevent':
            desc = [i['m.prevent'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的预防措施包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_lasttime':
            desc = [i['m.cure_lasttime'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治疗可能持续的周期为:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_cureway':
            desc = [';'.join(i['m.cure_way']) for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可以尝试如下治疗:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_cureprob':
            desc = [i['m.cured_prob'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治愈的概率为(仅供参考):{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_easyget':
            desc = [i['m.easy_get'] for i in answers]
            subject = answers[0]['m.name']

            final_answer = '{0}的易感人群包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_desc':
            desc = [i['m.desc'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0},熟悉一下:{1}'.format(subject,  ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_acompany':
            desc1 = [i['n.name'] for i in answers]
            desc2 = [i['m.name'] for i in answers]
            subject = answers[0]['m.name']
            desc = [i for i in desc1 + desc2 if i != subject]
            final_answer = '{0}的并发症包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_not_food':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}忌食的食物包括有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_do_food':
            do_desc = [i['n.name'] for i in answers if i['r.name'] == '宜吃']
            recommand_desc = [i['n.name'] for i in answers if i['r.name'] == '推荐食谱']
            subject = answers[0]['m.name']
            final_answer = '{0}宜食的食物包括有:{1}\n推荐食谱包括有:{2}'.format(subject, ';'.join(list(set(do_desc))[:self.num_limit]), ';'.join(list(set(recommand_desc))[:self.num_limit]))

        elif question_type == 'food_not_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人最好不要吃{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)

        elif question_type == 'food_do_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人建议多试试{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)

        elif question_type == 'disease_drug':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常的使用的药品包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'drug_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '{0}主治的疾病有{1},可以试试'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'disease_check':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常可以通过以下方式检查出来:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        elif question_type == 'check_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '通常可以通过{0}检查出来的疾病有{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))

        return final_answer


if __name__ == '__main__':
    searcher = AnswerSearcher()
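The reply templates can be exercised without a running database by feeding them rows shaped like the output of self.g.run(query).data(), i.e. a list of dicts keyed by the returned expressions. The rows below are invented. One deliberate difference from the original: list(set(desc)) has no guaranteed order, so this sketch deduplicates with dict.fromkeys, which keeps first-seen order and makes the output reproducible.

```python
NUM_LIMIT = 20  # same role as self.num_limit


def prettify_symptoms(answers, num_limit=NUM_LIMIT):
    """Format disease_symptom results; returns '' when there are no rows."""
    if not answers:
        return ''
    subject = answers[0]['m.name']
    # dict.fromkeys removes duplicates while keeping first-seen order
    desc = list(dict.fromkeys(row['n.name'] for row in answers))
    return '{0}的症状包括:{1}'.format(subject, ';'.join(desc[:num_limit]))


# Invented rows in the shape returned for "return m.name, r.name, n.name"
rows = [
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '咳嗽'},
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '发热'},
    {'m.name': '感冒', 'r.name': '症状', 'n.name': '咳嗽'},
]
answer = prettify_symptoms(rows)
print(answer)
```

The other branches of answer_prettify follow the same pattern and differ only in which column is the subject and which template string is used.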

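(3) The question-answering system in this project is entirely rule-based. It classifies questions by keyword matching; medical questions are a closed-domain scenario, so the domain's question types can be enumerated and classified. It then uses Cypher MATCH queries to look up Neo4j, assembles the answer from the returned data, and returns the result. The QA framework is implemented by the scripts chatbot_graph.py, answer_search.py, question_classifier.py, and question_parser.py.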
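The flow described above (classify, parse, search, reply) can be sketched with stand-in functions. Everything here is hypothetical: chat, the three fake_* stubs, and the fallback message are invented for the demo and are not the contents of chatbot_graph.py; only the shapes of the intermediate data follow the code shown earlier.

```python
FALLBACK = "Sorry, I could not understand the question."  # invented message


def chat(question, classify, parse, search, fallback=FALLBACK):
    """Run one question through the three stages and return the reply text."""
    res_classify = classify(question)
    if not res_classify:
        return fallback
    sqls = parse(res_classify)
    answers = search(sqls)
    return "\n".join(answers) if answers else fallback


# Stubs standing in for QuestionClassifier, QuestionPaser and AnswerSearcher
def fake_classify(question):
    if '感冒' in question:
        return {'args': {'感冒': ['disease']}, 'question_types': ['disease_symptom']}
    return {}


def fake_parse(res_classify):
    return [{'question_type': t, 'sql': ['<cypher for %s>' % t]}
            for t in res_classify['question_types']]


def fake_search(sqls):
    return ['感冒的症状包括:咳嗽' for s in sqls if s['question_type'] == 'disease_symptom']


reply = chat('感冒有什么症状', fake_classify, fake_parse, fake_search)
print(reply)
print(chat('你好', fake_classify, fake_parse, fake_search))
```

Passing the stages in as functions keeps the sketch runnable without Neo4j; in the project the three classes shown earlier would take their place.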

Resource links:


Bilibili walkthrough video

Medical knowledge graph question-answering system

Original project link
