Datawhale 知识图谱组队学习
1,知识图谱的概念:从学术的角度,知识图谱本质上是语义网络的知识库;从实际应用角度,知识图谱可以理解成多关系图。
2,知识图谱的构建:(1)数据来源:业务本身的数据;网络上公开、抓取的数据(需要NLP技术提取结构化信息)
(2)信息抽取的难点:处理非结构化数据
(3)涉及的技术有:命名实体识别、关系抽取、实体统一、指代消解
3,知识图谱的存储:基于RDF的存储(偏学术上);基于图数据库的存储(neo4j偏工业上)
4,Neo4J的安装使用:安装配置jdk及neo4j社区版,cmd进入neo4j安装路径bin目录下执行:neo4j.bat console,打开浏览器,输入http://127.0.0.1:7474/,界面最上方就是交互的输入框。
5,Neo4J实战:(同时按住shift + enter,命令换行)
创建节点:CREATE (n:Person {name:'John'}) RETURN n
注:
CREATE是创建操作,Person是标签,代表节点的类型。
花括号{}代表节点的属性,属性类似Python的字典。
这条语句的含义就是创建一个标签为Person的节点,该节点具有一个name属性,属性值是John。
创建更多的人物节点,并分别命名:(多行,以英文;隔开)
CREATE (n:Person {name:'Sally'}) RETURN n;
CREATE (n:Person {name:'Steve'}) RETURN n;
CREATE (n:Person {name:'Mike'}) RETURN n;
CREATE (n:Person {name:'Liz'}) RETURN n;
CREATE (n:Person {name:'Shawn'}) RETURN n;
创建地区节点:(节点类型为Location,属性包括city和state)
CREATE (n:Location {city:'Miami', state:'FL'});
CREATE (n:Location {city:'Boston', state:'MA'});
CREATE (n:Location {city:'Lynn', state:'MA'});
CREATE (n:Location {city:'Portland', state:'ME'});
CREATE (n:Location {city:'San Francisco', state:'CA'});
创建关系:
朋友关系
MATCH (a:Person {name:'Liz'}),
(b:Person {name:'Mike'})
MERGE (a)-[:FRIENDS]->(b)
关系增加属性
MATCH (a:Person {name:'Shawn'}),
(b:Person {name:'Sally'})
MERGE (a)-[:FRIENDS {since:2001}]->(b)
增加更多的朋友关系
MATCH (a:Person {name:'Shawn'}), (b:Person {name:'John'}) MERGE (a)-[:FRIENDS {since:2012}]->(b)
MATCH (a:Person {name:'Mike'}), (b:Person {name:'Shawn'}) MERGE (a)-[:FRIENDS {since:2006}]->(b)
MATCH (a:Person {name:'Sally'}), (b:Person {name:'Steve'}) MERGE (a)-[:FRIENDS {since:2006}]->(b)
MATCH (a:Person {name:'Liz'}), (b:Person {name:'John'}) MERGE (a)-[:MARRIED {since:1998}]->(b)
创建节点的时候就建好关系:
CREATE (a:Person {name:'Todd'})-[r:FRIENDS]->(b:Person {name:'Carlos'})
图数据库查询:
MATCH (a:Person)-[:BORN_IN]->(b:Location {city:'Boston'}) RETURN a,b
MATCH (a)--() RETURN a
MATCH (a)-[r]->() RETURN a.name, type(r)
MATCH (a)-[r]->() RETURN a.name, type(r)
删除和修改:
MATCH (a:Person {name:'Liz'}) SET a.age=34
MATCH (a:Person {name:'Shawn'}) SET a.age=32
MATCH (a:Person {name:'John'}) SET a.age=44
MATCH (a:Person {name:'Mike'}) SET a.age=25
MATCH (a:Person {name:'Mike'}) SET a.test='test'
MATCH (a:Person {name:'Mike'}) REMOVE a.test
MATCH (a:Location {city:'Portland'}) DELETE a
MATCH (a:Person {name:'Todd'})-[rel]-(b:Person) DELETE a,b,rel
6,通过Python操作Neo4j: neo4j模块;py2neo模块
neo4j_test.py
#! -*- coding: utf-8 -*-
#step 1:导入Neo4j驱动包 安装驱动pip install neo4j-driver
from neo4j import GraphDatabase
#setp 2:连接Neo4j图数据库
driver = GraphDatabase.driver('neo4j://localhost:7687', auth=('neo4j', '123'))
#添加关系函数
def add_friend(tx, name, friend_name):
tx.run('merge (a:Person {name:$name})'
'merge (a)-[:KNOWS]->(friend:Person {name: $friend_name})',
name=name, friend_name=friend_name
)
#定义关系函数
def print_friends(tx, name):
for record in tx.run('match (a:Person)-[:KNOWS]->(friend) where a.name = $name return friend.name order by friend.name', name=name):
print(record['friend.name'])
#step 3:运行
with driver.session() as session:
session.write_transaction(add_friend, 'Arthur', 'Guinevere')
session.write_transaction(add_friend, 'Arthur', 'Lancelot')
session.write_transaction(add_friend, 'Arthur', 'Merlin')
session.read_transaction(print_friends, 'Arthur')
py2neo_test.py
#! -*- coding: utf-8 -*-
#step1:导包
from py2neo import Graph, Node, Relationship
#step2:构件图
g = Graph('http://localhost:7474/', auth=('neo4j', '123'))
#step3:创建节点
tx = g.begin()
a = Node('Person', name='Alice')
tx.create(a)
b = Node('Person', name='Bob')
#step4:创建边
ab = Relationship(a, 'KNOWS', b)
#step5:运行
tx.create(ab)
tx.commit()
7,通过csv文件批量导入图数据:
csv分为两个nodes.csv和relations.csv,注意关系里的起始节点必须是在nodes.csv里能找到。
制作出nodes.csv和relations.csv后,通过以下步骤导入neo4j:
neo4j安装的绝对路径/import
修改 neo4j/conf/neo4j.conf 文件中的 dbms.default_database=mygraph.db
Reference
干货 | 从零到一学习知识图谱的技术与应用
手把手教你快速入门知识图谱 - Neo4J教程
python操作图数据库neo4j的两种方式
Neo4j之导入数据
知识图谱Schema
美团大脑:知识图谱的建模方法及其应用
知识图谱概念与技术 肖仰华