PandaDB vs Neo4j: Single-Machine Graph Query Performance Comparison Test Report

PandaDB

This test report is provided by the PandaDB development team.
Date: March 31, 2021

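1. Test Overview

PandaDB is a system for the unified management of large-scale heterogeneous data, built on the property graph model. To guide further development, we took Neo4j as the reference, since it is currently the most mature and most widely used graph database and the performance benchmark for single-machine graph queries, and measured how PandaDB and Neo4j differ in single-machine graph query performance.

For this test we used the dataset and part of the workload of LDBC, an internationally accepted benchmark for graph databases.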

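2. Test Environment

Table 1: Test environment

Environment                    Configuration
Hardware                       One physical test machine, configured with:
                               Dual-socket Intel Xeon Scalable Gold 6230R CPU
                               384 GB DDR4 memory
                               220 TB RAID 5 HDD
Software                       Operating system: CentOS 7.8 (64-bit)
                               JDK version: 1.8
Software versions under test   PandaDB: v0.3.0.210331
                               Neo4j: v3.5.6 Community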

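3. Test Workload

The test uses the data and workload of the LDBC benchmark. The test data contains 17 billion edges and 2.5 billion nodes.

First run git clone https://github.com/ldbc/ldbc_snb_datagen, then generate the test data and import it into Neo4j and PandaDB.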

(1) Generating the test data

Edit the params.ini file in the root directory of ldbc_snb_datagen and set generator.scaleFactor to 1000. Then run:
tools/run.py --cores 24 --memory 100g ./target/ldbc_snb_datagen-0.4.0-SNAPSHOT-jar-with-dependencies.jar params.ini
The generated data is about 1.3 TB.

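(2) Importing the test data

The test data was imported into Neo4j and PandaDB separately; the import commands are listed in the appendix.
Neo4j import time: 1d 5h 40m 49s 176ms.
PandaDB import time: 21h 19m 13s 107ms.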
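
To compare the two import times directly, it helps to convert them to seconds. The following is a minimal Python sketch, not part of the original test harness; the helper name parse_duration and the unit table are assumptions introduced here for illustration.

```python
import re

# Seconds per unit for the "1d 5h 40m 49s 176ms" notation used in this report.
UNIT_SECONDS = {"d": 86400.0, "h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text: str) -> float:
    """Convert a duration such as '21h 19m 13s 107ms' to seconds."""
    total = 0.0
    # "ms" is listed first so that it is not read as "m" followed by "s".
    for value, unit in re.findall(r"(\d+)\s*(ms|d|h|m|s)\b", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


# Import times as reported above.
neo4j_seconds = parse_duration("1d 5h 40m 49s 176ms")
pandadb_seconds = parse_duration("21h 19m 13s 107ms")
import_ratio = neo4j_seconds / pandadb_seconds

print(f"Neo4j: {neo4j_seconds:.3f} s")
print(f"PandaDB: {pandadb_seconds:.3f} s")
print(f"Neo4j / PandaDB: {import_ratio:.2f}")
```

With the figures reported above, the ratio comes out at about 1.39, that is, the Neo4j import took roughly 39% longer than the PandaDB import on this dataset.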

(3) Data indexes

:person("id")
:post("id")
:comment("id")
:person("firstName")

(4) Data volume

PandaDB occupies 2.4 TB on disk; Neo4j occupies 1.8 TB.

4. Test Queries

Table 2: Test workload used in this report (Cypher statements)

ID / Query / Corresponding LDBC query / Semantics tested

C1
  MATCH (n:person{firstName:"%s"})
  RETURN n
  LDBC query: (none)
  Semantics: filter nodes by a non-unique property

C2
  MATCH (m:comment {id: "%s"})
  RETURN m.creationDate AS messageCreationDate,
         m.content AS content
  LDBC query: interactive-short4
  Semantics: filter nodes by a unique property

C3
  MATCH (n:person {id:"%s"})-[r:knows]-(friend:person{lastName:"Sharma"})
  RETURN id(friend)
  LDBC query: interactive-short3
  Semantics: one-hop relationship, returning ids

C4
  MATCH (n:person{id:"%s"})-[r:knows]-(friend)
  RETURN friend.id AS personId,
         friend.firstName AS firstName,
         friend.lastName AS lastName,
         r.creationDate AS friendshipCreationDate
  LDBC query: interactive-short3
  Semantics: one-hop relationship, returning node data

C5
  MATCH (n:person {id:"%s"})-[:isLocatedIn]->(p:place)
  RETURN n.firstName AS firstName,
         n.lastName AS lastName,
         n.birthday AS birthday,
         n.locationIP AS locationIP,
         n.browserUsed AS browserUsed,
         p.id AS cityId, n.gender AS gender,
         n.creationDate AS creationDate
  LDBC query: interactive-short1
  Semantics: one-hop relationship, returning node data

C6
  MATCH (m:comment{id:"%s"})-[:hasCreator]->(p:person)
  RETURN p.id AS personId,
         p.firstName AS firstName,
         p.lastName AS lastName
  LDBC query: interactive-short5
  Semantics: one-hop relationship, returning node data

C7
  MATCH (n:person {id:"%s"})-[:knows]->()-[:knows]->(m:person{gender:"male"})
  RETURN id(m)
  LDBC query: (none)
  Semantics: two-hop relationship, with property filters on the first and last nodes

C8
  MATCH (n:person {id:"%s"})-[:knows]->()-[:knows]->(m:person)
  RETURN m.firstName AS firstName,
         m.lastName AS lastName,
         m.birthday AS birthday,
         m.locationIP AS locationIP,
         m.browserUsed AS browserUsed
  LDBC query: (none)
  Semantics: two-hop relationship, returning properties

C9
  MATCH (:person {id:"%s"})<-[:hasCreator]-(m)-[:replyOf]->(p:post)-[:hasCreator]->(c)
  RETURN m.id AS messageId,
         m.creationDate AS messageCreationDate,
         p.id AS originalPostId,
         c.id AS originalPostAuthorId,
         c.firstName AS originalPostAuthorFirstName,
         c.lastName AS originalPostAuthorLastName
  LDBC query: interactive-short2
  Semantics: three-hop relationship

C10
  MATCH (m:comment{id:"%s"})-[:replyOf]->(p:post)<-[:containerOf]-(f:forum)-[:hasModerator]->(mod:person)
  RETURN f.id AS forumId,
         f.title AS forumTitle,
         mod.id AS moderatorId,
         mod.firstName AS moderatorFirstName,
         mod.lastName AS moderatorLastName
  LDBC query: interactive-short6
  Semantics: three-hop relationship

C11
  MATCH (m:post{id:"%s"})<-[:replyOf]-(c:comment)-[:hasCreator]->(p:person)
  RETURN c.id AS commentId,
         c.content AS commentContent,
         c.creationDate AS commentCreationDate,
         p.id AS replyAuthorId,
         p.firstName AS replyAuthorFirstName,
         p.lastName AS replyAuthorLastName
  LDBC query: interactive-short7 (first half)
  Semantics: two-hop relationship

C12
  MATCH (m:post{id:"%s"})-[:hasCreator]->(a:person)-[r:knows]-(p)
  RETURN m.id AS postId,
         m.language AS postLanguage,
         p.id AS replyAuthorId,
         p.firstName AS replyAuthorFirstName,
         p.lastName AS replyAuthorLastName
  LDBC query: interactive-short7 (second half)
  Semantics: two-hop relationship
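
Each query in Table 2 is a template in which "%s" stands for a concrete parameter value. The report does not describe the harness that filled in the templates and measured the response times, so the Python sketch below is only an assumption about how this could be done. The names fill_template, time_query_ms and fake_runner, the id "12345" and the repeat count are all hypothetical; a real run would pass a function that sends the query to Neo4j or PandaDB instead of fake_runner.

```python
import time
from statistics import mean
from typing import Callable, List

# Query C2 from Table 2; "%s" is the placeholder for the comment id.
C2_TEMPLATE = (
    'MATCH (m:comment {id: "%s"}) '
    "RETURN m.creationDate AS messageCreationDate, m.content AS content"
)


def fill_template(template: str, value: str) -> str:
    """Substitute one parameter value into a Cypher template."""
    return template % value


def time_query_ms(run_query: Callable[[str], object], cypher: str, repeats: int = 3) -> float:
    """Run the query `repeats` times and return the mean wall-clock time in ms."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(cypher)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def fake_runner(cypher: str) -> int:
    """Stand-in for a database call so the sketch runs without a server (hypothetical)."""
    return len(cypher)


query = fill_template(C2_TEMPLATE, "12345")  # "12345" is a made-up id
elapsed_ms = time_query_ms(fake_runner, query)
print(query)
print(f"{elapsed_ms:.3f} ms")
```

Simple string substitution mirrors the "%s" notation of the table. Against a real database, driver-level query parameters would normally be preferable, since they avoid quoting problems and let the server reuse query plans.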

5. Test Results

Table 3: Test results (ms)

Query   Neo4j query time   PandaDB query time   Speedup [1] (PandaDB relative to Neo4j)
C1      998                1,125                0.89
C2      154                54                   2.85
C3      7,381              1,197                6.17
C4      1,261              473                  2.67
C5      68                 109                  0.62
C6      139                126                  1.10
C7      2,218              486                  4.56
C8      3,275              2,447                1.34
C9      37,793             27,743               1.36
C10     164                169                  0.97
C11     117                107                  1.09
C12     2,232              212                  10.53
Figure 1: Response time comparison of the test queries
Figure 2: Response time comparison of the test queries (%)

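Appendix: Test Data Import Commands

The import commands are shown below. Replace the paths in them with the actual location of your data.
(1) Neo4j data import command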
nohup neo4j-community-3.5.6/bin/neo4j-admin import --database graph1000.db \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv \
  --delimiter "|" --array-delimiter ";" > neo4j-import-0303.log 2>&1 &
(2) PandaDB data import command
nohup java -jar pandadb-importer-v0.3.jar --db-path=/panda-server/ldbc-1000.0302.db \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/tag-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/comment-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/tagclass-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/person-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/forum-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/post-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/organisation-output.csv \
  --nodes=/ldbc/ldbc-out/ldbc-1000/nodes/place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/organisation_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_knows_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_hasCreator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/tagclass_isSubclassOf_tagclass-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_studyAt_organisation-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_comment-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_likes_comment-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasMember_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_workAt_organisation-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_hasCreator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_likes_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/place_isPartOf_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_hasTag_tag-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/tag_hasType_tagclass-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_hasModerator_person-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/comment_replyOf_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/post_isLocatedIn_place-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/forum_containerOf_post-output.csv \
  --relationships=/ldbc/ldbc-out/ldbc-1000/relations/person_hasInterest_tag-output.csv \
  --delimeter="|" --array-delimeter=";" > 1000-0302.log 2>&1 &

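[1] Note: speedup = Neo4j query time / PandaDB query time. The larger the value, the greater PandaDB's performance advantage; a value of 1 means both systems have the same query performance.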
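
The speedup formula in note [1] can be checked against the reported results. The Python sketch below recomputes the ratio for every query from the times in the results table and compares it with the reported value; the variable and function names are introduced here for illustration only.

```python
# (Neo4j ms, PandaDB ms, reported speedup) per query, copied from the results table.
RESULTS = {
    "C1": (998, 1125, 0.89),
    "C2": (154, 54, 2.85),
    "C3": (7381, 1197, 6.17),
    "C4": (1261, 473, 2.67),
    "C5": (68, 109, 0.62),
    "C6": (139, 126, 1.10),
    "C7": (2218, 486, 4.56),
    "C8": (3275, 2447, 1.34),
    "C9": (37793, 27743, 1.36),
    "C10": (164, 169, 0.97),
    "C11": (117, 107, 1.09),
    "C12": (2232, 212, 10.53),
}


def speedup(neo4j_ms: float, pandadb_ms: float) -> float:
    """Speedup of PandaDB relative to Neo4j, as defined in note [1]."""
    return neo4j_ms / pandadb_ms


# Recompute every ratio, rounded to two decimals like the reported values.
computed = {q: round(speedup(n, p), 2) for q, (n, p, _) in RESULTS.items()}

# Queries whose recomputed ratio differs from the reported one.
mismatches = [q for q, (_, _, reported) in RESULTS.items()
              if abs(computed[q] - reported) > 1e-9]

# Queries on which PandaDB was faster (speedup above 1).
faster = [q for q, s in computed.items() if s > 1.0]

print("mismatches:", mismatches)
print("PandaDB faster on:", faster)
```

All twelve recomputed ratios agree with the reported ones. PandaDB is faster on nine of the twelve queries and slower on C1, C5 and C10.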
