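Building a Relationship Graph and Analyzing It with Gremlin

Background

Gremlin is the graph traversal language of the Apache TinkerPop framework. It supports both OLTP and OLAP workloads and is a mainstream query language for graph databases, playing a role comparable to that of SQL for relational databases.

HugeGraph is an open-source graph database developed in China that fully supports Gremlin. This article shows how to use Gremlin to build a graph on HugeGraph and run some basic graph analysis on it. The graph describes the relationships among the TinkerPop framework, the Gremlin language, related graph databases, and their authors. The finished graph looks like this: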
[Figure 1: the finished relationship graph]

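Building the graph

Following the official documentation, building the graph takes four steps: prepare the graph model, create the schema, add vertices and edges, and verify the result. All Gremlin statements below are executed in HugeGraph-Studio.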

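  1. Prepare the graph model

    Before building a graph, decide what the graph model looks like, just as you would design the table structures and their relationships before creating tables in a relational database. Modeling in HugeGraph is somewhat simpler than in a relational database: for each class of entity, you only need to ask which other classes of entity it is connected to, and add one class of edge per connection. For example, add a "friend" edge class between person and person.

    In HugeGraph, a class of entities is described by a vertex label (VertexLabel), and a class of relationships by an edge label (EdgeLabel). Entities and relationships can carry properties, and each kind of property is described by a property key (PropertyKey).

    This example uses the following structure:

    Three VertexLabels:

    • Person: "person"
    • Software: "software"
    • Language: "language"

    Six EdgeLabels:

    • Person knows person: "knows"
    • Person created software: "created"
    • Software contains software: "contains"
    • Software defines language: "define"
    • Software implements software: "implements"
    • Software supports language: "supports"

    Six PropertyKeys:

    • Name: "name"
    • Age: "age"
    • Address: "addr"
    • Language used: "lang"
    • Tag: "tag"
    • Weight: "weight"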
  2. Create the schema

    1. Create the property keys (PropertyKey)

      graph.schema().propertyKey("name").asText().create() // name property, text type
      graph.schema().propertyKey("age").asInt().create()   // age property, integer type
      graph.schema().propertyKey("addr").asText().create() // address property, text type
      graph.schema().propertyKey("lang").asText().create() // language property, text type
      graph.schema().propertyKey("tag").asText().create()  // tag property, text type
      graph.schema().propertyKey("weight").asFloat().create() // weight property, float type
    2. Create the vertex labels (VertexLabel)

      // Vertex label "person": name, age, address, and weight properties; IDs are user-supplied strings
      graph.schema().vertexLabel("person")
                    .properties("name", "age", "addr", "weight")
                    .useCustomizeStringId()
                    .create()
      // Vertex label "software": name, language used, tag, and weight properties; "name" is the primary key
      graph.schema().vertexLabel("software")
                    .properties("name", "lang", "tag", "weight")
                    .primaryKeys("name")
                    .create()
      // Vertex label "language": name, language used, and weight properties; "name" is the primary key
      graph.schema().vertexLabel("language")
                    .properties("name", "lang", "weight")
                    .primaryKeys("name")
                    .create()
    3. Create the edge labels (EdgeLabel)

      // Edge label "knows" (person knows person): points from "person" to "person"
      graph.schema().edgeLabel("knows")
                    .sourceLabel("person").targetLabel("person")
                    .properties("weight")
                    .create()
      // Edge label "created" (person created software): points from "person" to "software"
      graph.schema().edgeLabel("created")
                    .sourceLabel("person").targetLabel("software")
                    .properties("weight")
                    .create()
      // Edge label "contains" (software contains software): points from "software" to "software"
      graph.schema().edgeLabel("contains")
                    .sourceLabel("software").targetLabel("software")
                    .properties("weight")
                    .create()
      // Edge label "define" (software defines language): points from "software" to "language"
      graph.schema().edgeLabel("define")
                    .sourceLabel("software").targetLabel("language")
                    .properties("weight")
                    .create()
      // Edge label "implements" (software implements software): points from "software" to "software"
      graph.schema().edgeLabel("implements")
                    .sourceLabel("software").targetLabel("software")
                    .properties("weight")
                    .create()
      // Edge label "supports" (software supports language): points from "software" to "language"
      graph.schema().edgeLabel("supports")
                    .sourceLabel("software").targetLabel("language")
                    .properties("weight")
                    .create()
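
    The create() calls above are written for an empty graph; running them a second time against a graph that already holds this schema is expected to be rejected because the elements already exist. A minimal sketch of a rerunnable variant, shown for only a few of the elements and assuming your HugeGraph version offers the ifNotExist() builder option (verify this against your version's schema API):

    ```groovy
    // Sketch: rerunnable schema creation for a subset of the schema above.
    // ifNotExist() asks HugeGraph to skip creation when the element already exists.
    schema = graph.schema()

    // Property keys used by the labels below
    schema.propertyKey("name").asText().ifNotExist().create()
    schema.propertyKey("lang").asText().ifNotExist().create()
    schema.propertyKey("tag").asText().ifNotExist().create()
    schema.propertyKey("weight").asFloat().ifNotExist().create()

    // Vertex labels, both keyed by "name"
    schema.vertexLabel("software")
          .properties("name", "lang", "tag", "weight")
          .primaryKeys("name")
          .ifNotExist()
          .create()
    schema.vertexLabel("language")
          .properties("name", "lang", "weight")
          .primaryKeys("name")
          .ifNotExist()
          .create()

    // Edge label from "software" to "language"
    schema.edgeLabel("supports")
          .sourceLabel("software").targetLabel("language")
          .properties("weight")
          .ifNotExist()
          .create()
    ```

    The same option can be appended to the remaining property keys, vertex labels, and edge labels in the same way.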
  3. Add vertices and edges

    1. Add the TinkerPop-related vertices and edges

      // Call the addVertex method to add a vertex; the arguments are the vertex label, ID, and properties
      // T.label denotes the vertex label and T.id the vertex ID
      // The entries that follow ("name", "age", ...) are vertex properties, each given as a key/value pair

      // Add the 2 author vertices
      okram = graph.addVertex(T.label, "person", T.id, "okram",
                              "name", "Marko A. Rodriguez", "age", 29,
                              "addr", "Santa Fe, New Mexico", "weight", 1)
      spmallette = graph.addVertex(T.label, "person", T.id, "spmallette",
                                   "name", "Stephen Mallette", "age", 0,
                                   "addr", "", "weight", 1)

      // Add the TinkerPop vertex
      tinkerpop = graph.addVertex(T.label, "software", "name", "TinkerPop",
                                  "lang", "java", "tag", "Graph computing framework",
                                  "weight", 1)
      // Add the TinkerGraph vertex
      tinkergraph = graph.addVertex(T.label, "software", "name", "TinkerGraph",
                                    "lang", "java", "tag", "In-memory property graph",
                                    "weight", 1)
      // Add the Gremlin vertex
      gremlin = graph.addVertex(T.label, "language", "name", "Gremlin",
                                "lang", "groovy/python/javascript", "weight", 1)

      // Call the addEdge method to add an edge
      // It is invoked on the source vertex; the arguments are the edge label, the target vertex, and properties
      // The entries that follow ("weight" here) are edge properties, each given as a key/value pair

      // Add the edges for the 2 authors creating TinkerPop
      okram.addEdge("created", tinkerpop, "weight", 1)
      spmallette.addEdge("created", tinkerpop, "weight", 1)

      // Add the "knows" edge between the 2 authors
      okram.addEdge("knows", spmallette, "weight", 1)

      // Add the edges among TinkerPop, TinkerGraph, and Gremlin
      tinkerpop.addEdge("define", gremlin, "weight", 1)
      tinkerpop.addEdge("contains", tinkergraph, "weight", 1)
      tinkergraph.addEdge("supports", gremlin, "weight", 1)
    2. Add the HugeGraph-related vertices and edges

      // Note: the Gremlin statements below must be executed together with, and after, the statements in sub-step 1,
      // because they use the variables defined there (tinkerpop, gremlin, etc.)

      // Add the 3 author vertices
      javeme = graph.addVertex(T.label, "person", T.id, "javeme",
                               "name", "Jermy Li", "age", 29, "addr",
                               "Beijing", "weight", 1)
      zhoney = graph.addVertex(T.label, "person", T.id, "zhoney",
                               "name", "Zhoney Zhang", "age", 29,
                               "addr", "Beijing", "weight", 1)
      linary = graph.addVertex(T.label, "person", T.id, "linary",
                               "name", "Linary Li", "age", 28,
                               "addr", "Wuhan, Hubei", "weight", 1)

      // Add the HugeGraph vertex
      hugegraph = graph.addVertex(T.label, "software", "name", "HugeGraph",
                                  "lang", "java", "tag", "Graph Database",
                                  "weight", 1)

      // Add the edges for the authors creating HugeGraph
      javeme.addEdge("created", hugegraph, "weight", 1)
      zhoney.addEdge("created", hugegraph, "weight", 1)
      linary.addEdge("created", hugegraph, "weight", 1)

      // Add the "knows" edges between the authors
      javeme.addEdge("knows", zhoney, "weight", 1)
      javeme.addEdge("knows", linary, "weight", 1)

      // Add the edge for HugeGraph implementing TinkerPop
      hugegraph.addEdge("implements", tinkerpop, "weight", 1)
      // Add the edge for HugeGraph supporting Gremlin
      hugegraph.addEdge("supports", gremlin, "weight", 1)
    3. Add the Titan-related vertices and edges

      // Note: the Gremlin statements below must be executed together with, and after, the statements in sub-step 1,
      // because they use the variables defined there (okram, tinkerpop, gremlin, etc.)

      // Add the 2 author vertices
      dalaro = graph.addVertex(T.label, "person", T.id, "dalaro",
                               "name", "Dan LaRocque", "age", 0,
                               "addr", "", "weight", 1)
      mbroecheler = graph.addVertex(T.label, "person", T.id, "mbroecheler",
                                    "name", "Matthias Broecheler",
                                    "age", 0, "addr", "San Francisco", "weight", 1)

      // Add the Titan vertex
      titan = graph.addVertex(T.label, "software", "name", "Titan",
                              "lang", "java", "tag", "Graph Database", "weight", 1)

      // Add the edges between the authors and Titan
      dalaro.addEdge("created", titan, "weight", 1)
      mbroecheler.addEdge("created", titan, "weight", 1)
      okram.addEdge("created", titan, "weight", 1)

      dalaro.addEdge("knows", mbroecheler, "weight", 1)

      // Add the edges from Titan to TinkerPop and Gremlin
      titan.addEdge("implements", tinkerpop, "weight", 1)
      titan.addEdge("supports", gremlin, "weight", 1)
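
    Sub-steps 2 and 3 rely on variables such as okram, tinkerpop, and gremlin that exist only in the execution that created them. If the sub-steps are run separately, one option is to look the vertices up again first. A minimal sketch, assuming the vertices from sub-step 1 are already stored and that g is the traversal source bound by HugeGraph-Studio:

    ```groovy
    // "person" vertices use custom string IDs, so they can be fetched by ID directly.
    okram = g.V("okram").next()

    // "software" and "language" vertices use "name" as the primary key,
    // so they are looked up by label and name.
    tinkerpop = g.V().hasLabel("software").has("name", "TinkerPop").next()
    gremlin = g.V().hasLabel("language").has("name", "Gremlin").next()

    // The re-fetched vertices can then serve as edge endpoints as before, e.g.:
    // titan.addEdge("implements", tinkerpop, "weight", 1)
    ```

    next() throws if no matching vertex exists, which makes a missing prerequisite visible immediately.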
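  4. Verify the result

    Query all vertices and their connecting edges to check that the result matches the intended graph.

    // Query all vertices with g.V() (when run in HugeGraph-Studio, the edges connecting the vertices are fetched as well)
    // Alternatively, query all edges with g.E() to verify
    g.V()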
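
    Besides inspecting the picture, the element counts can be checked against the model. A small sketch, assuming all three sub-steps of step 3 have been executed exactly once; the expected numbers in the comments are derived by counting the addVertex and addEdge calls in this article:

    ```groovy
    // Number of vertices per label.
    // Expected for the data in this article: person=7, software=4, language=1
    g.V().groupCount().by(T.label)

    // Number of edges per label.
    // Expected: created=8, knows=4, supports=3, implements=2, define=1, contains=1
    g.E().groupCount().by(T.label)
    ```

    If your client displays only the result of the last statement, run the two traversals one at a time.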

Analyzing the graph

With the graph built, it can now be queried and analyzed. A few typical examples follow.

  1. Look up the "Gremlin" vertex by vertex label and name:

    // g.V() queries vertices, hasLabel filters by vertex label, has filters by property
    g.V().hasLabel("language").has("name", "Gremlin")

    Result: the Gremlin vertex

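  2. Find which graph databases support Gremlin:

    // in() returns the in-neighbors of a vertex, i.e. the vertices with an edge pointing to the Gremlin vertex
    g.V().hasLabel("language").has("name", "Gremlin")
         .in("supports")

    Result: the HugeGraph, Titan, and TinkerGraph vertices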

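  3. Find the authors of the databases that support Gremlin, keeping only those aged 29:

    // in() can be followed by has() to filter by a condition
    g.V().hasLabel("language").has("name", "Gremlin")
         .in("supports")
         .in("created").has("age", 29)

    Result: the vertices Marko A. Rodriguez, Jermy Li, and Zhoney Zhang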
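
    When only the names are needed rather than whole vertices, the traversal can end with a projection. A sketch built on query 3; the dedup() step is an added precaution, not part of the original query, for the case where an author created several matching databases:

    ```groovy
    // Same traversal as query 3, returning only the authors' names.
    // dedup() keeps each author vertex once even if it is reached via several databases.
    g.V().hasLabel("language").has("name", "Gremlin")
         .in("supports")
         .in("created").has("age", 29)
         .dedup()
         .values("name")
    ```

    This should yield the names of the three vertices listed in the result of query 3, in no guaranteed order.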

  4. Show the paths traversed by the previous query (query 3):

    // path() keeps the intermediate result of every step and returns them together as a path
    g.V().hasLabel("language").has("name", "Gremlin")
         .in("supports")
         .in("created").has("age", 29)
         .path().by("name")

    Result: three paths:
    Gremlin - Titan - Marko A. Rodriguez
    Gremlin - HugeGraph - Jermy Li
    Gremlin - HugeGraph - Zhoney Zhang

  5. Keep the traversed edges when querying paths:

    // inE() returns a vertex's incoming edges and outV() returns an edge's source vertex; together they are equivalent to in()
    g.V().hasLabel("language").has("name", "Gremlin")
         .inE("supports").outV()
         .path()

    Result: three paths (the edge IDs are kept in the middle):
    4:Gremlin - S3:Titan>6>>S4:Gremlin - 3:Titan
    4:Gremlin - S3:HugeGraph>6>>S4:Gremlin - 3:HugeGraph
    4:Gremlin - S3:TinkerGraph>6>>S4:Gremlin - 3:TinkerGraph
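
    The raw IDs in these paths are hard to read. A sketch of a more readable variant that uses two by() modulators, which TinkerPop applies to the path elements in round-robin order:

    ```groovy
    // Path elements here are: vertex, edge, vertex.
    // by("name") is applied to the 1st element, by(T.label) to the 2nd,
    // and by("name") again to the 3rd (round-robin).
    g.V().hasLabel("language").has("name", "Gremlin")
         .inE("supports").outV()
         .path().by("name").by(T.label)
    ```

    Each path then reads as name, edge label, name, for example Gremlin, supports, Titan.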

  6. Display the queried paths as a graph:

    g.V().hasLabel("language").has("name", "Gremlin")
         .inE("supports").outV()
         .inE("created").outV().has("age", 29)
         .path()

    Result:
    [Figure 2: the queried paths displayed as a graph]
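
    The same steps combine into other simple analyses. Two further sketches, assuming the graph built above; they are illustrations added here, not queries from the original walkthrough:

    ```groovy
    // Which software implements the TinkerPop framework?
    // With the data above this returns the names HugeGraph and Titan.
    g.V().hasLabel("software").has("name", "TinkerPop")
         .in("implements")
         .values("name")

    // How many software projects did each author create?
    // Every "created" edge leads one traverser to its author, and groupCount()
    // tallies the traversers per author name.
    g.V().hasLabel("software")
         .in("created")
         .groupCount().by("name")
    ```

    In the second result Marko A. Rodriguez has a count of 2 (TinkerPop and Titan), while every other author has a count of 1.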

Complete code:

// PropertyKey
graph.schema().propertyKey("name").asText().create()
graph.schema().propertyKey("age").asInt().create()
graph.schema().propertyKey("addr").asText().create()
graph.schema().propertyKey("lang").asText().create()
graph.schema().propertyKey("tag").asText().create()
graph.schema().propertyKey("weight").asFloat().create()

// VertexLabel
graph.schema().vertexLabel("person").properties("name", "age", "addr", "weight").useCustomizeStringId().create()
graph.schema().vertexLabel("software").properties("name", "lang", "tag", "weight").primaryKeys("name").create()
graph.schema().vertexLabel("language").properties("name", "lang", "weight").primaryKeys("name").create()

// EdgeLabel
graph.schema().edgeLabel("knows").sourceLabel("person").targetLabel("person").properties("weight").create()
graph.schema().edgeLabel("created").sourceLabel("person").targetLabel("software").properties("weight").create()
graph.schema().edgeLabel("contains").sourceLabel("software").targetLabel("software").properties("weight").create()
graph.schema().edgeLabel("define").sourceLabel("software").targetLabel("language").properties("weight").create()
graph.schema().edgeLabel("implements").sourceLabel("software").targetLabel("software").properties("weight").create()
graph.schema().edgeLabel("supports").sourceLabel("software").targetLabel("language").properties("weight").create()

// TinkerPop
okram = graph.addVertex(T.label, "person", T.id, "okram", "name", "Marko A. Rodriguez", "age", 29, "addr", "Santa Fe, New Mexico", "weight", 1)
spmallette = graph.addVertex(T.label, "person", T.id, "spmallette", "name", "Stephen Mallette", "age", 0, "addr", "", "weight", 1)

tinkerpop = graph.addVertex(T.label, "software", "name", "TinkerPop", "lang", "java", "tag", "Graph computing framework", "weight", 1)
tinkergraph = graph.addVertex(T.label, "software", "name", "TinkerGraph", "lang", "java", "tag", "In-memory property graph", "weight", 1)
gremlin = graph.addVertex(T.label, "language", "name", "Gremlin", "lang", "groovy/python/javascript", "weight", 1)

okram.addEdge("created", tinkerpop, "weight", 1)
spmallette.addEdge("created", tinkerpop, "weight", 1)

okram.addEdge("knows", spmallette, "weight", 1)

tinkerpop.addEdge("define", gremlin, "weight", 1)
tinkerpop.addEdge("contains", tinkergraph, "weight", 1)
tinkergraph.addEdge("supports", gremlin, "weight", 1)

// Titan
dalaro = graph.addVertex(T.label, "person", T.id, "dalaro", "name", "Dan LaRocque", "age", 0, "addr", "", "weight", 1)
mbroecheler = graph.addVertex(T.label, "person", T.id, "mbroecheler", "name", "Matthias Broecheler", "age", 0, "addr", "San Francisco", "weight", 1)

titan = graph.addVertex(T.label, "software", "name", "Titan", "lang", "java", "tag", "Graph Database", "weight", 1)

dalaro.addEdge("created", titan, "weight", 1)
mbroecheler.addEdge("created", titan, "weight", 1)
okram.addEdge("created", titan, "weight", 1)

dalaro.addEdge("knows", mbroecheler, "weight", 1)

titan.addEdge("implements", tinkerpop, "weight", 1)
titan.addEdge("supports", gremlin, "weight", 1)

// HugeGraph
javeme = graph.addVertex(T.label, "person", T.id, "javeme", "name", "Jermy Li", "age", 29, "addr", "Beijing", "weight", 1)
zhoney = graph.addVertex(T.label, "person", T.id, "zhoney", "name", "Zhoney Zhang", "age", 29, "addr", "Beijing", "weight", 1)
linary = graph.addVertex(T.label, "person", T.id, "linary", "name", "Linary Li", "age", 28, "addr", "Wuhan, Hubei", "weight", 1)

hugegraph = graph.addVertex(T.label, "software", "name", "HugeGraph", "lang", "java", "tag", "Graph Database", "weight", 1)

javeme.addEdge("created", hugegraph, "weight", 1)
zhoney.addEdge("created", hugegraph, "weight", 1)
linary.addEdge("created", hugegraph, "weight", 1)

javeme.addEdge("knows", zhoney, "weight", 1)
javeme.addEdge("knows", linary, "weight", 1)

hugegraph.addEdge("implements", tinkerpop, "weight", 1)
hugegraph.addEdge("supports", gremlin, "weight", 1)

For more Gremlin syntax, see the official TinkerPop documentation.
