TinkerPop’s Hadoop-Gremlin
JanusGraph with TinkerPop’s Hadoop-Gremlin
def defineGratefulDeadSchema(janusGraph) {
    m = janusGraph.openManagement()
    // vertex label for person nodes
    person = m.makeVertexLabel("person").make()
    // properties
    // Uncomment the next line when importing with IncrementalBulkLoader
    //blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
    birth = m.makePropertyKey("birth").dataType(Date.class).make()
    age = m.makePropertyKey("age").dataType(Integer.class).make()
    name = m.makePropertyKey("name").dataType(String.class).make()
    // index
    index = m.buildIndex("nameCompositeIndex", Vertex.class).addKey(name).unique().buildCompositeIndex()
    // Uncomment the next line when importing with IncrementalBulkLoader
    //bidIndex = m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildCompositeIndex()
    m.commit()
}
{"id":4136,"label":"person","properties":{"name":[{"id":"16t-36w-5j9","value":"lisi"}],"birth":[{"id":"1z9-36w-3yd","value":1509443638951}],"age":[{"id":"101-26w-5qt","value":4136}]}}
{"id":4702,"label":"person","properties":{"name":[{"id":"171-38o-5j9","value":"fu1 "}],"birth":[{"id":"1zh-38o-3yd","value":1509043638952}],"age":[{"id":"1l9-38o-4qt","value":1}]}}
{"id":4700,"label":"person","properties":{"name":[{"id":"171-38o-5j9","value":"fu2 "}],"birth":[{"id":"1zh-38o-3yd","value":1509043638976}],"age":[{"id":"1l9-38o-4qt","value":1}]}}
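Because the schema puts a unique index on name, and a later step shows that two vertices sharing an id make the load fail, it can help to scan the file before loading it. The following is a minimal sketch in plain Groovy, assuming the file holds one GraphSON vertex per line as in the sample above. The helpers `duplicates` and `checkGraphson` are hypothetical names introduced here; they are not part of TinkerPop or JanusGraph.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: returns the values that occur more than once in a list.
def duplicates(List values) {
    values.countBy { it }.findAll { it.value > 1 }.keySet().toList()
}

// Hypothetical helper: parses one GraphSON vertex per line and reports
// vertex ids and `name` values that appear more than once.
def checkGraphson(List<String> lines) {
    def slurper = new JsonSlurper()
    // skip blank lines, parse every other line as one JSON vertex
    def vertices = lines.findAll { it.trim() }.collect { slurper.parseText(it) }
    def ids = vertices.collect { it['id'] }
    def names = vertices.collect { v ->
        def nameProps = v['properties']?.get('name')
        nameProps ? nameProps[0]['value'] : null
    }.findAll { it != null }
    [duplicateIds: duplicates(ids), duplicateNames: duplicates(names)]
}
```

In the Gremlin console this could be called as `checkGraphson(new File('data/zl/test-modern.json').readLines())`. Two empty lists mean no repeated ids or names were found. Names are compared exactly as written, so values that differ only in trailing spaces count as different.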
storage.backend=hbase
schema.default = none
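# When true, JanusGraph skips its internal consistency checks (including locking and unique-index verification) during bulk loads and API writes; when false, the checks are performed.
# The consistency constraint in this example: name has a unique index, so duplicate name values are rejected only while these checks are active.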
storage.batch-loading=true
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.inputLocation=./data/zl/test-modern.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
#####################################
# GiraphGraphComputer Configuration #
#####################################
giraph.minWorkers=2
giraph.maxWorkers=2
giraph.useOutOfCoreGraph=true
giraph.useOutOfCoreMessages=true
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m
giraph.numInputThreads=4
giraph.numComputeThreads=4
giraph.maxMessagesInMemory=100000
####################################
# SparkGraphComputer Configuration #
####################################
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
I put all of the files above in D:\soft\janusgraph-0.2.0-hadoop2\data\zl, a zl folder that I created under the JanusGraph installation directory.
./bin/gremlin.bat
:load data/zl/test-janusgraph-schema.groovy
graph = JanusGraphFactory.open('data/zl/janusgraph-test.properties')
defineGratefulDeadSchema(graph)
graph = GraphFactory.open('data/zl/hadoop-graphson.properties')
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph('data/zl/janusgraph-test.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
graph = JanusGraphFactory.open('data/zl/janusgraph-test.properties')
g = graph.traversal()
g.V().valueMap()
The query returns data structured like this:
==>[name:[lisi],birth:[Tue Oct 31 17:53:58 CST 2017],age:[10000]]
==>[name:[zhouliang],birth:[Tue Oct 31 17:53:58 CST 2017],age:[10000]]
org.janusgraph.core.SchemaViolationException: Adding this property for key [name] and value [lisi] violates a uniqueness constraint [nameCompositeIndex]
./bin/gremlin.bat
:load data/zl/test-janusgraph-schema.groovy
graph = JanusGraphFactory.open('data/zl/janusgraph-test.properties')
defineGratefulDeadSchema(graph)
graph = GraphFactory.open('data/zl/hadoop-graphson.properties')
blvp = BulkLoaderVertexProgram.build().writeGraph('data/zl/janusgraph-test.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
graph = JanusGraphFactory.open('data/zl/janusgraph-test.properties')
g = graph.traversal()
g.V().valueMap()
==>[name:[fu1],birth:[Tue Oct 31 17:53:58 CST 2017],bulkLoader.vertex.id:[4702],age:[10000]]
==>[name:[lisi],birth:[Tue Oct 31 17:53:58 CST 2017],bulkLoader.vertex.id:[4136],age:[10000]]
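The bulkLoader.vertex.id property in this output comes from the loader itself. When no loader is named, BulkLoaderVertexProgram uses IncrementalBulkLoader, which stores each source vertex id under that key. As a sketch, the same load can be written with those defaults spelled out. It assumes `graph` is the HadoopGraph opened from hadoop-graphson.properties and that the target schema already defines the bulkLoader.vertex.id key.

```groovy
// Same load as above, with the builder defaults made explicit.
// bulkLoader(...): IncrementalBulkLoader is the default when none is given.
// vertexIdProperty(...): 'bulkLoader.vertex.id' is the default property name
// used to store the source vertex id in the target graph.
blvp = BulkLoaderVertexProgram.build().
        bulkLoader(IncrementalBulkLoader).
        vertexIdProperty('bulkLoader.vertex.id').
        writeGraph('data/zl/janusgraph-test.properties').
        create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
```

Spelling out the defaults changes nothing at runtime. It only makes visible which property key the target schema has to provide.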
Undefined type used in query: bulkLoader.vertex.id
16:35:41 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(~label = person AND bulkLoader.vertex.id = 4136)]. For better performance, use indexes
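5. Repeating step 4 succeeds every time, and the graph ends up with the same data as before; no extra vertices are added.
6. Change a vertex's property values or its id in test-modern.json, then run step 4 again. A vertex whose id changed is added as a new vertex. A vertex whose id is unchanged but whose property values changed has those values updated in the graph. (Properties newly added in the JSON are also added in the graph, but properties removed from the JSON remain in the graph.)
7. Edit the JSON file so that two vertices share the same id, then run step 4 again. The load fails with an error like this: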
16:18:04 WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 3, localhost): java.lang.IllegalStateException: The property does not exist as the key has no associated value for the provided element: v[4136]:bulkLoader.vertex.id
at org.apache.tinkerpop.gremlin.structure.Property$Exceptions.propertyDoesNotExist(Property.java:155)
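The behavior observed in steps 5 and 6 can be pictured with a toy model in plain Groovy. This is only a sketch of the observed get-or-create and merge semantics, not TinkerPop's implementation. The helper `incrementalLoad`, the map `store`, and all sample values are made up for illustration, and the failure from step 7 is not modeled.

```groovy
// Toy model only: `store` stands in for the graph, keyed by the source vertex id
// (the role the bulkLoader.vertex.id property plays). Hypothetical helper.
def incrementalLoad(Map store, List<Map> vertices) {
    vertices.each { v ->
        // get-or-create the vertex by its source id
        def existing = store.get(v['id'])
        if (existing == null) {
            existing = [:]
            store.put(v['id'], existing)
        }
        // add new properties and overwrite changed ones; nothing is ever removed
        existing.putAll(v['props'])
    }
    store
}

def store = [:]
incrementalLoad(store, [[id: 4136, props: [name: 'lisi', age: 30]]])
// step 5: rerunning the same input changes nothing
incrementalLoad(store, [[id: 4136, props: [name: 'lisi', age: 30]]])
// step 6: changed value, added property, omitted property, plus a new id
incrementalLoad(store, [[id: 4136, props: [age: 31, birth: 1509443638951L]],
                        [id: 5000, props: [name: 'wangwu']]])
assert store.size() == 2
assert store[4136] == [name: 'lisi', age: 31, birth: 1509443638951L]
```

The last assertion covers the three cases from step 6: age was updated, birth was added, and name stayed even though the third input omitted it.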
org.janusgraph.core.SchemaViolationException: Adding this property for key [name] and value [lisi] violates a uniqueness constraint [nameCompositeIndex]
bidIndex = m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildCompositeIndex()
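The `Undefined type used in query: bulkLoader.vertex.id` error and the full-scan warning shown earlier both point at this missing key and index. If the schema was created without them, one way to add them afterwards is sketched below. The function name `defineBulkLoaderIdIndex` is hypothetical. The sketch assumes the person label already exists and that the key has not been created yet, so the index is defined in the same management transaction as its key.

```groovy
// Hypothetical helper, not part of the original schema script: adds the
// property key and composite index that IncrementalBulkLoader looks up.
def defineBulkLoaderIdIndex(janusGraph) {
    m = janusGraph.openManagement()
    if (m.containsPropertyKey("bulkLoader.vertex.id")) {
        // The key already exists: leave the schema untouched.
        // Indexing an existing key requires a reindex, which this sketch does not cover.
        m.rollback()
        return false
    }
    // Assumes defineGratefulDeadSchema has already created the person label
    person = m.getVertexLabel("person")
    blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
    m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildCompositeIndex()
    m.commit()
    return true
}
```

It would be called as `defineBulkLoaderIdIndex(graph)` on the JanusGraph instance before running the bulk load. A return value of false means the key was already present and nothing was changed.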