JanusGraph Environment Setup in Practice

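Lately my work has centered on graph data, and I am currently building graph-computing features. The graph database kernel we use is based on TinkerPop3, which provides a complete graph data model, the standard Gremlin DSL for graph operations, and TinkerGraph, an in-memory graph database. TinkerPop also serves as the traversal, query, and graph-computing core for a number of open-source graph databases, such as the well-known Titan, the later and still active JanusGraph, and HugeGraph. Most graph database platforms from vendors in China are built on the TinkerPop kernel and are compatible with Gremlin syntax.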


Installing JanusGraph

  1. Download the JanusGraph 0.4.x release package
  2. Unpack the archive and edit the configuration files

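The unpacked distribution bundles a complete Elasticsearch instance, so we can start using JanusGraph (JG) right away: change into the ${JG_HOME}/bin/ directory and run ./janusgraph.sh start.
To see what happens during startup, we can read the janusgraph.sh script (cat ./janusgraph.sh).
Its key parts are as follows: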

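  • start: starts the JG services (there is actually no standalone JG JVM process). As the script shows, it starts Cassandra first, then Elasticsearch, then Gremlin Server. (Note: do not run it as root, because Elasticsearch must be started by a non-root user.)

  • stop: shuts down the service processes above, one after another

  • status: shows the state of each process

  • clean: clears the ${JG_HOME}/log and data directories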

start() {
    status_class $CASSANDRA_FRIENDLY_NAME $CASSANDRA_CLASS_NAME >/dev/null && status && echo "Stop services before starting" && exit 1
    echo "Forking Cassandra..."
    if [ -n "$VERBOSE" ]; then
        CASSANDRA_INCLUDE="$BIN"/cassandra.in.sh "$BIN"/cassandra || exit 1
    else
        CASSANDRA_INCLUDE="$BIN"/cassandra.in.sh "$BIN"/cassandra >/dev/null 2>&1 || exit 1
    fi
    wait_for_cassandra || {
        echo "See $BIN/../log/cassandra.log for Cassandra log output."    >&2
        return 1
    }

    status_class $ES_FRIENDLY_NAME $ES_CLASS_NAME >/dev/null && status && echo "Stop services before starting" && exit 1
    echo "Forking Elasticsearch..."
    if [ -n "$VERBOSE" ]; then
        "$BIN"/../elasticsearch/bin/elasticsearch -d
    else
        "$BIN"/../elasticsearch/bin/elasticsearch -d >/dev/null 2>&1
    fi
    wait_for_startup Elasticsearch $ELASTICSEARCH_IP $ELASTICSEARCH_PORT $ELASTICSEARCH_STARTUP_TIMEOUT_S || {
        echo "See $BIN/../log/elasticsearch.log for Elasticsearch log output."  >&2
        return 1
    }

    status_class $GREMLIN_FRIENDLY_NAME $GREMLIN_CLASS_NAME >/dev/null && status && echo "Stop services before starting" && exit 1
    echo "Forking Gremlin-Server..."
    if [ -n "$VERBOSE" ]; then
        "$BIN"/gremlin-server.sh conf/gremlin-server/gremlin-server.yaml &
    else
        "$BIN"/gremlin-server.sh conf/gremlin-server/gremlin-server.yaml >/dev/null 2>&1 &
    fi
    wait_for_startup 'Gremlin-Server' $GSRV_IP $GSRV_PORT $GSRV_STARTUP_TIMEOUT_S || {
        echo "See $BIN/../log/gremlin-server.log for Gremlin-Server log output."  >&2
        return 1
    }
    disown

    echo "Run gremlin.sh to connect." >&2
}
In summary, start() does three things in order:
  1. Start Cassandra
  2. Start Elasticsearch
  3. Start Gremlin Server
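
The start sequence waits for each service to accept connections before launching the next one. A minimal sketch of that kind of port wait follows. It assumes bash (for the /dev/tcp pseudo-device), and wait_for_port is a hypothetical helper written for illustration, not the actual wait_for_startup function from janusgraph.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT the real wait_for_startup from janusgraph.sh):
# poll host:port once per second until a TCP connect succeeds or the timeout expires.
# Usage: wait_for_port HOST PORT TIMEOUT_SECONDS
wait_for_port() {
    local host="$1" port="$2" timeout_s="$3"
    local waited=0
    while [ "$waited" -lt "$timeout_s" ]; do
        # The subshell tries to open a TCP connection via bash's /dev/tcp;
        # the descriptor is closed again as soon as the subshell exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example: wait up to roughly 60 seconds for Gremlin Server on its port 8182
#   wait_for_port 127.0.0.1 8182 60 || echo "Gremlin Server did not come up" >&2
```

Polling the port rather than sleeping for a fixed time keeps startup fast when a service comes up quickly and still fails cleanly when it never does.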

This way of starting JanusGraph is only suitable for testing or learning, not for a real production environment. It is also known as the "minimal installation".

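In a real production environment the business data is more complex and far larger, with vertex and edge counts that can reach the hundreds of millions, so a single storage node or index node is no longer enough. The second option is therefore to run multiple JG nodes backed by multiple storage and index nodes.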
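The most advanced JG architecture combines:
1. Load balancing across the JG servers
2. A clustered storage engine
3. A clustered index engine
The main benefit is availability of the whole production environment: if a single JG node, storage node, or index node goes down, the cluster as a whole stays usable.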
Here is the deployment I designed and built myself:

  • 1 JG (Gremlin Server) node
  • 3 CDH-based HBase nodes (1 HMaster, 2 RegionServers)
  • 3 ES nodes (1 master, 2 data nodes)

With that environment in place, the most important step is configuring JanusGraph.

Add a file named http-janusgraph-hbase-es.properties under ${JG_HOME}/conf/
and set the storage and index options in it:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
# HBase storage backend
storage.backend=hbase
# ZooKeeper address (copied to hbase.zookeeper.quorum at startup)
storage.hostname=172.16.10.227
# Elasticsearch index backend
index.search.backend=elasticsearch
# ES master node
index.search.hostname=172.16.10.230
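
Before starting Gremlin Server it can be handy to confirm that the file really contains the hosts you expect. A minimal sketch, assuming a POSIX shell with awk. prop_get is a hypothetical helper, not part of JanusGraph, and it only understands plain key=value lines with # comments (no ":" separators or line continuations).

```shell
# Hypothetical helper: print the value of KEY from a Java-style properties file.
# Usage: prop_get FILE KEY   (returns non-zero if the key is absent)
prop_get() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                    # not a key=value line
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)       # trim whitespace around key
            gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim whitespace around value
            if (k == key) { val = v; found = 1 } # last definition wins
        }
        END { if (found) print val; else exit 1 }
    ' "$1"
}

# Example (path follows the layout used in this post):
#   prop_get conf/http-janusgraph-hbase-es.properties storage.hostname
```

With the file above, the storage.hostname lookup should print the ZooKeeper address and index.search.hostname the ES master address.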

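Add a configuration file named http-gremlin-server.yaml under ${JG_HOME}/conf/gremlin-server. Note that the graphs entry can declare multiple graph instances, each one pointing to a properties file like the one above.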

host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
graphs: {
  graph: conf/http-janusgraph-hbase-es.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  # - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
# processors:
#   - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
#   - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
# metrics: {
#   consoleReporter: {enabled: true, interval: 180000},
#   csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
#   jmxReporter: {enabled: true},
#   slf4jReporter: {enabled: true, interval: 180000},
#   gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
#   graphiteReporter: {enabled: false, interval: 180000}}
# maxInitialLineLength: 4096
# maxHeaderSize: 8192
# maxChunkSize: 8192
# maxContentLength: 65536
# maxAccumulationBufferComponents: 1024
# resultIterationBatchSize: 64
# writeBufferLowWaterMark: 32768
# writeBufferHighWaterMark: 65536

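JanusGraph itself focuses on graph data modeling, transactions, and similar features; storage and indexing are effectively delegated to third-party engines. This also makes it easier to integrate big-data components for graph computing, AI, and related work.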

Starting JanusGraph

With the setup described above, startup is straightforward: change into the ${JG_HOME} directory and
run ./bin/gremlin-server.sh conf/gremlin-server/http-gremlin-server.yaml

122  [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Configuring Gremlin Server from conf/gremlin-server/http-gremlin-server.yaml
514  [main] INFO  org.janusgraph.diskstorage.hbase.HBaseCompatLoader  - Instantiated HBase compatibility layer supporting runtime HBase version 1.2.6: org.janusgraph.diskstorage.hbase.HBaseCompat1_0
785  [main] INFO  org.janusgraph.diskstorage.hbase.HBaseStoreManager  - Copied host list from root.storage.hostname to hbase.zookeeper.quorum: 172.16.10.227
833  [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1436 [main] INFO  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper  - Process identifier=hconnection-0x1115ec15 connecting to ZooKeeper ensemble=172.16.10.227:2181
1480 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
1480 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:host.name=localhost.nn
1480 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.version=1.8.0_201
1480 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.vendor=Oracle Corporation
1480 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.home=/data/lixh/jdk1.8.0_201/jre
1481 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:
1487 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.library.path=:/data/dmdbms/bin:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
1487 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.io.tmpdir=/tmp
1487 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:java.compiler=
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:os.name=Linux
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:os.arch=amd64
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:os.version=3.10.0-957.10.1.el7.x86_64
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:user.name=root
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:user.home=/root
1488 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Client environment:user.dir=/data/zhoufan/bigdata/janusgraph-0.3.1-hadoop2
1490 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=172.16.10.227:2181 sessionTimeout=90000 watcher=hconnection-0x1115ec150x0, quorum=172.16.10.227:2181, baseZNode=/hbase
1515 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 172.16.10.227/172.16.10.227:2181. Will not attempt to authenticate using SASL (unknown error)
1525 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Socket connection established to 172.16.10.227/172.16.10.227:2181, initiating session
1535 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 172.16.10.227/172.16.10.227:2181, sessionid = 0x16bf9bfe8205892, negotiated timeout = 60000
2454 [main] INFO  org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation  - Closing master protocol: MasterService
2455 [main] INFO  org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation  - Closing zookeeper sessionid=0x16bf9bfe8205892
2457 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Session: 0x16bf9bfe8205892 closed
2457 [main-EventThread] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - EventThread shut down
2473 [main] INFO  org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration  - Generated unique-instance-id=ac100af120267-localhost-nn1
2526 [main] INFO  org.janusgraph.diskstorage.hbase.HBaseStoreManager  - Copied host list from root.storage.hostname to hbase.zookeeper.quorum: 172.16.10.227
2527 [main] INFO  org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper  - Process identifier=hconnection-0x5b58ed3c connecting to ZooKeeper ensemble=172.16.10.227:2181
2527 [main] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=172.16.10.227:2181 sessionTimeout=90000 watcher=hconnection-0x5b58ed3c0x0, quorum=172.16.10.227:2181, baseZNode=/hbase
2528 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Opening socket connection to server 172.16.10.227/172.16.10.227:2181. Will not attempt to authenticate using SASL (unknown error)
2529 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Socket connection established to 172.16.10.227/172.16.10.227:2181, initiating session
2530 [main-SendThread(172.16.10.227:2181)] INFO  org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn  - Session establishment complete on server 172.16.10.227/172.16.10.227:2181, sessionid = 0x16bf9bfe8205893, negotiated timeout = 60000
2536 [main] INFO  org.janusgraph.diskstorage.Backend  - Configuring index [search]
3295 [main] INFO  org.janusgraph.diskstorage.Backend  - Initiated backend operations thread pool of size 80
3396 [main] INFO  org.janusgraph.graphdb.database.IndexSerializer  - Hashing index keys
3519 [main] INFO  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Loaded unidentified ReadMarker start time 2019-07-24T05:41:08.496Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@30b2b76f
3521 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Graph [graph] was successfully configured via [conf/http-janusgraph-hbase-es.properties].
3521 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized Gremlin thread pool.  Threads in pool named with pattern gremlin-*
3748 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized GremlinExecutor and preparing GremlinScriptEngines instances.
5258 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - Initialized gremlin-groovy GremlinScriptEngine and registered metrics
5264 [main] INFO  org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor  - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardjanusgraph[hbase:[172.16.10.227]], standard]
5285 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the standard OpProcessor.
5288 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the session OpProcessor.
5372 [main] INFO  org.apache.tinkerpop.gremlin.server.op.OpLoader  - Adding the traversal OpProcessor.
5380 [main] INFO  org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor  - Initialized cache for TraversalOpProcessor with size 1000 and expiration time of 600000 ms
5413 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Executing start up LifeCycleHook
5428 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Executed once at startup of Gremlin Server.
5433 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - idleConnectionTimeout was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
5433 [main] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - keepAliveInterval was set to 0 which resolves to 0 seconds when configuring this value - this feature will be disabled
5493 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
5493 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
5515 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
5515 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
5526 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
5526 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
5531 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/vnd.gremlin-v3.0+json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
5531 [main] INFO  org.apache.tinkerpop.gremlin.server.AbstractChannelizer  - Configured application/json with org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0
5618 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Gremlin Server configured with worker thread pool of 1, gremlin pool of 40 and boss thread pool of 1.
5618 [gremlin-server-boss-1] INFO  org.apache.tinkerpop.gremlin.server.GremlinServer  - Channel started at port 8182.

The log shows the Gremlin Server startup sequence: JG is loaded into it as a plugin, and the process visible from the outside is still Gremlin Server.
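
Because the channelizer configured earlier is WsAndHttpChannelizer, the server also answers plain HTTP, which gives a quick smoke test without any console. Gremlin Server's HTTP endpoint accepts a POST whose JSON body carries the script under the "gremlin" key. The sketch below assumes curl is installed and that the server is reachable on localhost:8182. gremlin_payload and gremlin_http are hypothetical helper names, and the escaping only covers single-line scripts.

```shell
# Hypothetical helper: wrap a Gremlin script in the JSON body expected by
# Gremlin Server's HTTP endpoint, i.e. {"gremlin":"<script>"}.
gremlin_payload() {
    local escaped
    # Escape backslashes first, then double quotes, so the result is a valid JSON string
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"gremlin":"%s"}\n' "$escaped"
}

# Hypothetical helper: POST a script to the server URL and print the raw JSON response.
# Usage: gremlin_http URL SCRIPT
gremlin_http() {
    curl -s -X POST -H 'Content-Type: application/json' \
         -d "$(gremlin_payload "$2")" "$1"
}

# Example (assumes the server from this post is reachable on localhost:8182):
#   gremlin_http http://localhost:8182 'g.V().limit(1).count()'
```

The traversal source g used in the example is the one the startup log reports as bound to the JanusGraph instance.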

Next, connect with a client tool; here I connect directly with the TinkerPop Gremlin Console:

  1. ${TINKERPOP_GREMLIN_CONSOLE}/bin/gremlin.sh
  2. :remote connect tinkerpop.server conf/remote.yaml session
  3. :remote console
  4. Enter graph to see what the current graph instance is.

The instance is a StandardJanusGraph, which is JanusGraph's implementation of the org.apache.tinkerpop.gremlin.structure.Graph model. From here on we can use the API that JanusGraph provides to operate on the graph.

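Now let's look at what ends up in ES and HBase.

In ES, the index data is stored in the janusgraph_edges and janusgraph_vertices indices.

In HBase, the data lives in a single large table named 'janusgraph'. These names can of course be changed in the configuration file.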
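
Both stores can be inspected from the command line. A minimal sketch, assuming curl and the hbase client are installed and that ES listens on its default HTTP port 9200 (the port is not stated in this post). es_indices_url is a hypothetical helper that only builds the URL for Elasticsearch's _cat/indices API. To my knowledge the names themselves are controlled by the storage.hbase.table and index.search.index-name options, so adjust the prefix if you changed them.

```shell
# Hypothetical helper: build the _cat/indices URL listing indices that start with a prefix.
# Usage: es_indices_url HOST [PREFIX] [PORT]   (PREFIX defaults to janusgraph, PORT to 9200)
es_indices_url() {
    local host="$1" prefix="${2:-janusgraph}" port="${3:-9200}"
    printf 'http://%s:%s/_cat/indices/%s*?v\n' "$host" "$port" "$prefix"
}

# Examples (both need network access to the cluster described in this post):
#   curl -s "$(es_indices_url 172.16.10.230)"   # list the janusgraph_* indices
#   echo "list" | hbase shell                   # the janusgraph table should be listed
```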

References

https://docs.janusgraph.org/latest/configuration.html
https://docs.janusgraph.org/latest/config-ref.html
https://docs.janusgraph.org/latest/deployment-scenarios.html
