How to Import Large Data into neo4j Database

Preface

This post covers only how to import a large dataset into a Neo4j graph database running in a Docker container on Linux. We use the official bulk-import tool provided by Neo4j:

neo4j-admin import

This method:

  • Is very fast: importing the nodes and relationships usually takes only a few seconds.
  • Requires the data files to be in a specific CSV format.
  • Can only import data into a new, unused database. Once a database has been created and started, this method can no longer import data into it.

Step 1. Set Up the Neo4j Database

First, pull the Neo4j Docker image.

docker pull neo4j:enterprise

We need the Enterprise Edition of Neo4j here, because we want to create several databases in one container, and this feature is not included in the Community Edition.
Now create and start the database container, giving it a name, say neo4j:

sudo docker run -idt \
  --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -v $HOME/workspace/neo4j/data:/var/lib/neo4j/data \
  -v $HOME/workspace/neo4j/logs:/var/lib/neo4j/logs \
  -v $HOME/workspace/neo4j/import:/var/lib/neo4j/import \
  -v $HOME/workspace/neo4j/plugins:/var/lib/neo4j/plugins \
  -v $HOME/workspace/neo4j/conf:/var/lib/neo4j/conf \
  neo4j:enterprise
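The command above bind-mounts five directories under $HOME/workspace/neo4j. If a host directory given to -v does not exist, Docker creates it as root. A minimal sketch for creating them in advance as your own user, before running docker run; make_neo4j_dirs is a hypothetical helper name, and its default path is simply the one used above:

```shell
#!/usr/bin/env bash
set -euo pipefail

# make_neo4j_dirs [BASE]
# Hypothetical helper: creates the host directories that the docker run
# command bind-mounts. BASE defaults to the path used in this post.
make_neo4j_dirs() {
  local base="${1:-$HOME/workspace/neo4j}"
  local d
  for d in data logs import plugins conf; do
    mkdir -p "$base/$d"   # -p: no error if the directory already exists
  done
}
```

Call make_neo4j_dirs with no argument to use $HOME/workspace/neo4j, then run the docker run command.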

Step 2. Prepare Your Data Files in the Required Format

Generally, we need two kinds of .csv files: one for the nodes and one for the relationships.
nodes.csv should look like this:

personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor

and relationships.csv should look like this:

:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN

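The :ID value must uniquely identify each node across the whole import. Every :START_ID and :END_ID in relationships.csv must match the :ID of some node. The sample relationships refer to movie IDs such as tt0133093, so those movies must also be defined in a node file (for example, a second file passed with another --nodes option); otherwise neo4j-admin import treats these relationships as bad.
For more details, please refer to the official guide.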
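Since neo4j-admin import relies on these IDs to connect nodes, it can help to check the files before importing. A minimal sketch of a hypothetical helper, check_ids (not part of Neo4j), assuming the column layout of the samples above (:ID first in the node file, :START_ID and :END_ID as the first and third columns of the relationship file) and IDs that contain no commas or quotes:

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_ids NODES_CSV RELATIONSHIPS_CSV
# Hypothetical helper. Prints duplicate node IDs and relationship endpoints
# that have no matching node ID; prints nothing if both checks pass.
# Assumes :ID is column 1 of the node file, :START_ID and :END_ID are
# columns 1 and 3 of the relationship file, and IDs contain no commas.
check_ids() {
  local nodes="$1" rels="$2"

  # Node IDs that occur more than once (header line skipped).
  tail -n +2 "$nodes" | cut -d, -f1 | sort | uniq -d |
    sed 's/^/duplicate node id: /'

  # Relationship endpoints without a matching node ID.
  awk -F, '
    NR == FNR { if (FNR > 1) ids[$1] = 1; next }  # first file: collect node IDs
    FNR > 1 {                                     # second file: check endpoints
      if (!($1 in ids)) missing[$1] = 1
      if (!($3 in ids)) missing[$3] = 1
    }
    END { for (id in missing) print "missing node id: " id }
  ' "$nodes" "$rels" | sort
}
```

For example, check_ids nodes.csv relationships.csv. Run against the two sample files above, it prints one "missing node id" line for each of the three movie IDs.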
Please note: a header field like personId:ID marks that column as the node ID used during the import. Although the field is named personId, the imported nodes in your Neo4j database will not have a personId property.
If you want the nodes to have such a property, add a second column holding the same value, for instance:

personId:ID,personId

and a property personId will be added to these nodes.
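For large files, adding that second column by hand is impractical. A minimal sketch of a hypothetical helper, add_id_property (not part of Neo4j), that copies the first (:ID) column of a node file into a new column with the given name, assuming the ID values contain no commas, quotes or & characters:

```shell
#!/usr/bin/env bash
set -euo pipefail

# add_id_property IN_CSV OUT_CSV PROPERTY
# Hypothetical helper: inserts a column named PROPERTY right after the
# first (:ID) column and fills it with the same ID value on every row.
add_id_property() {
  local in="$1" out="$2" prop="$3"
  awk -F, -v prop="$prop" '
    NR == 1 { sub(/,/, "," prop ","); print; next }  # header: add the new field name
    { sub(/,/, "," $1 ","); print }                   # data rows: repeat the ID value
  ' "$in" > "$out"
}
```

For example, add_id_property nodes.csv nodes_with_id.csv personId turns the header personId:ID,name,:LABEL into personId:ID,personId,name,:LABEL and the row keanu,"Keanu Reeves",Actor into keanu,keanu,"Keanu Reeves",Actor.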

Step 3. Import the Data

With the preliminary work done, we can begin importing the data.
First, move the data files into the import directory of the Neo4j database. Since we mapped this directory from the host machine into the container with

-v $HOME/workspace/neo4j/import:/var/lib/neo4j/import

we only need to copy or move the data files on the host into the directory

$HOME/workspace/neo4j/import

using cp or mv on the Linux command line.

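After this, import the data into a new database, say database1, with

docker exec --interactive --tty neo4j neo4j-admin import --database=database1 --nodes=import/nodes.csv --relationships=import/relationships.csv

This is the Neo4j 4.x syntax. In Neo4j 5 the tool is invoked as neo4j-admin database import full, with the database name given as the last argument instead of --database, so check which version your neo4j:enterprise image contains.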

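If you need to skip bad entries instead of aborting the import, add the parameters

--skip-bad-relationships=true --skip-duplicate-nodes=true

The first skips relationships whose start or end node ID does not exist; the second skips nodes whose ID has already been used.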
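The file names in the import command must match the files actually placed in the import directory. A minimal sketch of a hypothetical wrapper, import_db (our own name, not a Neo4j tool), that checks both files exist in the host-side import directory and then runs the same Neo4j 4.x command, passing any extra options such as --skip-bad-relationships=true through to neo4j-admin import. IMPORT_DIR, CONTAINER and DOCKER are assumed environment variables; setting DOCKER=echo prints the command instead of running it:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed settings (hypothetical names): host-side import directory and
# container name, matching the docker run command in Step 1.
IMPORT_DIR="${IMPORT_DIR:-$HOME/workspace/neo4j/import}"
CONTAINER="${CONTAINER:-neo4j}"

# import_db DATABASE NODES_CSV RELATIONSHIPS_CSV [EXTRA_OPTIONS...]
import_db() {
  local db="$1" nodes="$2" rels="$3" f
  shift 3   # remaining arguments are passed through to neo4j-admin import

  # Fail early if a file name does not match what was copied to import/.
  for f in "$nodes" "$rels"; do
    if [ ! -f "$IMPORT_DIR/$f" ]; then
      echo "not found: $IMPORT_DIR/$f" >&2
      return 1
    fi
  done

  # Relative import/ paths, as in the original command (Neo4j 4.x syntax).
  "${DOCKER:-docker}" exec "$CONTAINER" neo4j-admin import \
    --database="$db" \
    --nodes="import/$nodes" \
    --relationships="import/$rels" "$@"
}
```

For example, import_db database1 nodes.csv relationships.csv --skip-bad-relationships=true. The --interactive --tty flags of the original command are left out because a script may run without a terminal attached, in which case --tty fails.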

Step 4. Create the Database

First, open Neo4j Browser by entering the host's IP address (say localhost) and the HTTP port (say 7474) in your web browser:

http://localhost:7474

You will be asked for a username and password. The default username is neo4j and the default password is neo4j; after you log in with them, you will be asked to set a new password.
After logging in, you will see neo4j (default) as the default database. Switch the Cypher command line to the system database (for example with :use system), and then create a new database with the same name as the one you imported the data into:

system$ CREATE DATABASE database1

You should then see the database you just created. However, when you try to use it, Neo4j may report that the database is still not available. In that case, restart the whole Neo4j server.
A simple way to do this is to restart the neo4j container with the commands

docker stop neo4j

and

docker start neo4j

Now, when you log in to Neo4j Browser again, you will find your data imported into database1. Congratulations!
