I am attempting to batch create nodes and relationships, but the batch creation is failing (traceback at the end of the post).
Note that the code works with a smaller subset of nodes; it fails once it gets into a massive number of relationships, and it is unclear at what limit this occurs.
I am wondering whether I need to increase ulimit above 40,000 open files.
I have read that people were running into X-Stream issues with the REST API while doing batch creates; it is unclear whether the problem lies on the py2neo side, in the Neo4j server tuning/configuration, or on the Python side.
Any guidance would be greatly appreciated.
One cluster within the data set ends up with around 625,525 relationships across 700+ nodes.
Total relationships will be 1M+. I am using an Apple MacBook Pro Retina (x86_64) running Ubuntu 13.04, with an SSD and 8 GB of memory.
Neo4j: auto_indexing and auto_relationships set to ON
Nodes clustered/grouped via the Python pandas DataFrame.groupby()
Nodes: contain 3 properties each
Relationships: 1 property; both IN and OUT relationships are created
ulimit set to 40,000 open files
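For illustration only, a minimal sketch of this kind of batch call using py2neo 1.x's abstract dict/tuple syntax for create(); the property names and values below are placeholders, not the real data:

    from py2neo import neo4j

    graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

    # Abstract nodes are plain dicts (3 properties each); abstract relationships
    # are (start, type, end, properties) tuples, where integer start/end refer
    # to the position of a node within the same create() call.
    abstracts = [
        {"name": "a", "group": "g1", "source": "report"},   # index 0
        {"name": "b", "group": "g1", "source": "report"},   # index 1
        (0, "RELATED_TO", 1, {"weight": 1}),                 # a -> b
        (1, "RELATED_TO", 0, {"weight": 1}),                 # b -> a (IN & OUT pair)
    ]

    # Everything passed to a single create() call goes out as one REST batch
    # request; with hundreds of thousands of entries that request becomes huge.
    created = graph_db.create(*abstracts)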
Code
Operating System: Ubuntu 13.04
Python version: 2.7.5
py2neo Version: 1.5.1
Java version: 1.7.0_25-b15
Neo4j version: Community Edition 1.9.2
Traceback
Traceback (most recent call last):
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 63, in <module>
    main()
  File "/home/alienone/Programming/Python/OSINT/MANDIANTAPT/spitball.py", line 59, in main
    graph_db.create(*sorted_nodes)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 420, in create
    return batch.submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2123, in submit
    for response in self._submit()
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/neo4j.py", line 2092, in _submit
    for id, request in enumerate(self.requests)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 428, in _send
    return self._client().send(request)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 365, in send
    return Response(request.graph_db, rs.status, request.uri, rs.getheader("Location", None), rs_body)
  File "/home/alienone/.pythonbrew/pythons/Python-2.7.5/lib/python2.7/site-packages/py2neo/rest.py", line 279, in __init__
    raise SystemError(body)
SystemError: None
Process finished with exit code 1
Solution
I had a similar issue. One way to deal with it is to call batch.submit() for chunks of your data rather than for the whole data set. This is slower, of course, but splitting one million nodes into chunks of 5,000 is still faster than adding every node separately.
I use a small helper class to do this; note that all my nodes are indexed: https://gist.github.com/anonymous/6293739
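This is not the gist above, just a minimal sketch of the chunking idea, assuming a list of abstract nodes/relationships like the one you pass to graph_db.create(); the chunk size and function name are arbitrary:

    from py2neo import neo4j

    def create_in_chunks(graph_db, abstracts, chunk_size=5000):
        """Create abstract entities a chunk at a time instead of in one huge batch.

        Caveat: abstract relationships that refer to nodes by integer index only
        resolve within a single create() call, so either keep each node together
        with the relationships that reference it in the same chunk, or create all
        nodes first and then build relationships against the returned Node objects.
        """
        created = []
        for start in range(0, len(abstracts), chunk_size):
            chunk = abstracts[start:start + chunk_size]
            created.extend(graph_db.create(*chunk))
        return created

    graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
    # sorted_nodes is the same list that previously went into one giant create():
    # nodes_and_rels = create_in_chunks(graph_db, sorted_nodes)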