Contents
Preface
1. Introduction to redisgraph-bulk-loader
2. Usage steps
1. Requirements
2. Create a Docker network
3. Create the RedisGraph Docker container
4. Install a Python 3 interpreter
5. Generate the CSV files
Entity properties
Label file format:
Relation files
Input schema
ID namespaces
This article uses the Docker version of RedisGraph.
redisgraph-bulk-loader is a Python utility for building a RedisGraph database from CSV inputs.
The bulk loader utility requires a Python 3 interpreter.
The commands are as follows (example):
docker network create data-import
docker run -p 6379:6379 -d --network data-import --network-alias redisgraph --name redisgraph redislabs/redisgraph
docker run -it --network data-import --network-alias python --name python python
1) docker run -it --name python python bash
2) exit the container
3) docker start python
4) docker exec -it python bash (use exec, not run, to re-enter the already started container)
5) install the loader: pip install redisgraph-bulk-loader
Assuming the data is already available:
```java
// Caller: the data is assumed to be ready in deviceFolderCacheDtoList
List<DeviceFolderCacheDto> deviceFolderCacheDtoList = new ArrayList<>();
CSVUtils.assetsToCsvFile(topic.getFolderId() + "", deviceFolderCacheDtoList);

/**
 * Generate the CSV file in the format required by redisgraph-bulk-loader.
 * @param graphName  graph name, used as the output directory
 * @param pointsList rows to write
 */
public static void assetsToCsvFile(String graphName, List<DeviceFolderCacheDto> pointsList) {
    log.info("assetsToCsvFile-inCnt:{}", pointsList == null ? 0 : pointsList.size());
    // Header row, following the redisgraph-bulk-loader schema format
    String[] headArr = new String[]{":ID(Assets)", "system_folderId:STRING", "system_folderName:STRING",
            "system_tenantId:STRING", "system_folderMark:STRING", "system_nodeType:STRING",
            "system_code:STRING", "system_parentId:STRING", "system_folderDesc:STRING"};
    String filePath = mainFilePath + graphName; // CSV directory
    String fileName = "Assets.csv";             // CSV file name
    File csvFile = new File(filePath + File.separator + fileName);
    log.info("assetsToCsvFile-csvFile:{}", csvFile.getAbsolutePath());
    File parent = csvFile.getParentFile();
    if (parent != null && !parent.exists()) {
        log.info("assetsToCsvFile-parent.mkdirs():{}", parent.mkdirs());
    }
    // try-with-resources closes the writer even when an exception is thrown
    try (BufferedWriter csvWriter = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(csvFile), StandardCharsets.UTF_8), 1024)) {
        // Write the header line, using '|' as the field separator
        csvWriter.write(String.join("|", headArr));
        csvWriter.newLine();
        // Write the data rows, flushing every 1000 rows
        int cnt = 0;
        int totalCnt = 0;
        if (pointsList != null) {
            for (DeviceFolderCacheDto points : pointsList) {
                csvWriter.write(points.toRow());
                csvWriter.newLine();
                totalCnt++;
                if (++cnt >= 1000) {
                    csvWriter.flush();
                    cnt = 0;
                }
            }
        }
        csvWriter.flush();
        log.info("assetsToCsvFile-outCnt:{}", totalCnt);
    } catch (IOException e) {
        log.error("assetsToCsvFile failed", e);
    }
}
```
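The file produced by the method above looks like the following. This is a minimal sketch with made-up row values, just to show the '|'-separated layout:

```python
# Write a miniature Assets.csv in the same '|'-separated layout the Java code produces.
header = "|".join([
    ":ID(Assets)", "system_folderId:STRING", "system_folderName:STRING",
    "system_tenantId:STRING", "system_folderMark:STRING", "system_nodeType:STRING",
    "system_code:STRING", "system_parentId:STRING", "system_folderDesc:STRING",
])
# A single sample row; the values are invented for illustration.
row = "|".join(["1", "f-001", "root", "tenant-1", "mark", "folder", "c-01", "0", "demo"])

with open("Assets.csv", "w", encoding="utf-8") as f:
    f.write(header + "\n")
    f.write(row + "\n")

print(open("Assets.csv", encoding="utf-8").readline().split("|")[0])
# :ID(Assets)
```

Note that the `:ID(Assets)` column carries no property name, only the ID type with its namespace; the remaining columns are name:type pairs.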
python3 redisgraph_bulk_loader/bulk_insert.py GRAPHNAME [OPTIONS]
Flags | Extended flags | Parameter |
---|---|---|
-h | --host TEXT | Redis server host (default: 127.0.0.1). For the Docker setup above, use the container name (redisgraph) as the host |
-p | --port INTEGER | Redis server port (default: 6379) |
-a | --password TEXT | Redis server password (default: none) |
-n | --nodes TEXT | Path to node CSV file, with the filename used as the node label. Repeat the flag for multiple node files: --nodes xxx.csv --nodes xxx2.csv |
-N | --nodes-with-label TEXT | Node label followed by path to node CSV file |
-r | --relations TEXT | Path to relationship CSV file, with the filename used as the relationship type. Same usage as --nodes |
-R | --relations-with-type TEXT | Relationship type followed by path to relationship CSV file |
-o | --separator CHAR | Field separator in CSV files (default: comma). A separator such as \| is recommended, since commas often appear inside field values |
-d | --enforce-schema | Require each cell to adhere to the schema defined in the CSV header; without this flag, types are inferred |
-s | --skip-invalid-nodes | Skip nodes that reuse previously defined IDs instead of exiting with an error |
-e | --skip-invalid-edges | Skip edges that use invalid IDs for endpoints instead of exiting with an error |
-q | --quote INT | The quoting format used in the CSV file: QUOTE_MINIMAL=0, QUOTE_ALL=1, QUOTE_NONNUMERIC=2, QUOTE_NONE=3 |
-t | --max-token-count INT | (Debug argument) Max number of tokens sent in each Redis query (default 1024) |
-b | --max-buffer-size INT | (Debug argument) Max batch size (MB) of each Redis query (default 4096) |
-c | --max-token-size INT | (Debug argument) Max size (MB) of each token sent to Redis (default 500) |
-i | --index Label:Property | After bulk import, create an index on the given Label:Property pair (optional) |
-f | --full-text-index Label:Property | After bulk import, create a full-text index on the given Label:Property pair (optional) |
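The integer values accepted by -q/--quote match the quoting constants in Python's standard csv module, which is a quick way to remember them:

```python
import csv

# The -q/--quote values map onto Python's csv quoting constants.
QUOTE_MODES = {
    "QUOTE_MINIMAL": csv.QUOTE_MINIMAL,        # 0: quote only fields that need it
    "QUOTE_ALL": csv.QUOTE_ALL,                # 1: quote every field
    "QUOTE_NONNUMERIC": csv.QUOTE_NONNUMERIC,  # 2: quote all non-numeric fields
    "QUOTE_NONE": csv.QUOTE_NONE,              # 3: never quote
}

print(QUOTE_MODES)
# {'QUOTE_MINIMAL': 0, 'QUOTE_ALL': 1, 'QUOTE_NONNUMERIC': 2, 'QUOTE_NONE': 3}
```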
By default (without --enforce-schema), the loader infers each property's type:

- boolean: true or false (case-insensitive, unquoted)
- integer: an unquoted value that can be read as an integer
- double: an unquoted value that can be read as a floating-point number
- string: any quoted field, or any field that cannot be converted to a numeric or boolean type
- array: a bracketed array of elements of any type; strings inside an array must be explicitly quoted. Array properties require a non-comma separator (-o)

If type inference is not desired, use the --enforce-schema flag together with an input schema. When --enforce-schema is specified, every input CSV should declare the data type of each column in its header.
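The inference rules above can be sketched as follows. This is a simplified illustration, not the loader's actual code, and the function name infer_type is hypothetical (it also ignores quoting and arrays):

```python
def infer_type(field: str):
    """Infer a property's type following the bulk loader's described defaults:
    unquoted booleans, integers and doubles; everything else is a string."""
    if field.lower() in ("true", "false"):  # boolean: case-insensitive, unquoted
        return field.lower() == "true"
    try:
        return int(field)                   # integer: unquoted value readable as int
    except ValueError:
        pass
    try:
        return float(field)                 # double: unquoted value readable as float
    except ValueError:
        pass
    return field                            # string: anything else

print(infer_type("TRUE"), infer_type("42"), infer_type("3.14"), infer_type("hello"))
# True 42 3.14 hello
```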
This format lifts some restrictions of the default CSV format, such as requiring the ID field to be the first column.
Most header fields should be a colon-separated pair of property name and data type, such as Name:STRING. Certain data types do not require a name string, as shown below.
The accepted data types are:
Type String | Description | Requires name string |
---|---|---|
ID | Label files only - Unique identifier for a node | Optional |
START_ID | Relation files only - The ID field of this relation's source | No |
END_ID | Relation files only - The ID field of this relation's destination | No |
IGNORE | This column will not be added to the graph | Optional |
DOUBLE / FLOAT | A signed 64-bit floating-point value | Yes |
INT / INTEGER / LONG | A signed 64-bit integer value | Yes |
BOOLEAN | A boolean value indicated by the string 'true' or 'false' | Yes |
STRING | A string value | Yes |
ARRAY | An array value | Yes |
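To make the header format concrete, here is a small sketch that splits an --enforce-schema header into (name, type) pairs. It is illustrative only; the parse_header helper is hypothetical, not part of the loader:

```python
def parse_header(header: str, separator: str = "|"):
    """Split an enforce-schema CSV header into (property name, type) pairs.
    Fields like ':ID(Assets)' have no property name, only a type string
    (with an optional namespace in parentheses)."""
    pairs = []
    for field in header.split(separator):
        name, _, type_str = field.partition(":")
        pairs.append((name or None, type_str))
    return pairs

print(parse_header(":ID(Assets)|system_folderId:STRING|system_code:STRING"))
# [(None, 'ID(Assets)'), ('system_folderId', 'STRING'), ('system_code', 'STRING')]
```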
Normally, node identifiers must be unique across all input CSVs. When an input schema is used, ID namespaces can optionally be created, and identifiers then only need to be unique within their own namespace. This is particularly useful when the primary keys of different input CSVs overlap.
To introduce a namespace, follow the :ID type string with a parenthesized namespace string, e.g. :ID(User). The same namespace should be specified in the :START_ID or :END_ID field of the relation files, as in :START_ID(User).
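As an illustration of namespaced files, here is a minimal sketch; the User node data and the pipe separator are invented for the example, not from a real dataset:

```python
import csv, io

# Node file: the header's :ID(User) puts the identifiers in the "User" namespace.
node_csv = "\n".join([
    ":ID(User)|name:STRING",
    '0|"Alice"',
    '1|"Bob"',
])

# Relation file: endpoints reference the same "User" namespace.
relation_csv = "\n".join([
    ":START_ID(User)|:END_ID(User)",
    "0|1",
])

# Both files parse with the same '|' separator the loader would be given via -o.
nodes = list(csv.reader(io.StringIO(node_csv), delimiter="|"))
edges = list(csv.reader(io.StringIO(relation_csv), delimiter="|"))
print(nodes[0])   # [':ID(User)', 'name:STRING']
print(edges[1])   # ['0', '1']
```

Because both headers name the User namespace, the IDs 0 and 1 only need to be unique among User nodes, even if another node file reuses the same numbers in a different namespace.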
Reference link: https://github.com/redisgraph/redisgraph-bulk-loader
An example of the final generated command:
redisgraph-bulk-loader 3661091848455168 --host redisgraph --port 6379 --enforce-schema --separator '|' --skip-invalid-edges --nodes /data/redisgraph-data/3661091848455168/Point.csv --nodes /data/redisgraph-data/3661091848455168/Assets.csv --relations /data/redisgraph-data/3661091848455168/assets_assets.csv --relations /data/redisgraph-data/3661091848455168/assets_point.csv
Running this command inside the python container (the entry script is installed under /usr/local/bin) imports the data.
Note: a graph can only be built in a single pass; if a graph with the same name already exists, the loader exits with an error.