"Number of data inserted: 0" after inserting data into Milvus with Towhee

The code fragment I was using looks like this:

......
p_insert = (
    pipe.input('img_path')
        .map('img_path', 'img', ops.image_decode('rgb'))
        .map('img', 'vec', ops.image_embedding.timm(model_name=MODEL, device=DEVICE))
        .map('vec', 'vec', lambda x: x / numpy.linalg.norm(x, axis=0))
        .map(('img_path', 'vec'), 'mr', ops.ann_insert.milvus_client(
            host=HOST,
            port=PORT,
            collection_name=COLLECTION_NAME
        ))
        .output('mr')
)

for img_path in to_insert:
    p_insert(img_path)
print('Number of data inserted:', collection.num_entities)

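The result was always: Number of data inserted:0
This problem bothered me for a long time, and I could not work out the cause. Along the way I noticed that the Milvus website mentions using flush when writing data, but I did not pay much attention to it, or perhaps I tried it and saw no effect. Today I tried again, adding the following two statements just before the print in the code above: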

collection.flush()
collection.load()

The expected output finally appeared:
Number of data inserted: 1000
With that, the problem was solved.
The description of flush on the Milvus website is at https://milvus.io/docs/insert_data.md:

Flush the Data in Milvus
When data is inserted into Milvus it is inserted into segments. Segments have to reach a certain size to be sealed
and indexed. Unsealed segments will be searched brute force. In order to avoid this with any remainder data, it is
best to call flush(). The flush call will seal any remaining segments and send them for indexing. It is important to
only call this at the end of an insert session, as calling this too much will cause fragmented data that will need to
be cleaned later on.
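
The pattern the quote recommends (insert everything first, flush once at the end, then read the count) can be sketched roughly as below. This is only a minimal sketch under some assumptions: the names insert_then_count, insert_one and items are hypothetical, collection is assumed to be a pymilvus Collection (any object exposing flush() and num_entities would do), and insert_one stands in for something like the p_insert pipeline above.

```python
def insert_then_count(collection, insert_one, items):
    """Insert every item, flush once at the end, then return the entity count.

    Hypothetical helper: `collection` is assumed to expose flush() and
    num_entities (as a pymilvus Collection does); `insert_one` is any
    callable that inserts a single item, e.g. the p_insert pipeline.
    """
    for item in items:
        insert_one(item)
    # Flush once, after the whole insert session, not after every insert.
    collection.flush()
    # Read the count only after the flush.
    return collection.num_entities
```

With the names from the post, the call would look like insert_then_count(collection, p_insert, to_insert). As far as I can tell, load() matters for searching or querying the collection rather than for the count itself, so only flush() appears in this sketch.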
