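Pitfall 1: Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
The SLF4J JAR is missing from the classpath.
Add the dependency:
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.30</version>
</dependency>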
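Pitfall 2: NoClassDefFoundError: org/apache/logging/log4j/util/StackLocatorUtil
The project already includes log4j, which conflicts with the log4j binding declared in the milvus-sdk POM. Excluding that binding resolves the conflict:
<dependency>
    <groupId>io.milvus</groupId>
    <artifactId>milvus-sdk-java</artifactId>
    <version>0.6.0</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j-slf4j-impl</artifactId>
            <groupId>org.apache.logging.log4j</groupId>
        </exclusion>
    </exclusions>
</dependency>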
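Pitfall 3: NoClassDefFoundError: com/google/protobuf/GeneratedMessageV3
The protobuf dependency declared by milvus-sdk did not take effect, so protobuf has to be added explicitly to the current project. (Note: the version must match the one the SDK uses. GeneratedMessageV3 itself lives in protobuf-java, which protobuf-java-util pulls in transitively.)
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java-util</artifactId>
    <version>3.11.0</version>
</dependency>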
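When chasing NoClassDefFoundError problems like the ones above, it can help to ask the JVM directly whether a class is loadable and which JAR it came from. The following is a minimal sketch using only the JDK; the class name ClasspathProbe and its locate method are made up for this illustration and are not part of any library.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports whether a class is loadable and where it was loaded from. */
final class ClasspathProbe {

    /** Returns the code-source location of the class, or a short note if it is unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (the JDK itself) have no code source.
            return source == null ? "<bootstrap/JDK>" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath: " + e + ">";
        }
    }

    public static void main(String[] args) {
        // The classes named in pitfalls 1 to 3.
        String[] names = {
            "org.slf4j.LoggerFactory",
            "org.apache.logging.log4j.util.StackLocatorUtil",
            "com.google.protobuf.GeneratedMessageV3"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the application. If two different JARs could supply a class, the printed location shows which one actually won.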
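Pitfall 4: NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;J)V
The Guava version currently on the classpath is too old.
Upgrade it:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>23.6-jre</version>
</dependency>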
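The signature in the error message is a JVM method descriptor. As a small sketch, the JDK's MethodType can decode it, which shows that the missing overload is checkArgument(boolean, String, long) returning void. The class name DescriptorDecoder is only an illustration.

```java
import java.lang.invoke.MethodType;

/** Hypothetical helper: decodes the method descriptor printed in a NoSuchMethodError. */
final class DescriptorDecoder {

    /** Parses a JVM method descriptor such as "(ZLjava/lang/String;J)V". */
    static MethodType decode(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorDecoder.class.getClassLoader());
    }

    public static void main(String[] args) {
        // Z = boolean, Ljava/lang/String; = String, J = long, V = void return type.
        MethodType type = decode("(ZLjava/lang/String;J)V");
        System.out.println("parameters: " + type.parameterList());
        System.out.println("returns:    " + type.returnType());
    }
}
```

Knowing the exact overload makes it easier to check whether the Guava JAR that ends up on the classpath contains it.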
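Pitfall 5: Slow queries
1. The Milvus server's cache setting (cache_config.cpu_cache_capacity) may be too small. It defaults to 4 GB; if the stored vectors take up more than 4 GB, queries become very slow.
2. The index may not have been created explicitly.
Connect to Milvus first, then create the collection and create an index on it. After that, create partitions in the collection as needed, and finally insert data into the corresponding partition of the collection.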
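To judge whether the first cause applies, a rough estimate of the raw vector size can be compared with the cache capacity. The sketch below assumes vectors are stored as 4-byte floats and ignores index and metadata overhead, so it gives only a lower bound. The class name, the vector counts, and the dimensions are hypothetical, and "4 GB" is read here as 4 GiB.

```java
/** Hypothetical helper: lower-bound estimate of vector memory versus cpu_cache_capacity. */
final class CacheEstimate {
    static final long BYTES_PER_FLOAT = 4L;

    /** Raw size in bytes of vectorCount float vectors with the given dimension. */
    static long rawVectorBytes(long vectorCount, int dimension) {
        // multiplyExact throws on overflow instead of silently wrapping around.
        return Math.multiplyExact(Math.multiplyExact(vectorCount, (long) dimension), BYTES_PER_FLOAT);
    }

    /** True if the raw vectors fit into a cache of cacheGiB gibibytes. */
    static boolean fitsInCache(long vectorCount, int dimension, long cacheGiB) {
        long cacheBytes = Math.multiplyExact(cacheGiB, 1L << 30);
        return rawVectorBytes(vectorCount, dimension) <= cacheBytes;
    }

    public static void main(String[] args) {
        long count = 10_000_000L; // assumed number of vectors
        int dim = 512;            // assumed dimension
        System.out.println("raw bytes: " + rawVectorBytes(count, dim));
        System.out.println("fits in 4 GiB cache: " + fitsInCache(count, dim, 4));
    }
}
```

With these assumed numbers the raw data is 10,000,000 × 512 × 4 = 20,480,000,000 bytes, well beyond a 4 GiB cache, so the capacity would need to be raised.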
Connecting to Milvus
// Assumes static fields `logger` (SLF4J) and `milvusClient` on the enclosing class;
// the Milvus types come from the io.milvus.client package of milvus-sdk-java.
public static MilvusClient getMilevusClient(String host, int port) {
    logger.info("prepared to getMilevusClient...");
    // Reuse the existing client while it is still connected.
    if (milvusClient != null && milvusClient.isConnected()) {
        return milvusClient;
    }
    milvusClient = new MilvusGrpcClient();
    ConnectParam connectParam = new ConnectParam.Builder()
            .withHost(host)
            .withPort(port)
            .withConnectTimeout(10, TimeUnit.SECONDS)
            .build();
    try {
        Response connectResponse = milvusClient.connect(connectParam);
        logger.info("getMilevusClient connect response: {}", connectResponse);
    } catch (ConnectFailedException e) {
        // Pass the exception as the last argument, without a placeholder, so SLF4J logs the stack trace.
        logger.error("Failed to connect to Milvus server", e);
    }
    logger.info("getMilevusClient milevus isConnected: {}", milvusClient.isConnected());
    // Return null when the connection could not be established.
    return milvusClient.isConnected() ? milvusClient : null;
}
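Checking and creating the collection (partition), then inserting
Milvus does not currently support string IDs, so the mapping between business-side IDs and Milvus IDs has to be stored separately. MongoDB was chosen for this.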
// Fragment: milvusClient, mongoClient, collectionName, partitionName, documentList,
// batchVectorList and batchMilevusIdList are defined elsewhere in the enclosing class.
try {
    // Check whether the partition exists
    HasPartitionResponse isPartitionExist = milvusClient.hasPartition(collectionName, partitionName);
    if (isPartitionExist.ok() && !isPartitionExist.hasPartition()) {
        // Create the partition
        milvusClient.createPartition(collectionName, partitionName);
    }
    // Batch insert the ID mapping into MongoDB
    MongoUtils.batchInsertDocList(mongoClient, ANN_DATABASE_NAME, ANN_COLLECTION_NAME, documentList);
    logger.info("success to insert to mongo, documentList size is: {}", documentList.size());
    // Batch insert into Milvus
    InsertParam insertParam = new InsertParam
            .Builder(collectionName)
            .withPartitionTag(partitionName)
            .withFloatVectors(batchVectorList)
            .withVectorIds(batchMilevusIdList)
            .build();
    logger.info("batchVectorList size: {}", batchVectorList.size());
    logger.info("batchMilevusIdList size: {}", batchMilevusIdList.size());
    // The insert runs asynchronously; inspect the future to confirm that it succeeded.
    ListenableFuture<InsertResponse> insertResponse = milvusClient.insertAsync(insertParam);
} catch (Exception e) {
    // Pass the exception as the last argument so SLF4J logs the full stack trace.
    logger.error("batch insert occur error", e);
}
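The ID mapping mentioned above can be pictured as a two-way lookup between business string IDs and numeric Milvus IDs. The sketch below keeps the mapping in memory purely for illustration; in the setup described here it is persisted in MongoDB. The class IdMapping, its methods, and the sequential ID scheme are assumptions, not the author's implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the business ID to Milvus ID mapping. */
final class IdMapping {
    private long nextId = 1L; // assumed scheme: sequential numeric IDs
    private final Map<String, Long> businessToMilvus = new HashMap<>();
    private final Map<Long, String> milvusToBusiness = new HashMap<>();

    /** Returns the Milvus ID for a business ID, allocating a new one on first use. */
    synchronized long idFor(String businessId) {
        Long existing = businessToMilvus.get(businessId);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        businessToMilvus.put(businessId, id);
        milvusToBusiness.put(id, businessId);
        return id;
    }

    /** Translates a Milvus ID from a search result back to the business ID, or null if unknown. */
    synchronized String businessIdFor(long milvusId) {
        return milvusToBusiness.get(milvusId);
    }
}
```

At insert time the numeric IDs returned by idFor would fill the list passed to withVectorIds, and at query time businessIdFor would translate the returned IDs back.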
Note
If inner product (IP) is chosen as the vector similarity metric, the vectors must be normalized before they are imported.
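// Requires java.util.List and java.util.stream.Collectors.
static List<Float> normalizeVector(List<Float> vector) {
    // Accumulate the sum of squares in double to limit rounding error.
    double squareSum = vector.stream().mapToDouble(x -> (double) x * x).sum();
    final float norm = (float) Math.sqrt(squareSum);
    if (norm == 0f) {
        // A zero vector cannot be normalized; return it unchanged instead of producing NaN.
        return vector;
    }
    return vector.stream().map(x -> x / norm).collect(Collectors.toList());
}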
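The reason normalization matters is that the inner product of two unit-length vectors equals their cosine similarity, whereas on raw vectors the score also depends on magnitude. The following self-contained sketch checks this on two small example vectors; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check: inner product of normalized vectors equals cosine similarity. */
final class InnerProductCheck {

    static double dot(List<Float> a, List<Float> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.size(); i++) {
            sum += (double) a.get(i) * b.get(i);
        }
        return sum;
    }

    static double cosine(List<Float> a, List<Float> b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Scales a non-zero vector to unit length. */
    static List<Float> normalize(List<Float> v) {
        double norm = Math.sqrt(dot(v, v));
        List<Float> out = new ArrayList<>(v.size());
        for (float x : v) {
            out.add((float) (x / norm));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Float> a = Arrays.asList(3f, 4f);
        List<Float> b = Arrays.asList(4f, 3f);
        System.out.println("raw inner product:        " + dot(a, b));
        System.out.println("cosine similarity:        " + cosine(a, b));
        System.out.println("normalized inner product: " + dot(normalize(a), normalize(b)));
    }
}
```

For these two vectors the raw inner product is 24, while the cosine similarity and the inner product after normalization agree up to floating-point rounding.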