Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training
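paper: https://arxiv.org/pdf/2306.08789.pdf
code: https://github.com/LCFractal/TGDT

1. Core ideas of the paper
- Integrates coarse-grained and fine-grained retrieval to exploit the advantages of both
- A new training objective, the Consistent Multimodal Contrastive (CMC) loss, which ensures intra-modal and inter-modal semantic consistency
- A two-stage inference method based on mixed global and local cross-modal similarity
- Results: retrieval accuracy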