Cross-modal Retrieval

Cross-modal retrieval aims to retrieve relevant items whose modality differs from that of the query (e.g., retrieving images for a text query, or captions for an image query).

Four Challenges:

1. Representation

2. Translation

3. Alignment

4. Co-learning

Challenge: The main challenge is to measure the similarity between data from different modalities.

Approach: map images and texts into a shared latent space F in which they can be compared directly.
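A minimal sketch of the shared-space idea, assuming NumPy and purely hypothetical feature sizes (512-d image features, 300-d text features, a 64-d space F). The two projection matrices `W_img` and `W_txt` are random placeholders standing in for weights that a real system would learn from paired image/text data; the point is only to show that once both modalities live in F, a single similarity (here cosine) ranks images for a text query.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: image features, text features, shared space F.
D_IMG, D_TXT, D_F = 512, 300, 64

# Placeholder projections into F (random here; learned in practice).
W_img = rng.normal(size=(D_IMG, D_F))
W_txt = rng.normal(size=(D_TXT, D_F))

def embed(x, W):
    """Project each row of x into F and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def retrieve(query_txt, gallery_img, k=3):
    """Return the indices of the k most similar images for each text query,
    together with the full query-by-image cosine similarity matrix."""
    q = embed(query_txt, W_txt)      # (n_queries, D_F)
    g = embed(gallery_img, W_img)    # (n_images, D_F)
    sims = q @ g.T                   # cosine similarities, since rows are unit-norm
    return np.argsort(-sims, axis=1)[:, :k], sims

# Toy data: 10 images and 2 text queries with random features.
images = rng.normal(size=(10, D_IMG))
texts = rng.normal(size=(2, D_TXT))
top_idx, sims = retrieve(texts, images)
print(top_idx.shape)  # (2, 3)
```

Normalizing before the dot product makes the score depend only on direction in F, which connects to the first alignment strategy below.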

Two strategies for alignment:

1) Global alignment methods, which map each modal manifold into F such that semantically similar regions share the same directions in F;

2) Local metric learning approaches, which map each modal manifold into F such that semantically similar items lie at short distances from one another in F.
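To make the contrast between the two strategies concrete, here is an illustrative sketch (NumPy assumed; these are generic textbook criteria, not the specific methods of any paper discussed here). Cosine similarity stands in for the direction-based criterion of global alignment, and a triplet hinge loss on Euclidean distances stands in for local metric learning. The toy vectors and the `margin` value of 0.2 are arbitrary.

```python
import numpy as np

def cosine(a, b):
    """Direction-based score: 1.0 when a and b point the same way in F."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Distance-based hinge loss: zero once the matching item is closer to
    the anchor than the non-matching item by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, margin + d_pos - d_neg)

img = np.array([1.0, 0.0])         # image embedding in F
txt_match = np.array([3.0, 0.0])   # matching text: same direction, but far away
txt_other = np.array([0.0, 1.0])   # non-matching text: other direction, but closer

print(cosine(img, txt_match))                     # 1.0
print(cosine(img, txt_other))                     # 0.0
print(triplet_loss(img, txt_match, txt_other))    # ≈ 0.786
```

The matching pair is perfectly aligned by direction, yet the distance-based loss is still positive because the non-matching text is nearer in Euclidean terms. This is the practical difference between asking for shared directions and asking for short distances.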

 

 

Multimodal alignment faces a number of difficulties:

1) there are few datasets with explicitly annotated alignments;

2) it is difficult to design similarity metrics between modalities;

3) there may be multiple possible alignments, and not all elements in one modality have correspondences in another (an element may have several matches or none).
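Difficulty 3 can be illustrated with a small hypothetical example (NumPy assumed). Given a similarity matrix between the elements of two modalities, for instance words versus image regions, a simple thresholding rule links each word to every region above a threshold. Some words then receive one match, some receive several, and some receive none. The similarity values and the 0.5 threshold are made up. Choosing that threshold, and the similarity itself, is exactly difficulty 2.

```python
import numpy as np

def align(sim, threshold=0.5):
    """For each row element (e.g. a word), list every column element
    (e.g. an image region) whose similarity exceeds `threshold`.
    A row may get several matches or none at all."""
    return {i: np.flatnonzero(row > threshold).tolist() for i, row in enumerate(sim)}

# Hypothetical similarities between 3 words (rows) and 4 image regions (columns).
sim = np.array([
    [0.9, 0.1, 0.2, 0.0],   # word 0: one clear match
    [0.6, 0.7, 0.1, 0.0],   # word 1: two plausible matches
    [0.1, 0.2, 0.3, 0.2],   # word 2: no match above the threshold
])
print(align(sim))  # {0: [0], 1: [0, 1], 2: []}
```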
