Multimodal Contrastive Training for Visual Representation Learning

[Image 1]
Parameterize the image encoder as $f_{iq}$.
[Image 2]
Query feature $q_{ii}$, key feature $k_{ii}$.
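As a rough illustration of how a query feature and a key feature are usually compared in contrastive training, here is a minimal pure-Python sketch of an InfoNCE-style loss. It is not the authors' code: the function names, the cosine similarity, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """InfoNCE-style loss for one query (hypothetical helper).

    The positive key sits at index 0 of the logits; the loss is the
    negative log-softmax probability assigned to that index.
    """
    q = l2_normalize(query)
    keys = [l2_normalize(positive_key)] + [l2_normalize(k) for k in negative_keys]
    logits = [dot(q, k) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


# The loss is small when the query matches its positive key and large otherwise.
loss_matched = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this sketch the negatives are passed in explicitly; a queue of past key features, as used in momentum-contrast setups, would simply supply `negative_keys`.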
Parameterize the textual encoder as $f_{cq}(\cdot; \Theta_q, \Phi_{cq})$ and the momentum textual encoder as $f_{ck}(\cdot; \Theta_k, \Phi_{ik})$. $c^\dagger_j$ and $c^\star_j$ are different augmented examples.
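The notes do not spell out how a momentum encoder is updated. Assuming the usual exponential-moving-average rule that "momentum encoder" normally refers to, where the key parameters slowly track the query parameters, a minimal sketch could look like the following. The function name, the flat parameter lists, and the coefficient 0.999 are assumptions, not details taken from the paper.

```python
def momentum_update(key_params, query_params, m=0.999):
    """Return updated key-encoder parameters (hypothetical helper).

    Each key parameter moves a fraction (1 - m) of the way toward the
    matching query parameter: k <- m * k + (1 - m) * q.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Toy usage with m = 0.5 so the arithmetic is easy to follow.
updated = momentum_update([1.0, 2.0], [0.0, 4.0], m=0.5)
```

Only the query encoder would receive gradients under this assumption; the key (momentum) encoder changes solely through this update.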

Gripes

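In the first figure, the letter subscripts are hidden by the black background, and the authors have not released their code. This falls short of the standard expected of CVPR.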
