Paper reading: MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
1. Abstract
We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings), to solve Visual Question Answering (VQA) ensuring individual and combined processing of multiple input modalities. Our approach benefits from pr