InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning [Translation]
Contents
Abstract
1 Introduction
2 Vision-Language Instruction Tuning
2.1 Tasks and Datasets
2.2 Training and Evaluation Protocols
2.3 Instruction-aware Visual Feature Extraction
2.4 Training Dataset Balancing
2.5 Inference Methods
2.6