Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Tasks
  Visual Description Generation
    Image Description Generation
      Standard Image Description Generation
      Dense Image Description Generation: aims to generate descriptions for local objects in the image
      Image Paragraph Generation: generates a paragraph
      Spoken Language Image Description Genera