A Roundup of Remote Sensing Multimodal Foundation Models (Continuously Updated)

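The content of this post comes from the link below. Since many readers cannot access it, I am sharing it on this platform.
Remote Sensing Foundation Models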

Table of Contents

  • Models
      • Remote Sensing Vision Foundation Models
      • Remote Sensing Vision-Language Foundation Models
      • Remote Sensing Generative Foundation Models
      • Remote Sensing Vision-Location Foundation Models
      • Remote Sensing Vision-Audio Foundation Models
      • Remote Sensing Task-specific Foundation Models
      • Remote Sensing Agents
  • Datasets & Benchmarks
      • Benchmarks for RSFMs
      • (Large-scale) Pre-training Datasets
  • Others
      • Relevant Projects
      • Survey Papers

Remote Sensing Vision Foundation Models

Abbreviation Title Publication Paper Code & Weights
GeoKR Geographical Knowledge-Driven Representation Learning for Remote Sensing Images TGRS2021 GeoKR link
- Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding CVPRW2021 Paper link
GASSL Geography-Aware Self-Supervised Learning ICCV2021 GASSL link
SeCo Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data ICCV2021 SeCo link
DINO-MM Self-supervised Vision Transformers for Joint SAR-optical Representation Learning IGARSS2022 DINO-MM link
SatMAE SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery NeurIPS2022 SatMAE link
RS-BYOL Self-Supervised Learning for Invariant Representations From Multi-Spectral and SAR Images JSTARS2022 RS-BYOL null
GeCo Geographical Supervision Correction for Remote Sensing Representation Learning TGRS2022 GeCo null
RingMo RingMo: A remote sensing foundation model with masked image modeling TGRS2022 RingMo Code
RVSA Advancing plain vision transformer toward remote sensing foundation model TGRS2022 RVSA link
RSP An Empirical Study of Remote Sensing Pretraining TGRS2022 RSP link
MATTER Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks CVPR2022 MATTER null
CSPT Consecutive Pre-Training: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain RS2022 CSPT link
- Self-supervised Vision Transformers for Land-cover Segmentation and Classification CVPRW2022 Paper link
BFM A billion-scale foundation model for remote sensing images Arxiv2023 BFM null
TOV TOV: The original vision model for optical remote sensing image understanding via self-supervised learning JSTARS2023 TOV link
CMID CMID: A Unified Self-Supervised Learning Framework for Remote Sensing Image Understanding TGRS2023 CMID link
RingMo-Sense RingMo-Sense: Remote Sensing Foundation Model for Spatiotemporal Prediction via Spatiotemporal Evolution Disentangling TGRS2023 RingMo-Sense null
IaI-SimCLR Multi-Modal Multi-Objective Contrastive Learning for Sentinel-1/2 Imagery CVPRW2023 IaI-SimCLR null
CACo Change-Aware Sampling and Contrastive Learning for Satellite Images CVPR2023 CACo link
SatLas SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding ICCV2023 SatLas link
GFM Towards Geospatial Foundation Models via Continual Pretraining ICCV2023 GFM link
Scale-MAE Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning ICCV2023 Scale-MAE link
DINO-MC DINO-MC: Self-supervised Contrastive Learning for Remote Sensing Imagery with Multi-sized Local Crops Arxiv2023 DINO-MC link
CROMA CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders NeurIPS2023 CROMA link
Cross-Scale MAE Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing NeurIPS2023 Cross-Scale MAE link
DeCUR DeCUR: decoupling common & unique representations for multimodal self-supervision Arxiv2023 DeCUR link
Presto Lightweight, Pre-trained Transformers for Remote Sensing Timeseries Arxiv2023 Presto link
CtxMIM CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding Arxiv2023 CtxMIM null
FG-MAE Feature Guided Masked Autoencoder for Self-supervised Learning in Remote Sensing Arxiv2023 FG-MAE link
Prithvi Foundation Models for Generalist Geospatial Artificial Intelligence Arxiv2023 Prithvi link
RingMo-lite RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework Arxiv2023 RingMo-lite null
- A Self-Supervised Cross-Modal Remote Sensing Foundation Model with Multi-Domain Representation and Cross-Domain Fusion IGARSS2023 Paper null
EarthPT EarthPT: a foundation model for Earth Observation NeurIPS2023 CCAI workshop EarthPT link
USat USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery Arxiv2023 USat link
FoMo-Bench FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models Arxiv2023 FoMo-Bench link
AIEarth Analytical Insight of Earth: A Cloud-Platform of Intelligent Computing for Geospatial Big Data Arxiv2023 AIEarth link
- Self-Supervised Learning for SAR ATR with a Knowledge-Guided Predictive Architecture Arxiv2023 Paper link
Clay Clay Foundation Model - null link
Hydro Hydro–A Foundation Model for Water in Satellite Imagery - null link
U-BARN Self-Supervised Spatio-Temporal Representation Learning of Satellite Image Time Series JSTARS2024 Paper link
GeRSP Generic Knowledge Boosted Pre-training For Remote Sensing Images Arxiv2024 GeRSP GeRSP
SwiMDiff SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image Arxiv2024 SwiMDiff null
OFA-Net One for All: Toward Unified Foundation Models for Earth Vision Arxiv2024 OFA-Net null
SMLFR Generative ConvNet Foundation Model With Sparse Modeling and Low-Frequency Reconstruction for Remote Sensing Image Interpretation TGRS2024 SMLFR link
SpectralGPT SpectralGPT: Spectral Foundation Model TPAMI2024 SpectralGPT link
S2MAE S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data CVPR2024 S2MAE null
SatMAE++ Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery CVPR2024 SatMAE++ link
msGFM Bridging Remote Sensors with Multisensor Geospatial Foundation Models CVPR2024 msGFM link
SkySense SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery CVPR2024 SkySense Coming soon
MTP MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining Arxiv2024 MTP link
DOFA Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities Arxiv2024 DOFA link
PIS Pretrain A Remote Sensing Foundation Model by Promoting Intra-instance Similarity - null link
MMEarth MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning Arxiv2024 MMEarth link
SARATR-X SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition Arxiv2024 SARATR-X link
LeMeViT LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation IJCAI2024 LeMeViT link
SoftCon Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining Arxiv2024 SoftCon link
RS-DFM RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks Arxiv2024 RS-DFM null
A2-MAE A2-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder Arxiv2024 A2-MAE null
HyperSIGMA HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model Arxiv2024 HyperSIGMA link
SelectiveMAE Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset Arxiv2024 SelectiveMAE link
OmniSat OmniSat: Self-Supervised Modality Fusion for Earth Observation ECCV2024 OmniSat link
MM-VSF Towards a Knowledge guided Multimodal Foundation Model for Spatio-Temporal Remote Sensing Applications Arxiv2024 MM-VSF null
MA3E Masked Angle-Aware Autoencoder for Remote Sensing Images ECCV2024 MA3E link
SpectralEarth SpectralEarth: Training Hyperspectral Foundation Models at Scale Arxiv2024 SpectralEarth null

Remote Sensing Vision-Language Foundation Models

Abbreviation Title Publication Paper Code & Weights
RSGPT RSGPT: A Remote Sensing Vision Language Model and Benchmark Arxiv2023 RSGPT link
RemoteCLIP RemoteCLIP: A Vision Language Foundation Model for Remote Sensing Arxiv2023 RemoteCLIP link
GeoRSCLIP RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model Arxiv2023 GeoRSCLIP link
GRAFT Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment ICLR2024 GRAFT null
- Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs Arxiv2023 Paper link
- Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models Arxiv2024 Paper link
SkyEyeGPT SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model Arxiv2024 Paper link
EarthGPT EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain Arxiv2024 Paper null
SkyCLIP SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing AAAI2024 SkyCLIP link
GeoChat GeoChat: Grounded Large Vision-Language Model for Remote Sensing CVPR2024 GeoChat link
LHRS-Bot LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model Arxiv2024 Paper link
H2RSVLM H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model Arxiv2024 Paper link
RS-LLaVA RS-LLaVA: Large Vision Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery RS2024 Paper link
SkySenseGPT SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding Arxiv2024 Paper link

Remote Sensing Generative Foundation Models

Abbreviation Title Publication Paper Code & Weights
Seg2Sat Seg2Sat - Segmentation to aerial view using pretrained diffuser models Github null link
- Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps NeurIPSW2023 Paper link
GeoRSSD RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model Arxiv2023 Paper link
DiffusionSat DiffusionSat: A Generative Foundation Model for Satellite Imagery ICLR2024 DiffusionSat link
CRS-Diff CRS-Diff: Controllable Generative Remote Sensing Foundation Model Arxiv2024 Paper null
MetaEarth MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation Arxiv2024 Paper link

Remote Sensing Vision-Location Foundation Models

Abbreviation Title Publication Paper Code & Weights
CSP CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations ICML2023 CSP link
GeoCLIP GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization NeurIPS2023 GeoCLIP link
SatCLIP SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery Arxiv2023 SatCLIP link

Remote Sensing Vision-Audio Foundation Models

Abbreviation Title Publication Paper Code & Weights
- Self-supervised audiovisual representation learning for remote sensing data JAG2022 Paper link

Remote Sensing Task-specific Foundation Models

Abbreviation Title Publication Paper Code & Weights Task
SS-MAE SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification TGRS2023 Paper link Image Classification
TTP Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection Arxiv2023 Paper link Change Detection
CSMAE Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing Arxiv2024 Paper link Image Retrieval
RSPrompter RSPrompter: Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model TGRS2024 Paper link Instance Segmentation
BAN A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection TGRS2024 Paper link Change Detection
- Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM) Arxiv2024 Paper null Change Detection (Optical & OSM data)
AnyChange Segment Any Change Arxiv2024 Paper null Zero-shot Change Detection
RS-CapRet Large Language Models for Captioning and Retrieving Remote Sensing Images Arxiv2024 Paper null Image Caption & Text-image Retrieval
- Task Specific Pretraining with Noisy Labels for Remote sensing Image Segmentation Arxiv2024 Paper null Image Segmentation (Noisy labels)
RSBuilding RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model Arxiv2024 Paper link Building Extraction and Change Detection
SAM-Road Segment Anything Model for Road Network Graph Extraction Arxiv2024 Paper link Road Extraction

Remote Sensing Agents

Abbreviation Title Publication Paper Code & Weights
GeoLLM-QA Evaluating Tool-Augmented Agents in Remote Sensing Platforms ICLR 2024 ML4RS Workshop Paper null
RS-Agent RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents Arxiv2024 Paper null

Benchmarks for RSFMs

Abbreviation Title Publication Paper Link Downstream Tasks
- Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters Arxiv2023 Paper link Classification
GEO-Bench GEO-Bench: Toward Foundation Models for Earth Monitoring Arxiv2023 Paper link Classification & Segmentation
FoMo-Bench FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models Arxiv2023 FoMo-Bench Coming soon Classification & Segmentation & Detection for forest monitoring
PhilEO PhilEO Bench: Evaluating Geo-Spatial Foundation Models Arxiv2024 Paper link Segmentation & Regression estimation
SkySense SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery CVPR2024 SkySense Coming soon Classification & Segmentation & Detection & Change detection & Multi-Modal Segmentation: Time-insensitive LandCover Mapping & Multi-Modal Segmentation: Time-sensitive Crop Mapping & Multi-Modal Scene Classification
VLEO-Bench Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data Arxiv2024 VLEO-bench link Location Recognition & Captioning & Scene Classification & Counting & Detection & Change detection
VRSBench VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Arxiv2024 VRSBench link Image Captioning & Object Referring & Visual Question Answering

(Large-scale) Pre-training Datasets

Abbreviation Title Publication Paper Attribute Link
fMoW Functional Map of the World CVPR2018 fMoW Vision link
SEN12MS SEN12MS – A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion - SEN12MS Vision link
BEN-MM BigEarthNet-MM: A Large Scale Multi-Modal Multi-Label Benchmark Archive for Remote Sensing Image Classification and Retrieval GRSM2021 BEN-MM Vision link
MillionAID On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances, and Million-AID JSTARS2021 MillionAID Vision link
SeCo Seasonal Contrast: Unsupervised Pre-Training From Uncurated Remote Sensing Data ICCV2021 SeCo Vision link
fMoW-S2 SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery NeurIPS2022 fMoW-S2 Vision link
TOV-RS-Balanced TOV: The original vision model for optical remote sensing image understanding via self-supervised learning JSTARS2023 TOV Vision link
SSL4EO-S12 SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation GRSM2023 SSL4EO-S12 Vision link
SSL4EO-L SSL4EO-L: Datasets and Foundation Models for Landsat Imagery Arxiv2023 SSL4EO-L Vision link
SatlasPretrain SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding ICCV2023 SatlasPretrain Vision (Supervised) link
CACo Change-Aware Sampling and Contrastive Learning for Satellite Images CVPR2023 CACo Vision Coming soon
SAMRS SAMRS: Scaling-up Remote Sensing Segmentation Dataset with Segment Anything Model NeurIPS2023 SAMRS Vision link
RSVG RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data TGRS2023 RSVG Vision-Language link
RS5M RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model Arxiv2023 RS5M Vision-Language link
GEO-Bench GEO-Bench: Toward Foundation Models for Earth Monitoring Arxiv2023 GEO-Bench Vision (Evaluation) link
RSICap & RSIEval RSGPT: A Remote Sensing Vision Language Model and Benchmark Arxiv2023 RSGPT Vision-Language Coming soon
Clay Clay Foundation Model - null Vision link
SATIN SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models ICCVW2023 SATIN Vision-Language link
SkyScript SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing AAAI2024 SkyScript Vision-Language link
ChatEarthNet ChatEarthNet: A Global-Scale, High-Quality Image-Text Dataset for Remote Sensing Arxiv2024 ChatEarthNet Vision-Language link
LuoJiaHOG LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrieval Arxiv2024 LuoJiaHOG Vision-Language null
MMEarth MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning Arxiv2024 MMEarth Vision link
SeeFar SeeFar: Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models Arxiv2024 SeeFar Vision link
FIT-RS SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding Arxiv2024 Paper Vision-Language link
RS-GPT4V RS-GPT4V: A Unified Multimodal Instruction-Following Dataset for Remote Sensing Image Understanding Arxiv2024 Paper Vision-Language link
RS-4M Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset Arxiv2024 RS-4M Vision link
Major TOM Major TOM: Expandable Datasets for Earth Observation Arxiv2024 Major TOM Vision link
VRSBench VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding Arxiv2024 VRSBench Vision-Language link

Relevant Projects

(TODO. This section is dedicated to recommending more relevant and impactful projects, with the hope of promoting the development of the RS community.)

Title Link Brief Introduction
RSFMs (Remote Sensing Foundation Models) Playground link An open-source playground to streamline the evaluation and fine-tuning of RSFMs on various datasets.

Survey Papers

Title Publication Paper Attribute
Self-Supervised Remote Sensing Feature Learning: Learning Paradigms, Challenges, and Future Works TGRS2023 Paper Vision & Vision-Language
The Potential of Visual ChatGPT For Remote Sensing Arxiv2023 Paper Vision-Language
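遥感大模型:进展与前瞻 (Remote Sensing Large Models: Progress and Prospects) 武汉大学学报 (信息科学版) (Geomatics and Information Science of Wuhan University) 2023 Paper Vision & Vision-Language
地理人工智能样本:模型、质量与服务 (Geospatial AI Samples: Models, Quality, and Services) 武汉大学学报 (信息科学版) (Geomatics and Information Science of Wuhan University) 2023 Paper -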
Brain-Inspired Remote Sensing Foundation Models and Open Problems: A Comprehensive Survey JSTARS2023 Paper Vision & Vision-Language
Revisiting pre-trained remote sensing model benchmarks: resizing and normalization matters Arxiv2023 Paper Vision
An Agenda for Multimodal Foundation Models for Earth Observation IGARSS2023 Paper Vision
Transfer learning in environmental remote sensing RSE2024 Paper Transfer learning
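遥感基础模型发展综述与未来设想 (A Review of the Development of Remote Sensing Foundation Models and Future Prospects) 遥感学报 (National Remote Sensing Bulletin) 2023 Paper -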
On the Promises and Challenges of Multimodal Foundation Models for Geographical, Environmental, Agricultural, and Urban Planning Applications Arxiv2023 Paper Vision-Language
Vision-Language Models in Remote Sensing: Current Progress and Future Trends IEEE GRSM2024 Paper Vision-Language
On the Foundations of Earth and Climate Foundation Models Arxiv2024 Paper Vision & Vision-Language
Towards Vision-Language Geo-Foundation Model: A Survey Arxiv2024 Paper Vision-Language
AI Foundation Models in Remote Sensing: A Survey Arxiv2024 Paper Vision

Citation

If you find this repository useful, please consider giving it a star ⭐️ and citing:

@inproceedings{guo2024skysense,
  title={Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery},
  author={Guo, Xin and Lao, Jiangwei and Dang, Bo and Zhang, Yingying and Yu, Lei and Ru, Lixiang and Zhong, Liheng and Huang, Ziyuan and Wu, Kang and Hu, Dingxiang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={27672--27683},
  year={2024}
}

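Likes, bookmarks, and follows are welcome; your support helps me build a good column for sharing knowledge in the remote sensing field. (Remote Sensing column)
Private messages for consultation, discussion, and study are also welcome. Topics include, but are not limited to: land-cover classification and semantic segmentation (e.g., extraction of water bodies, clouds, buildings, cropland, winter wheat, and other land-cover types), change detection, nighttime-light remote sensing data processing, object detection, image processing (geometric correction, radiometric correction (atmospheric correction), image denoising, etc.), spatiotemporal fusion of remote sensing data, quantitative remote sensing (retrieval of soil salinization, water quality parameters, aerosols, forest parameters (biomass, vegetation cover, vegetation productivity, etc.), land surface temperature, surface reflectance, etc.), and hyperspectral data processing, as well as discussion of deep learning and machine learning algorithms, guidance on related experiments and papers, and preparation for postgraduate entrance exams.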
