| Main Authors: | Zhan, Yang; Xiong, Zhitong; Yuan, Yuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.09712 |
Similar Items
SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding
by: Luo, Junwei, et al.
Published: (2024)
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models
by: Yuan, Zhenghang, et al.
Published: (2024)
UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models
by: Li, Yujie, et al.
Published: (2024)
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
by: Li, Xiang, et al.
Published: (2023)
Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models
by: Guo, Haonan, et al.
Published: (2024)
Hierarchical Semi-Supervised Active Learning for Remote Sensing
by: Huang, Wei, et al.
Published: (2025)
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
by: Bai, Yang, et al.
Published: (2024)
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
by: Zhan, Yufei, et al.
Published: (2024)
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
by: Chen, Zewen, et al.
Published: (2024)
An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain
by: Silva, João Daniel, et al.
Published: (2025)
One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing
by: Zhang, Zilun, et al.
Published: (2025)
Instruction-Free Tuning of Large Vision Language Models for Medical Instruction Following
by: Kang, Myeongkyun, et al.
Published: (2026)
UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding
by: Zhan, Yang, et al.
Published: (2026)
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
by: An, Xiao, et al.
Published: (2024)
Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning
by: Park, Dongmin, et al.
Published: (2024)
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
by: Zhang, Shilong, et al.
Published: (2023)
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Jinrui, et al.
Published: (2024)
Large Vision-Language Models for Remote Sensing Visual Question Answering
by: Siripong, Surasakdi, et al.
Published: (2024)
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts
by: Liu, Xu, et al.
Published: (2024)
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
by: Liu, Fan, et al.
Published: (2023)
DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark
by: Li, Haodong, et al.
Published: (2024)
Continual LLaVA: Continual Instruction Tuning in Large Vision-Language Models
by: Cao, Meng, et al.
Published: (2024)
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
by: Wang, Xunguang, et al.
Published: (2023)
DOFA-CLIP: Multimodal Vision-Language Foundation Models for Earth Observation
by: Xiong, Zhitong, et al.
Published: (2025)
LHRS-Bot-Nova: Improved Multimodal Large Language Model for Remote Sensing Vision-Language Interpretation
by: Li, Zhenshi, et al.
Published: (2024)
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
by: Zhang, Yingying, et al.
Published: (2025)
GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing
by: Ou, Ruizhe, et al.
Published: (2025)
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
by: Shen, Yang, et al.
Published: (2024)
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation
by: Lin, Tianwei, et al.
Published: (2025)
Decoding the Delta: Unifying Remote Sensing Change Detection and Understanding with Multimodal Large Language Models
by: Li, Xiaohe, et al.
Published: (2026)
UMIT: Unifying Medical Imaging Tasks via Vision-Language Models
by: Yu, Haiyang, et al.
Published: (2025)
ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning
by: Deng, Pei, et al.
Published: (2024)
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)
Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives
by: Weng, Xingxing, et al.
Published: (2025)
GeoLLaVA: Efficient Fine-Tuned Vision-Language Models for Temporal Change Detection in Remote Sensing
by: Elgendy, Hosam, et al.
Published: (2024)
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models
by: Yan, Qiao, et al.
Published: (2025)
SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images
by: Si, Dongchen, et al.
Published: (2025)
FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing
by: Dang, Yunkai, et al.
Published: (2025)