| Main Authors: | Zhou, Yue, Zhong, Zhihang, Yang, Xue |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.09385 |
Similar Items
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
by: Zhang, Yue, et al.
Published: (2024)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation
by: Xu, Zhen, et al.
Published: (2025)
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)
Explainability for Vision Foundation Models: A Survey
by: Kazmierczak, Rémi, et al.
Published: (2025)
Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation
by: Yu, Wenjun, et al.
Published: (2025)
UniVBench: Towards Unified Evaluation for Video Foundation Models
by: Wei, Jianhui, et al.
Published: (2026)
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
by: Huang, Ziyue, et al.
Published: (2025)
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
by: Zhang, Jinjin, et al.
Published: (2025)
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
by: Xie, Shenghao, et al.
Published: (2024)
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
by: Liu, Fan, et al.
Published: (2023)
Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
by: Li, Zhuoxiao, et al.
Published: (2023)
Towards Cross-View Point Correspondence in Vision-Language Models
by: Wang, Yipu, et al.
Published: (2025)
Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)
Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation
by: Xue, Xizhe, et al.
Published: (2025)
A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
by: Liu, Daizong, et al.
Published: (2024)
Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
by: Le, Long, et al.
Published: (2024)
HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)
CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)
Vision Foundation Models in Remote Sensing: A Survey
by: Lu, Siqi, et al.
Published: (2024)
FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
by: Liu, Yuanqing, et al.
Published: (2026)
A Survey on Efficient Vision-Language Models
by: Shinde, Gaurav, et al.
Published: (2025)
Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and Challenges
by: Yang, Liling, et al.
Published: (2025)
EVLF-FM: Explainable Vision Language Foundation Model for Medicine
by: Bai, Yang, et al.
Published: (2025)
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)
A Survey on Hallucination in Large Vision-Language Models
by: Liu, Hanchao, et al.
Published: (2024)
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
by: Wu, Hao, et al.
Published: (2024)
One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)
Towards a Unified Copernicus Foundation Model for Earth Vision
by: Wang, Yi, et al.
Published: (2025)
Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset
by: Huang, Wenhui, et al.
Published: (2026)
A Vision-Language Foundation Model for Leaf Disease Identification
by: Quoc, Khang Nguyen, et al.
Published: (2025)
A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
by: Zhang, Chaoning, et al.
Published: (2023)
Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
by: Liu, Jiaqi, et al.
Published: (2025)
Dual-Pathway Circuits of Object Hallucination in Vision-Language Models
by: Liu, Jiaxin, et al.
Published: (2026)
Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)
Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)
by: Yao, Kelu, et al.
Published: (2025)
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)
Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics
by: Yang, Yuchen, et al.
Published: (2025)
Implicit Modeling for Transferability Estimation of Vision Foundation Models
by: Zheng, Yaoyan, et al.
Published: (2025)
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)