:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Yue; Zhong, Zhihang; Yang, Xue
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2406.09385

Similar Items

GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
by: Zhang, Yue, et al.
Published: (2024)

Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation
by: Xu, Zhen, et al.
Published: (2025)

Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
by: Peng, Wenshuo, et al.
Published: (2024)

Explainability for Vision Foundation Models: A Survey
by: Kazmierczak, Rémi, et al.
Published: (2025)

Vision-Language Semantic Aggregation Leveraging Foundation Model for Generalizable Medical Image Segmentation
by: Yu, Wenjun, et al.
Published: (2025)

UniVBench: Towards Unified Evaluation for Video Foundation Models
by: Wei, Jianhui, et al.
Published: (2026)

A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
by: Huang, Ziyue, et al.
Published: (2025)

Towards Training-free Anomaly Detection with Vision and Language Foundation Models
by: Zhang, Jinjin, et al.
Published: (2025)

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
by: Xie, Shenghao, et al.
Published: (2024)

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
by: Liu, Fan, et al.
Published: (2023)

Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
by: Li, Zhuoxiao, et al.
Published: (2023)

Towards Cross-View Point Correspondence in Vision-Language Models
by: Wang, Yipu, et al.
Published: (2025)

Vision-Language Models for Vision Tasks: A Survey
by: Zhang, Jingyi, et al.
Published: (2023)

Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation
by: Xue, Xizhe, et al.
Published: (2025)

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends
by: Liu, Daizong, et al.
Published: (2024)

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model
by: Le, Long, et al.
Published: (2024)

HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models
by: Jiang, Songtao, et al.
Published: (2025)

CanViT: Toward Active-Vision Foundation Models
by: Berreby, Yohaï-Eliel, et al.
Published: (2026)

Vision Foundation Models in Remote Sensing: A Survey
by: Lu, Siqi, et al.
Published: (2024)

FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
by: Liu, Yuanqing, et al.
Published: (2026)

A Survey on Efficient Vision-Language Models
by: Shinde, Gaurav, et al.
Published: (2025)

Survey of Multimodal Geospatial Foundation Models: Techniques, Applications, and Challenges
by: Yang, Liling, et al.
Published: (2025)

EVLF-FM: Explainable Vision Language Foundation Model for Medicine
by: Bai, Yang, et al.
Published: (2025)

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)

A Survey on Hallucination in Large Vision-Language Models
by: Liu, Hanchao, et al.
Published: (2024)

DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding
by: Wu, Hao, et al.
Published: (2024)

One for All: Toward Unified Foundation Models for Earth Vision
by: Xiong, Zhitong, et al.
Published: (2024)

Towards a Unified Copernicus Foundation Model for Earth Vision
by: Wang, Yi, et al.
Published: (2025)

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset
by: Huang, Wenhui, et al.
Published: (2026)

A Vision-Language Foundation Model for Leaf Disease Identification
by: Quoc, Khang Nguyen, et al.
Published: (2025)

A Survey on Segment Anything Model (SAM): Vision Foundation Model Meets Prompt Engineering
by: Zhang, Chaoning, et al.
Published: (2023)

Towards Faithful Reasoning in Remote Sensing: A Perceptually-Grounded GeoSpatial Chain-of-Thought for Vision-Language Models
by: Liu, Jiaqi, et al.
Published: (2025)

Dual-Pathway Circuits of Object Hallucination in Vision-Language Models
by: Liu, Jiaxin, et al.
Published: (2026)

Co-Training Vision Language Models for Remote Sensing Multi-task Learning
by: Li, Qingyun, et al.
Published: (2025)

Falcon: A Remote Sensing Vision-Language Foundation Model (Technical Report)
by: Yao, Kelu, et al.
Published: (2025)

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)

Learnable SMPLify: A Neural Solution for Optimization-Free Human Pose Inverse Kinematics
by: Yang, Yuchen, et al.
Published: (2025)

Implicit Modeling for Transferability Estimation of Vision Foundation Models
by: Zheng, Yaoyan, et al.
Published: (2025)

Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)