| Main Authors: | Liao, Jiacheng; Qian, Feng; Fan, Ziyin; Guo, Yongjian |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.00998 |
Similar Items
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
by: Wang, Haiyu, et al.
Published: (2026)
A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving
by: Long, Keke, et al.
Published: (2025)
Text-Guided Video Masked Autoencoder
by: Fan, David, et al.
Published: (2024)
Multi-Turn Adaptive Prompting Attack on Large Vision-Language Models
by: Choi, In Chong, et al.
Published: (2026)
Let's Roll a BiFTA: Bi-refinement for Fine-grained Text-visual Alignment in Vision-Language Models
by: Sun, Yuhao, et al.
Published: (2026)
Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
by: Liao, Yuan-Hong, et al.
Published: (2024)
Beyond Low-Rank Tuning: Model Prior-Guided Rank Allocation for Effective Transfer in Low-Data and Large-Gap Regimes
by: Zhang, Chuyan, et al.
Published: (2025)
Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification
by: Wang, Zhongqi, et al.
Published: (2025)
MVT: Mask-Grounded Vision-Language Models for Taxonomy-Aligned Land-Cover Tagging
by: Chen, Siyi, et al.
Published: (2025)
GeoGround: A Unified Large Vision-Language Model for Remote Sensing Visual Grounding
by: Zhou, Yue, et al.
Published: (2024)
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
by: Huang, Zhenpeng, et al.
Published: (2024)
PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models
by: Zhang, Yongjian, et al.
Published: (2025)
Low-Rank Few-Shot Adaptation of Vision-Language Models
by: Zanella, Maxime, et al.
Published: (2024)
Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models
by: Lu, Jiaying, et al.
Published: (2023)
Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
by: Wen, Wen, et al.
Published: (2025)
Are Large Vision-Language Models Ready to Guide Blind and Low-Vision Individuals?
by: Kim, Eunki, et al.
Published: (2025)
Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models
by: Xu, Longwei, et al.
Published: (2026)
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
by: Monsefi, Amin Karimi, et al.
Published: (2024)
Show and Guide: Instructional-Plan Grounded Vision and Language Model
by: Glória-Silva, Diogo, et al.
Published: (2024)
Grounding and Enhancing Grid-based Models for Neural Fields
by: Zhao, Zelin, et al.
Published: (2024)
Curing Semantic Drift: A Dynamic Approach to Grounding Generation in Large Vision-Language Models
by: Chen, Jiahe, et al.
Published: (2025)
LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
by: Su, Yuchen, et al.
Published: (2025)
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
by: Khan, Md Azim, et al.
Published: (2025)
When Language Model Guides Vision: Grounding DINO for Cattle Muzzle Detection
by: Dulal, Rabin, et al.
Published: (2025)
Q-Ground: Image Quality Grounding with Large Multi-modality Models
by: Chen, Chaofeng, et al.
Published: (2024)
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
by: Zou, Jialv, et al.
Published: (2024)
Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition
by: Yang, Mingkun, et al.
Published: (2024)
Breaking the Low-Rank Dilemma of Linear Attention
by: Fan, Qihang, et al.
Published: (2024)
GMT: Guided Mask Transformer for Leaf Instance Segmentation
by: Chen, Feng, et al.
Published: (2024)
Vision Graph Prompting via Semantic Low-Rank Decomposition
by: Ai, Zixiang, et al.
Published: (2025)
Collaborative Low-Rank Adaptation for Pre-Trained Vision Transformers
by: Liu, Zheng, et al.
Published: (2025)
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
by: Jin, Can, et al.
Published: (2025)
LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models
by: Kojima, Yuto, et al.
Published: (2025)
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
by: Chen, Qizhou, et al.
Published: (2024)
FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Language Models
by: Li, Haoyang, et al.
Published: (2026)
Mask Grounding for Referring Image Segmentation
by: Chng, Yong Xien, et al.
Published: (2023)
SIGMA: Sinkhorn-Guided Masked Video Modeling
by: Salehi, Mohammadreza, et al.
Published: (2024)
Compact Model Training by Low-Rank Projection with Energy Transfer
by: Guo, Kailing, et al.
Published: (2022)
Contrastive Masked Autoencoders are Stronger Vision Learners
by: Huang, Zhicheng, et al.
Published: (2022)
TALDS-Net: Task-Aware Adaptive Local Descriptors Selection for Few-shot Image Classification
by: Qiao, Qian, et al.
Published: (2023)