| Main Authors: | Li, Jiaqi, Wang, Guangming, Zheng, Shuntian, Ni, Minzhe, Lu, Xiaoman, Ye, Guanghui, Guan, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21078 |
Similar Items
Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding
by: Li, Jiaqi, et al.
Published: (2026)
Why Learn What Physics Already Knows? Realizing Agile mmWave-based Human Pose Estimation via Physics-Guided Preprocessing
by: Zheng, Shuntian, et al.
Published: (2026)
A Two-Stage Motion-Aware Framework for mmWave-based Human Mesh Recovery
by: Pham, Hoang Hai, et al.
Published: (2026)
Masked Diffusion Vision-Language Models for Temporal Action Localization
by: Wang, Fengshun, et al.
Published: (2026)
Beyond the Vision Encoder: Identifying and Mitigating Spatial Bias in Large Vision-Language Models
by: Zhu, Yingjie, et al.
Published: (2025)
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
by: Liu, Yicheng, et al.
Published: (2025)
Towards Completeness: A Generalizable Action Proposal Generator for Zero-Shot Temporal Action Localization
by: Du, Jia-Run, et al.
Published: (2024)
See-Saw Modality Balance: See Gradient, and Sew Impaired Vision-Language Balance to Mitigate Dominant Modality Bias
by: Kwon, JuneHyoung, et al.
Published: (2025)
D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition
by: Pei, Wenjie, et al.
Published: (2023)
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization
by: Lim, Geuntaek, et al.
Published: (2024)
Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models
by: Atabuzzaman, Md., et al.
Published: (2025)
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
by: Tian, Xinyu, et al.
Published: (2025)
STAT: Towards Generalizable Temporal Action Localization
by: Liu, Yangcen, et al.
Published: (2024)
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
by: Han, Feng, et al.
Published: (2026)
Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation
by: Nie, Chang, et al.
Published: (2026)
Freeze and Reveal: Exposing Modality Bias in Vision-Language Models
by: Kavuri, Vivek Hruday, et al.
Published: (2025)
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
by: Sasse, Kuleen, et al.
Published: (2024)
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)
Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding
by: Liang, Xiaoyu, et al.
Published: (2024)
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Jinrui, et al.
Published: (2024)
ReViP: Mitigating False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
by: Li, Zhuohao, et al.
Published: (2026)
Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models
by: Imran, Muhammad, et al.
Published: (2025)
Information Router for Mitigating Modality Dominance in Vision-Language Models
by: Kim, Seulgi, et al.
Published: (2026)
Tactile Modality Fusion for Vision-Language-Action Models
by: Morissette, Charlotte, et al.
Published: (2026)
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
by: Huang, Qidong, et al.
Published: (2024)
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
by: GigaBrain Team, et al.
Published: (2025)
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
by: Ye, Angen, et al.
Published: (2025)
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
by: Szu-Tu, Li-Zhong, et al.
Published: (2025)
Modality Bias in LVLMs: Analyzing and Mitigating Object Hallucination via Attention Lens
by: Zheng, Haohan, et al.
Published: (2025)
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
by: Wang, Jiaqi, et al.
Published: (2024)
Person Parametric Physics-informed Representation for mmWave-based Human Pose Estimation
by: Zheng, Shuntian, et al.
Published: (2025)
Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models
by: Han, Chaolei, et al.
Published: (2025)
Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images
by: Munia, Nusrat, et al.
Published: (2025)
Semantic Granularity Navigation in Image Editing
by: Lu, Liangsi, et al.
Published: (2026)
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
by: Yi, Chao, et al.
Published: (2024)
MedCFVQA: A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering
by: Ye, Shuchang, et al.
Published: (2025)
DriveMA: Driving Vision-Language-Action Models with verifiable Meta-Actions
by: Zheng, Weicheng, et al.
Published: (2026)
MMErroR: A Benchmark for Erroneous Reasoning in Vision-Language Models
by: Shi, Yang, et al.
Published: (2026)
Bias Detection and Rotation-Robustness Mitigation in Vision-Language Models and Generative Image Models
by: Mithila, Tarannum
Published: (2026)
Deciphering Functions of Neurons in Vision-Language Models
by: Xu, Jiaqi, et al.
Published: (2025)