| Main Authors: | Zhou, Yajing, Kong, Xiangyu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.18194 |
Similar Items
Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment
by: Jia, Ziheng, et al.
Published: (2025)
Leveraging Geometric Visual Illusions as Perceptual Inductive Biases for Vision Models
by: Yang, Haobo, et al.
Published: (2025)
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
by: Zhu, Zhiyu, et al.
Published: (2025)
PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis
by: Xu, Jiao, et al.
Published: (2026)
Boosting Omni-Modal Language Models: Staged Post-Training with Visually Debiased Evaluation
by: Liu, Che, et al.
Published: (2026)
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy
by: Li, Hong, et al.
Published: (2024)
Thinking with Patterns: Breaking the Perceptual Bottleneck in Visual Planning via Pattern Induction
by: Jian, Yichang, et al.
Published: (2026)
Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion
by: Kang, Xueyang, et al.
Published: (2025)
A Two-Stage Multi-Modal MRI Framework for Lifespan Brain Age Prediction
by: Zhang, Dingyi, et al.
Published: (2026)
Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
by: Wang, Ying, et al.
Published: (2023)
Decoupling Stability and Plasticity for Multi-Modal Test-Time Adaptation
by: He, Yongbo, et al.
Published: (2026)
ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
by: Zhou, Jingqi, et al.
Published: (2024)
The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design
by: Liu, Anjie, et al.
Published: (2026)
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
by: Chen, Zhawnen, et al.
Published: (2024)
Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task
by: Jiang, Yanbei, et al.
Published: (2025)
Perceptual Quality-based Model Training under Annotator Label Uncertainty
by: Zhou, Chen, et al.
Published: (2024)
The Cartesian Shortcut: Re-evaluate Vision Reasoning in Polar Coordinate Space
by: Hu, Xia, et al.
Published: (2026)
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
by: Fan, Xianzhe, et al.
Published: (2025)
Beyond the First Read: AI-Assisted Perceptual Error Detection in Chest Radiography Accounting for Interobserver Variability
by: Vutukuri, Adhrith, et al.
Published: (2025)
MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck
by: Wen, Liangjian, et al.
Published: (2024)
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
by: Yang, Yuncong, et al.
Published: (2025)
SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models
by: Zhao, Ruosen, et al.
Published: (2025)
PSA-VLM: Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment
by: Liu, Zhendong, et al.
Published: (2024)
MVP-CBM:Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification
by: Wang, Chunjiang, et al.
Published: (2025)
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
by: Feng, Sicheng, et al.
Published: (2025)
Beyond Cross-Modal Alignment: Measuring and Leveraging Modality Gap in Vision-Language Models
by: Yan, Hanqi, et al.
Published: (2025)
M3D-BFS: a Multi-stage Dynamic Fusion Strategy for Sample-Adaptive Multi-Modal Brain Network Analysis
by: Dong, Rui, et al.
Published: (2026)
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
by: Jia, Pengyue, et al.
Published: (2024)
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos
by: Li, Yuxuan, et al.
Published: (2025)
Beyond CNNs: Efficient Fine-Tuning of Multi-Modal LLMs for Object Detection on Low-Data Regimes
by: Elamon, Nirmal, et al.
Published: (2025)
Feature Learning with Multi-Stage Vision Transformers on Inter-Modality HER2 Status Scoring and Tumor Classification on Whole Slides
by: Oyelade, Olaide N., et al.
Published: (2025)
HSCP: A Two-Stage Spectral Clustering Framework for Resource-Constrained UAV Identification
by: Wang, Maoyu, et al.
Published: (2025)
Probing Perceptual Constancy in Large Vision-Language Models
by: Sun, Haoran, et al.
Published: (2025)
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
by: Shi, Haojun, et al.
Published: (2024)
MVBoost: Boost 3D Reconstruction with Multi-View Refinement
by: Liu, Xiangyu, et al.
Published: (2024)
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
by: Tien, Dong Nguyen, et al.
Published: (2025)
The Illusion of Clinical Reasoning: A Benchmark Reveals the Pervasive Gap in Vision-Language Models for Clinical Competency
by: Wang, Dingyu, et al.
Published: (2025)
TinyAlign: Boosting Lightweight Vision-Language Models by Mitigating Modal Alignment Bottlenecks
by: Hu, Yuanze, et al.
Published: (2025)
Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
by: Fu, Xiaolong, et al.
Published: (2025)
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)