| Main Authors: | Zou, Xiaohan, Kang, Jian, Kesidis, George, Lin, Lu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.13095 |
Similar Items
TechING: Towards Real World Technical Image Understanding via VLMs
by: Nadeem, Tafazzul, et al.
Published: (2026)
VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
by: Chen, Menglan, et al.
Published: (2025)
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
by: Liang, Yijun, et al.
Published: (2025)
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
by: Li, Shuo, et al.
Published: (2024)
On the Perception Bottleneck of VLMs for Chart Understanding
by: Liu, Junteng, et al.
Published: (2025)
Leveraging NTPs for Efficient Hallucination Detection in VLMs
by: Azachi, Ofir, et al.
Published: (2025)
Sensitivity of Generative VLMs to Semantically and Lexically Altered Prompts
by: Dumpala, Sri Harsha, et al.
Published: (2024)
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
by: Xu, Xiao, et al.
Published: (2025)
Bidirectional Long-Range Parser for Sequential Data Understanding
by: Leotescu, George, et al.
Published: (2024)
Robustness of Structured Data Extraction from Perspectively Distorted Documents
by: Nakada, Hyakka, et al.
Published: (2025)
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
by: Park, Simon, et al.
Published: (2025)
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
by: Nasiriany, Soroush, et al.
Published: (2024)
PerLA: Perceptive 3D Language Assistant
by: Mei, Guofeng, et al.
Published: (2024)
Can World Models Benefit VLMs for World Dynamics?
by: Zhang, Kevin, et al.
Published: (2025)
Unraveling the Truth: Do VLMs really Understand Charts? A Deep Dive into Consistency and Robustness
by: Mukhopadhyay, Srija, et al.
Published: (2024)
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
by: Pan, Zhiyu, et al.
Published: (2026)
VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety
by: Palaskar, Shruti, et al.
Published: (2025)
Temporal Preference Optimization for Long-Form Video Understanding
by: Li, Rui, et al.
Published: (2025)
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models
by: Karamcheti, Siddharth, et al.
Published: (2024)
Fine-tuning MLLMs Without Forgetting Is Easier Than You Think
by: Li, He, et al.
Published: (2026)
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
by: Daxberger, Erik, et al.
Published: (2025)
Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
by: Li, Yue, et al.
Published: (2025)
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
by: Wang, Xiyao, et al.
Published: (2025)
Toward Inherently Robust VLMs Against Visual Perception Attacks
by: MohajerAnsari, Pedram, et al.
Published: (2025)
Towards Efficient Vision-Language Tuning: More Information Density, More Generalizability
by: Hao, Tianxiang, et al.
Published: (2023)
Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models
by: Liu, Zhining, et al.
Published: (2026)
DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry
by: Wu, Changti, et al.
Published: (2025)
Evaluating and Advancing Multimodal Large Language Models in Perception Ability Lens
by: Chen, Feng, et al.
Published: (2024)
Improving Language Understanding from Screenshots
by: Gao, Tianyu, et al.
Published: (2024)
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
by: Liao, Yuan-Hong, et al.
Published: (2025)
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
by: Ding, Yi, et al.
Published: (2024)
VisMin: Visual Minimal-Change Understanding
by: Awal, Rabiul, et al.
Published: (2024)
CIVET: Systematic Evaluation of Understanding in VLMs
by: Rizzoli, Massimo, et al.
Published: (2025)
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
by: Wang, Xiao, et al.
Published: (2024)
MLLM-as-a-Judge for Image Safety without Human Labeling
by: Wang, Zhenting, et al.
Published: (2024)
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
by: Heyward, Joseph, et al.
Published: (2024)
Multimodal Information Fusion for Chart Understanding: A Survey of MLLMs -- Evolution, Limitations, and Cognitive Enhancement
by: Yi, Zhihang, et al.
Published: (2026)
DocAtlas: Multilingual Document Understanding Across 80+ Languages
by: Heakl, Ahmed, et al.
Published: (2026)
Symbiotic-MoE: Unlocking the Synergy between Generation and Understanding
by: Liu, Xiangyue, et al.
Published: (2026)