| Main Authors: | Phute, Mansi; Balakrishnan, Ravikumar |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.08521 |
Similar Items
VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
by: Balakrishnan, Ravikumar, et al.
Published: (2025)
VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation
by: Taioli, Francesco, et al.
Published: (2026)
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
by: Shen, Yucheng, et al.
Published: (2026)
Language Models Can Explain Visual Features via Steering
by: Ferrando, Javier, et al.
Published: (2026)
Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
by: Zhang, Yuanhong, et al.
Published: (2026)
SynthVision -- Harnessing Minimal Input for Maximal Output in Computer Vision Models using Synthetic Image data
by: Kularathne, Yudara, et al.
Published: (2024)
Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models
by: Yang, Jiaxi, et al.
Published: (2026)
Dynamic Multimodal Activation Steering for Hallucination Mitigation in Large Vision-Language Models
by: Yin, Jianghao, et al.
Published: (2026)
Learning to Steer: Input-dependent Steering for Multimodal LLMs
by: Parekh, Jayneel, et al.
Published: (2025)
Activation Steering Meets Preference Optimization: Defense Against Jailbreaks in Vision Language Models
by: Wu, Sihao, et al.
Published: (2025)
Robustness of Vision Language Models Against Split-Image Harmful Input Attacks
by: Rashid, Md Rafi Ur, et al.
Published: (2026)
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering
by: Li, Zhuowei, et al.
Published: (2025)
Reading Between the Pixels: Linking Text-Image Embedding Alignment to Typographic Attack Success on Vision-Language Models
by: Balakrishnan, Ravikumar, et al.
Published: (2026)
3D Gaussian and Diffusion-Based Gaze Redirection
by: Panchalingam, Abiram, et al.
Published: (2025)
Single-Input Multi-Output Model Merging: Leveraging Foundation Models for Dense Multi-Task Learning
by: Giraldo, Juan Garcia, et al.
Published: (2025)
Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models
by: Zou, Zhengtao, et al.
Published: (2025)
Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering
by: Jana, Soumyadeep, et al.
Published: (2026)
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
by: Wang, Han, et al.
Published: (2024)
VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation
by: Sarowar, Md Selim, et al.
Published: (2025)
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
by: Liu, Zeyu, et al.
Published: (2026)
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
by: Maeda, Koki, et al.
Published: (2024)
Decoding Vision Transformers: the Diffusion Steering Lens
by: Takatsuki, Ryota, et al.
Published: (2025)
Skill-Conditioned Visual Geolocation for Vision-Language Models
by: Yang, Chenjie, et al.
Published: (2026)
Generative Visual Communication in the Era of Vision-Language Models
by: Vinker, Yael
Published: (2024)
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
by: YU, Mark, et al.
Published: (2025)
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
by: Tao, Hongyuan, et al.
Published: (2025)
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
by: Chae, Hyunsik, et al.
Published: (2025)
Implicit Visual Bias Mitigation by Posterior Estimate Sharpening of a Bayesian Neural Network
by: Stone, Rebecca S, et al.
Published: (2023)
Built Environment Reasoning from Remote Sensing Imagery Using Large Vision--Language Models
by: Wang, Dongdong, et al.
Published: (2026)
Vision-Language Introspection: Mitigating Overconfident Hallucinations in MLLMs via Interpretable Bi-Causal Steering
by: Liu, Shuliang, et al.
Published: (2026)
Reasoning under Vision: Understanding Visual-Spatial Cognition in Vision-Language Models for CAPTCHA
by: Song, Python, et al.
Published: (2025)
Fine-Tuning Vision-Language Models for Visual Navigation Assistance
by: Li, Xiao, et al.
Published: (2025)
The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models
by: Ong, Kenneth J. K.
Published: (2026)
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
by: Babaiee, Zahra, et al.
Published: (2025)
When Visuals Aren't the Problem: Evaluating Vision-Language Models on Misleading Data Visualizations
by: Lalai, Harsh Nishant, et al.
Published: (2026)
Mitigating the Reasoning Tax in Vision-Language Fine-Tuning with Input-Adaptive Depth Aggregation
by: Ren, Yiming, et al.
Published: (2026)
Reducing Hallucinations in Vision-Language Models via Latent Space Steering
by: Liu, Sheng, et al.
Published: (2024)
Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation
by: Choi, YoungChan, et al.
Published: (2025)
Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
by: Kim, Sohee, et al.
Published: (2025)
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
by: Góral, Gracjan, et al.
Published: (2025)