:: Library Catalog

Bibliographic Details
Main Authors:	Bangde, Yashwant Pravinrao; Roy, Debaditya
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.25809

Similar Items

Improving Temporal Action Segmentation via Constraint-Aware Decoding
by: Ee, Yeo Keat, et al.
Published: (2026)

MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models
by: Biswas, Shristi Das, et al.
Published: (2026)

Predicting the Next Action by Modeling the Abstract Goal
by: Roy, Debaditya, et al.
Published: (2022)

Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
by: Wang, Xintong, et al.
Published: (2024)

Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
by: Jaiswal, Shantanu, et al.
Published: (2024)

Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
by: Rajendiran, Ramanathan, et al.
Published: (2025)

Interaction Region Visual Transformer for Egocentric Action Anticipation
by: Roy, Debaditya, et al.
Published: (2022)

Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
by: Verma, Dhruv, et al.
Published: (2024)

Modelling Spatio-Temporal Interactions For Compositional Action Recognition
by: Rajendiran, Ramanathan, et al.
Published: (2023)

Show and Guide: Instructional-Plan Grounded Vision and Language Model
by: Glória-Silva, Diogo, et al.
Published: (2024)

GLaRE: A Graph-based Landmark Region Embedding Network for Emotion Recognition
by: Maji, Debasis, et al.
Published: (2025)

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
by: Cheng, An-Chieh, et al.
Published: (2024)

Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning
by: Busaranuvong, Palawat, et al.
Published: (2026)

SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
by: Park, Woohyeon, et al.
Published: (2025)

Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?
by: Sufian, Abu, et al.
Published: (2025)

Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)

How Reasoning Influences Intersectional Biases in Vision Language Models
by: Desai, Adit, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)

Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
by: Jin, Hyundong, et al.
Published: (2025)

StreamSense: Streaming Social Task Detection with Selective Vision-Language Model Routing
by: Wang, Han, et al.
Published: (2026)

Temporal Contrastive Learning for Video Temporal Reasoning in Large Vision-Language Models
by: Souza, Rafael, et al.
Published: (2024)

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)

Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
by: Won, John, et al.
Published: (2025)

Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture
by: Ghosh, Anirudha, et al.
Published: (2026)

Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding
by: Mahdavi, Zahra, et al.
Published: (2025)

Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models
by: Gröpl, Marcel, et al.
Published: (2026)

ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2025)

Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)

VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
by: Wang, Jiaqi, et al.
Published: (2024)

DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
by: Sufian, Abu, et al.
Published: (2025)

Generating Key Postures of Bharatanatyam Adavus with Pose Estimation
by: Kamble, Jagadish Kashinath, et al.
Published: (2026)

ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)

SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models
by: Xia, Yuxuan, et al.
Published: (2026)

Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation
by: Bose, Sarosij, et al.
Published: (2025)

Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)

Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding
by: Yoon, Hee Suk, et al.
Published: (2026)

C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning
by: Ma, Ji, et al.
Published: (2024)

Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
by: Pang, Yuqi, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)
by: Manevich, Avshalom, et al.
Published: (2024)