| Main Authors: | Bangde, Yashwant Pravinrao; Roy, Debaditya |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.25809 |
Similar Items
Improving Temporal Action Segmentation via Constraint-Aware Decoding
by: Ee, Yeo Keat, et al.
Published: (2026)
MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models
by: Biswas, Shristi Das, et al.
Published: (2026)
Predicting the Next Action by Modeling the Abstract Goal
by: Roy, Debaditya, et al.
Published: (2022)
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
by: Wang, Xintong, et al.
Published: (2024)
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
by: Jaiswal, Shantanu, et al.
Published: (2024)
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
by: Rajendiran, Ramanathan, et al.
Published: (2025)
Interaction Region Visual Transformer for Egocentric Action Anticipation
by: Roy, Debaditya, et al.
Published: (2022)
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
by: Verma, Dhruv, et al.
Published: (2024)
Modelling Spatio-Temporal Interactions For Compositional Action Recognition
by: Rajendiran, Ramanathan, et al.
Published: (2023)
Show and Guide: Instructional-Plan Grounded Vision and Language Model
by: Glória-Silva, Diogo, et al.
Published: (2024)
GLaRE: A Graph-based Landmark Region Embedding Network for Emotion Recognition
by: Maji, Debasis, et al.
Published: (2025)
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models
by: Cheng, An-Chieh, et al.
Published: (2024)
Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning
by: Busaranuvong, Palawat, et al.
Published: (2026)
SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding
by: Park, Woohyeon, et al.
Published: (2025)
Can Vision Transformers with ResNet's Global Features Fairly Authenticate Demographic Faces?
by: Sufian, Abu, et al.
Published: (2025)
Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models
by: Zhao, Jianfei, et al.
Published: (2025)
How Reasoning Influences Intersectional Biases in Vision Language Models
by: Desai, Adit, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)
Instruction-Grounded Visual Projectors for Continual Learning of Generative Vision-Language Models
by: Jin, Hyundong, et al.
Published: (2025)
StreamSense: Streaming Social Task Detection with Selective Vision-Language Model Routing
by: Wang, Han, et al.
Published: (2026)
Temporal Contrastive Learning for Video Temporal Reasoning in Large Vision-Language Models
by: Souza, Rafael, et al.
Published: (2024)
Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
by: Zhang, Jialiang, et al.
Published: (2026)
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
by: Won, John, et al.
Published: (2025)
Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture
by: Ghosh, Anirudha, et al.
Published: (2026)
Med-VCD: Mitigating Hallucination for Medical Large Vision Language Models through Visual Contrastive Decoding
by: Mahdavi, Zahra, et al.
Published: (2025)
Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models
by: Gröpl, Marcel, et al.
Published: (2026)
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2025)
Delve into Visual Contrastive Decoding for Hallucination Mitigation of Large Vision-Language Models
by: Lee, Yi-Lun, et al.
Published: (2024)
VaLiD: Mitigating the Hallucination of Large Vision Language Models by Visual Layer Fusion Contrastive Decoding
by: Wang, Jiaqi, et al.
Published: (2024)
DemoBias: An Empirical Study to Trace Demographic Biases in Vision Foundation Models
by: Sufian, Abu, et al.
Published: (2025)
Generating Key Postures of Bharatanatyam Adavus with Pose Estimation
by: Kamble, Jagadish Kashinath, et al.
Published: (2026)
ViTCN: Vision Transformer Contrastive Network For Reasoning
by: Song, Bo, et al.
Published: (2024)
iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)
SDCD: Structure-Disrupted Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models
by: Xia, Yuxuan, et al.
Published: (2026)
Visual Alignment of Medical Vision-Language Models for Grounded Radiology Report Generation
by: Bose, Sarosij, et al.
Published: (2025)
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding
by: Huang, Yanxiang, et al.
Published: (2026)
Decomposed On-Policy Distillation for Vision-Language Reasoning: Steering Gradients for Visual Grounding
by: Yoon, Hee Suk, et al.
Published: (2026)
C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning
by: Ma, Ji, et al.
Published: (2024)
Language Models Can See Better: Visual Contrastive Decoding For LLM Multimodal Reasoning
by: Pang, Yuqi, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models (LVLMs) via Language-Contrastive Decoding (LCD)
by: Manevich, Avshalom, et al.
Published: (2024)