
Bibliographic Details
Main Authors:	Gautam, Somraj; Purohit, Nachiketa; Harit, Gaurav
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2509.20003

Similar Items

INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents
by: Gautam, Somraj, et al.
Published: (2026)

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
by: Gautam, Somraj, et al.
Published: (2025)

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
by: Vempati, Shashank, et al.
Published: (2025)

What explains the success of cross-modal fine-tuning with ORCA?
by: García-de-Herreros, Paloma, et al.
Published: (2024)

ReCoRe: Regularized Contrastive Representation Learning of World Model
by: Poudel, Rudra P. K., et al.
Published: (2023)

Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
by: Saha, Pramit, et al.
Published: (2024)

Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
by: Deng, Naihao, et al.
Published: (2024)

Portable Active Learning for Object Detection
by: Sharma, Rashi, et al.
Published: (2026)

Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
by: Garg, Rahul, et al.
Published: (2024)

Text Change Detection in Multilingual Documents Using Image Comparison
by: Park, Doyoung, et al.
Published: (2024)

Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
by: Miyai, Atsuyuki, et al.
Published: (2024)

Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
by: Constantinou, Christos, et al.
Published: (2024)

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)

Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)

Deep Delta Learning
by: Zhang, Yifan, et al.
Published: (2026)

Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -
by: Fieback, Laura, et al.
Published: (2025)

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
by: Xiao, Wenyi, et al.
Published: (2024)

Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM for Traffic Sign Recognition and Robust Lane Detection
by: Sah, Chandan Kumar, et al.
Published: (2025)

Retrospective Learning from Interactions
by: Chen, Zizhao, et al.
Published: (2024)

Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)

Can Visual Encoder Learn to See Arrows?
by: Terashita, Naoyuki, et al.
Published: (2025)

Differentiable Prompt Learning for Vision Language Models
by: Huang, Zhenhan, et al.
Published: (2024)

MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)

Impact of Noisy Supervision in Foundation Model Learning
by: Chen, Hao, et al.
Published: (2024)

Learning to Steer: Input-dependent Steering for Multimodal LLMs
by: Parekh, Jayneel, et al.
Published: (2025)

A Survey of Deep Learning for Geometry Problem Solving
by: Ma, Jianzhe, et al.
Published: (2025)

Is Pre-training Truly Better Than Meta-Learning?
by: Miranda, Brando, et al.
Published: (2023)

SpecPL: Disentangling Spectral Granularity for Prompt Learning
by: Zhou, Jingtao, et al.
Published: (2026)

Many-Shot In-Context Learning in Multimodal Foundation Models
by: Jiang, Yixing, et al.
Published: (2024)

Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
by: Yang, Hunmin, et al.
Published: (2024)

Towards the Dynamics of a DNN Learning Symbolic Interactions
by: Ren, Qihan, et al.
Published: (2024)

GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)

Analyzing the Roles of Language and Vision in Learning from Limited Data
by: Chen, Allison, et al.
Published: (2024)

Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
by: Xie, Tianchi, et al.
Published: (2024)

DAM: Dynamic Adapter Merging for Continual Video QA Learning
by: Cheng, Feng, et al.
Published: (2024)

CoGen: Learning from Feedback with Coupled Comprehension and Generation
by: Gul, Mustafa Omer, et al.
Published: (2024)

ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion
by: Shen, Ying, et al.
Published: (2023)

Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)