| Main Authors: | Gautam, Somraj, Purohit, Nachiketa, Harit, Gaurav |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.20003 |
Similar Items
INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents
by: Gautam, Somraj, et al.
Published: (2026)
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
by: Gautam, Somraj, et al.
Published: (2025)
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
by: Vempati, Shashank, et al.
Published: (2025)
What explains the success of cross-modal fine-tuning with ORCA?
by: García-de-Herreros, Paloma, et al.
Published: (2024)
ReCoRe: Regularized Contrastive Representation Learning of World Model
by: Poudel, Rudra P. K., et al.
Published: (2023)
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
by: Saha, Pramit, et al.
Published: (2024)
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs
by: Deng, Naihao, et al.
Published: (2024)
Portable Active Learning for Object Detection
by: Sharma, Rashi, et al.
Published: (2026)
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
by: Garg, Rahul, et al.
Published: (2024)
Text Change Detection in Multilingual Documents Using Image Comparison
by: Park, Doyoung, et al.
Published: (2024)
Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
by: Miyai, Atsuyuki, et al.
Published: (2024)
Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
by: Constantinou, Christos, et al.
Published: (2024)
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
by: Wang, Hao, et al.
Published: (2026)
Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
by: Mei, Jingbiao, et al.
Published: (2025)
Deep Delta Learning
by: Zhang, Yifan, et al.
Published: (2026)
Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models -
by: Fieback, Laura, et al.
Published: (2025)
Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback
by: Xiao, Wenyi, et al.
Published: (2024)
Advancing Autonomous Vehicle Intelligence: Deep Learning and Multimodal LLM for Traffic Sign Recognition and Robust Lane Detection
by: Sah, Chandan Kumar, et al.
Published: (2025)
Retrospective Learning from Interactions
by: Chen, Zizhao, et al.
Published: (2024)
Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)
Can Visual Encoder Learn to See Arrows?
by: Terashita, Naoyuki, et al.
Published: (2025)
Differentiable Prompt Learning for Vision Language Models
by: Huang, Zhenhan, et al.
Published: (2024)
MLLMs-Augmented Visual-Language Representation Learning
by: Liu, Yanqing, et al.
Published: (2023)
Impact of Noisy Supervision in Foundation Model Learning
by: Chen, Hao, et al.
Published: (2024)
Learning to Steer: Input-dependent Steering for Multimodal LLMs
by: Parekh, Jayneel, et al.
Published: (2025)
A Survey of Deep Learning for Geometry Problem Solving
by: Ma, Jianzhe, et al.
Published: (2025)
Is Pre-training Truly Better Than Meta-Learning?
by: Miranda, Brando, et al.
Published: (2023)
SpecPL: Disentangling Spectral Granularity for Prompt Learning
by: Zhou, Jingtao, et al.
Published: (2026)
Many-Shot In-Context Learning in Multimodal Foundation Models
by: Jiang, Yixing, et al.
Published: (2024)
Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
by: Yang, Hunmin, et al.
Published: (2024)
Towards the Dynamics of a DNN Learning Symbolic Interactions
by: Ren, Qihan, et al.
Published: (2024)
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
by: Chen, Yi, et al.
Published: (2025)
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
by: Wei, Lai, et al.
Published: (2025)
Analyzing the Roles of Language and Vision in Learning from Limited Data
by: Chen, Allison, et al.
Published: (2024)
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
by: Xie, Tianchi, et al.
Published: (2024)
DAM: Dynamic Adapter Merging for Continual Video QA Learning
by: Cheng, Feng, et al.
Published: (2024)
CoGen: Learning from Feedback with Coupled Comprehension and Generation
by: Gul, Mustafa Omer, et al.
Published: (2024)
ELBA: Learning by Asking for Embodied Visual Navigation and Task Completion
by: Shen, Ying, et al.
Published: (2023)
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing
by: Wang, Baode, et al.
Published: (2025)
CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
by: Carvalho, Miguel, et al.
Published: (2025)