| Main Authors: | Das, Trishanu; Nandy, Abhilash; Bajaj, Khush; S, Deepiha |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2511.01340 |
Similar Items
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
by: Nandy, Abhilash, et al.
Published: (2024)
Known Intents, New Combinations: Clause-Factorized Decoding for Compositional Multi-Intent Detection
by: Nandy, Abhilash
Published: (2026)
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses
by: Sarti, Gabriele, et al.
Published: (2024)
Leveraging Large Language Models for Predictive Analysis of Human Misery
by: Seal, Bishanka, et al.
Published: (2025)
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs
by: Muppidi, Ananth, et al.
Published: (2025)
Graph Fusion Across Languages using Large Language Models
by: Kyaw, Kaung Myat, et al.
Published: (2026)
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
by: Kapadnis, Manav Nitin, et al.
Published: (2024)
Order-Based Pre-training Strategies for Procedural Text Understanding
by: Nandy, Abhilash, et al.
Published: (2024)
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
by: Roy, Aniruddha, et al.
Published: (2025)
VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models
by: Ren, Yufan, et al.
Published: (2025)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
by: Wang, Ruofan, et al.
Published: (2024)
Creating a Lens of Chinese Culture: A Multimodal Dataset for Chinese Pun Rebus Art Understanding
by: Zhang, Tuo, et al.
Published: (2024)
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint
by: Lee, Heekyung, et al.
Published: (2025)
TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning
by: Liu, Daixian, et al.
Published: (2026)
Hidden Meanings in Plain Sight: RebusBench for Evaluating Cognitive Visual Reasoning
by: Kasaei, Seyed Amir, et al.
Published: (2026)
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
by: Li, Hengzhi, et al.
Published: (2025)
A Pointer Network-based Approach for Joint Extraction and Detection of Multi-Label Multi-Class Intents
by: Mullick, Ankan, et al.
Published: (2024)
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
by: Wang, Shengkang, et al.
Published: (2024)
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
by: Jiang, Chaoya, et al.
Published: (2023)
Language Modeling with Learned Meta-Tokens
by: Shah, Alok N., et al.
Published: (2025)
M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks
by: Schneider, Florian, et al.
Published: (2024)
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
by: Xia, Peng, et al.
Published: (2024)
Reasoning or Pattern Matching? Probing Large Vision-Language Models with Visual Puzzles
by: Lymperaiou, Maria, et al.
Published: (2026)
A conceptual framework for ideology beyond the left and right
by: Joseph, Kenneth, et al.
Published: (2026)
EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models
by: Das, Rocktim Jyoti, et al.
Published: (2024)
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning
by: Ghosal, Deepanway, et al.
Published: (2024)
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving
by: Zhang, Zeyu, et al.
Published: (2025)
LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
by: Qin, Zhenyue, et al.
Published: (2024)
PuzzlePlex: Benchmarking Foundation Models on Reasoning and Planning with Puzzles
by: Long, Yitao, et al.
Published: (2025)
MAGIC: Multimodal Alignment & Grounding-aware Instruction Coreset for Vision-Language Models
by: Biswas, Shristi Das, et al.
Published: (2026)
The Invalsi Benchmarks: measuring Linguistic and Mathematical understanding of Large Language Models in Italian
by: Puccetti, Giovanni, et al.
Published: (2024)
Benchmarking Linguistic Diversity of Large Language Models
by: Guo, Yanzhu, et al.
Published: (2024)
SKIPNet: Spatial Attention Skip Connections for Enhanced Brain Tumor Classification
by: Mendiratta, Khush, et al.
Published: (2024)
Rebus Immigrazione
by: Roberto Marinucci
Published: (2017)
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)
Benchmarking Content-Based Puzzle Solvers on Corrupted Jigsaw Puzzles
by: Dirauf, Richard, et al.
Published: (2025)
ROSA: Addressing text understanding challenges in photographs via ROtated SAmpling
by: Maina, Hernán, et al.
Published: (2025)
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
by: Zeng, Yu, et al.
Published: (2026)
VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation
by: Sajib, Rakib Hossain, et al.
Published: (2026)
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
by: Ren, Xiyu, et al.
Published: (2026)