:: Library Catalog

Bibliographic Details
Main Authors:	Ballout, Mohamad; Jassim, Serwan; Bruni, Elia
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.16572

Similar Items

iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
by: Mayer, Julius, et al.
Published: (2025)

GRASP: A novel benchmark for evaluating language GRounding And Situated Physics understanding in multimodal language models
by: Jassim, Serwan, et al.
Published: (2023)

Transformer Tafsir at QIAS 2025 Shared Task: Hybrid Retrieval-Augmented Generation for Islamic Knowledge Question Answering
by: Ahmad, Muhammad Abu, et al.
Published: (2025)

From Form(s) to Meaning: Probing the Semantic Depths of Language Models Using Multisense Consistency
by: Ohmer, Xenia, et al.
Published: (2024)

Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
by: Ballout, Mohamad, et al.
Published: (2024)

Can you SPLICE it together? A Human Curated Benchmark for Probing Visual Reasoning in VLMs
by: Ballout, Mohamad, et al.
Published: (2025)

Show Me How It's Done: The Role of Explanations in Fine-Tuning Language Models
by: Ballout, Mohamad, et al.
Published: (2024)

Interpretability of Language Models via Task Spaces
by: Weber, Lucas, et al.
Published: (2024)

Enhancing SLM via ChatGPT and Dataset Augmentation
by: Pieper, Tom, et al.
Published: (2024)

Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models
by: Tatariya, Kushal, et al.
Published: (2024)

Humor in Pixels: Benchmarking Large Multimodal Models Understanding of Online Comics
by: Ryan, Yuriel, et al.
Published: (2025)

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
by: Wang, Haochen, et al.
Published: (2025)

Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction
by: Wicke, Philipp
Published: (2024)

Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
by: Sun, Kaiser, et al.
Published: (2026)

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
by: Li, Xiujun, et al.
Published: (2023)

Multilingual Pretraining for Pixel Language Models
by: Kesen, Ilker, et al.
Published: (2025)

Do Multimodal Large Language Models Understand Welding?
by: Khvatskii, Grigorii, et al.
Published: (2025)

PIXAR: Auto-Regressive Language Modeling in Pixel Space
by: Tai, Yintao, et al.
Published: (2024)

From Reasoning to Pixels: Benchmarking the Alignment Gap in Unified Multimodal Models
by: Yang, Cheng, et al.
Published: (2026)

Probing Multimodal Large Language Models for Global and Local Semantic Representations
by: Tao, Mingxu, et al.
Published: (2024)

Evaluating Multimodal Large Language Models on Spoken Sarcasm Understanding
by: Li, Zhu, et al.
Published: (2025)

Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation
by: Huang, Jen-tse, et al.
Published: (2026)

MIND Your Reasoning: A Meta-Cognitive Intuitive-Reflective Network for Dual-Reasoning in Multimodal Stance Detection
by: Wang, Bingbing, et al.
Published: (2025)

Can Large Vision-Language Models Understand Multimodal Sarcasm?
by: Wang, Xinyu, et al.
Published: (2025)

Model-Dowser: Data-Free Importance Probing to Mitigate Catastrophic Forgetting in Multimodal Large Language Models
by: Hwang, Hyeontaek, et al.
Published: (2026)

MIXAR: Scaling Autoregressive Pixel-based Language Models to Multiple Languages and Scripts
by: Hu, Chen, et al.
Published: (2026)

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
by: Hyun, Lee, et al.
Published: (2023)

EmoVerse: Exploring Multimodal Large Language Models for Sentiment and Emotion Understanding
by: Li, Ao, et al.
Published: (2024)

CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
by: Go, Dongyoung, et al.
Published: (2024)

Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains
by: Paniv, Yurii, et al.
Published: (2024)

MINED: Probing and Updating with Multimodal Time-Sensitive Knowledge for Large Multimodal Models
by: Jiang, Kailin, et al.
Published: (2025)

Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts
by: Ying, Jiahao, et al.
Published: (2023)

Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
by: Huang, Fan, et al.
Published: (2026)

Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies
by: Cekinel, Recep Firat, et al.
Published: (2024)

On Pre-training of Multimodal Language Models Customized for Chart Understanding
by: Fan, Wan-Cyuan, et al.
Published: (2024)

AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
by: Chen, Zixin, et al.
Published: (2025)

From Text to Pixel: Advancing Long-Context Understanding in MLLMs
by: Lu, Yujie, et al.
Published: (2024)

Evaluating Pixel Language Models on Non-Standardized Languages
by: Muñoz-Ortiz, Alberto, et al.
Published: (2024)

Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding
by: Li, Yun, et al.
Published: (2025)

Tolerance Principle and Small Language Model Learning
by: Friedman, Adam E., et al.
Published: (2026)