Table of Contents: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Das, Trishanu, Nandy, Abhilash, Bajaj, Khush, S, Deepiha
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition Computation and Language
Online Access:	https://arxiv.org/abs/2511.01340
Tags:	Add Tag No Tags, Be the first to tag this record!

Table of Contents:

Understanding Rebus Puzzles (Rebus Puzzles use pictures, symbols, and letters to represent words or phrases creatively) requires a variety of skills such as image recognition, cognitive skills, commonsense reasoning, multi-step reasoning, image-based wordplay, etc., making this a challenging task for even current Vision-Language Models. In this paper, we present $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$, a large and diverse benchmark of $1,333$ English Rebus Puzzles containing different artistic styles and levels of difficulty, spread across 18 categories such as food, idioms, sports, finance, entertainment, etc. We also propose $RebusDescProgICE$, a model-agnostic framework which uses a combination of an unstructured description and code-based, structured reasoning, along with better, reasoning-based in-context example selection, improving the performance of Vision-Language Models on $\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$ by $2.1-4.1\%$ and $20-30\%$ using closed-source and open-source models respectively compared to Chain-of-Thought Reasoning.

Similar Items