:: Library Catalog

Bibliographic Details
Main Authors:	Padhi, Trilok; Kursuncu, Ugur; Kumar, Yaman; Shalin, Valerie L.; Fronczek, Lane Peterson
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Computers and Society; Human-Computer Interaction; I.2.7; I.2.10; I.2.4; I.2.1
Online Access:	https://arxiv.org/abs/2402.03607

Similar Items

A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19
by: Khandelwal, Vedant, et al.
Published: (2024)

Human-Robot Dialogue Annotation for Multi-Modal Common Ground
by: Bonial, Claire, et al.
Published: (2024)

SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus
by: Lukin, Stephanie M., et al.
Published: (2024)

Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
by: Annese, Luca, et al.
Published: (2025)

Growing Perspectives: Modelling Embodied Perspective Taking and Inner Narrative Development Using Large Language Models
by: Patania, Sabrina, et al.
Published: (2025)

Learning the meanings of function words from grounded language using a visual question answering model
by: Portelance, Eva, et al.
Published: (2023)

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
by: Yang, Shan
Published: (2026)

Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
by: Shah, Nisarg A., et al.
Published: (2025)

PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision
by: Patania, Sabrina, et al.
Published: (2025)

PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)

Emotions in the Loop: A Survey of Affective Computing for Emotional Support
by: Hegde, Karishma, et al.
Published: (2025)

Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?
by: Feng, Yichen, et al.
Published: (2026)

Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition
by: Dashore, Arushi, et al.
Published: (2025)

Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)

A Human-Machine Collaboration Framework for the Development of Schemas
by: Isaak, Nicos
Published: (2024)

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
by: Ji, Yikun, et al.
Published: (2025)

CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge
by: Guo, Willis, et al.
Published: (2024)

The Epistemic Suite: A Post-Foundational Diagnostic Methodology for Assessing AI Knowledge Claims
by: Kelly, Matthew
Published: (2025)

MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
by: Wang, Han, et al.
Published: (2024)

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
by: Zhang, Zhong, et al.
Published: (2025)

Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons
by: Riemenschneider, Frederick, et al.
Published: (2025)

Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera
by: Tanioka, Hiroki, et al.
Published: (2024)

A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion
by: Niu, Guanglin, et al.
Published: (2024)

Evaluating Perspectival Biases in Cross-Modal Retrieval
by: Saengsukhiran, Teerapol, et al.
Published: (2025)

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
by: Portelance, Eva, et al.
Published: (2024)

SAGE: A Strategy-Aware Graph-Enhanced Generation Framework For Online Counseling
by: Aharon, Eliya Naomi, et al.
Published: (2026)

MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
by: Wei, Kaiwen, et al.
Published: (2025)

Incremental Bootstrapping and Classification of Structured Scenes in a Fuzzy Ontology
by: Buoncompagni, Luca, et al.
Published: (2024)

Automated Circuit Interpretation via Probe Prompting
by: Birardi, Giuseppe
Published: (2025)

ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing
by: Bucher, Martin JJ., et al.
Published: (2025)

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
by: Balasubramanian, Sriram, et al.
Published: (2025)

From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
by: Le, Van-Truong
Published: (2026)

VidNum-1.4K: A Comprehensive Benchmark for Video-based Numerical Reasoning
by: Cui, Shaoyang, et al.
Published: (2026)

Defending against Backdoor Attacks via Module Switching
by: Li, Weijun, et al.
Published: (2025)

Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
by: Anh, Duy Le Dinh, et al.
Published: (2024)

What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
by: Yang, Dingyi, et al.
Published: (2024)

nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction
by: Mlodzian, Leon, et al.
Published: (2023)