| Main Authors: | Padhi, Trilok, Kursuncu, Ugur, Kumar, Yaman, Shalin, Valerie L., Fronczek, Lane Peterson |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.03607 |
Similar Items
A Domain-Agnostic Neurosymbolic Approach for Big Social Data Analysis: Evaluating Mental Health Sentiment on Social Media during COVID-19
by: Khandelwal, Vedant, et al.
Published: (2024)
Human-Robot Dialogue Annotation for Multi-Modal Common Ground
by: Bonial, Claire, et al.
Published: (2024)
SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus
by: Lukin, Stephanie M., et al.
Published: (2024)
Who Sees What? Structured Thought-Action Sequences for Epistemic Reasoning in LLMs
by: Annese, Luca, et al.
Published: (2025)
Growing Perspectives: Modelling Embodied Perspective Taking and Inner Narrative Development Using Large Language Models
by: Patania, Sabrina, et al.
Published: (2025)
Learning the meanings of function words from grounded language using a visual question answering model
by: Portelance, Eva, et al.
Published: (2023)
Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning
by: Yang, Shan
Published: (2026)
Spatially-Aware Speaker for Vision-and-Language Navigation Instruction Generation
by: Gopinathan, Muraleekrishna, et al.
Published: (2024)
Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark
by: Shah, Nisarg A., et al.
Published: (2025)
PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision
by: Patania, Sabrina, et al.
Published: (2025)
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
by: Dai, Song, et al.
Published: (2025)
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
by: Tong, Jingqi, et al.
Published: (2025)
ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
by: Bian, Zhipeng, et al.
Published: (2026)
Emotions in the Loop: A Survey of Affective Computing for Emotional Support
by: Hegde, Karishma, et al.
Published: (2025)
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty?
by: Feng, Yichen, et al.
Published: (2026)
Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition
by: Dashore, Arushi, et al.
Published: (2025)
Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models
by: Freitas, Diogo, et al.
Published: (2025)
A Human-Machine Collaboration Framework for the Development of Schemas
by: Isaak, Nicos
Published: (2024)
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
by: Ji, Yikun, et al.
Published: (2025)
CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge
by: Guo, Willis, et al.
Published: (2024)
The Epistemic Suite: A Post-Foundational Diagnostic Methodology for Assessing AI Knowledge Claims
by: Kelly, Matthew
Published: (2025)
MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation
by: Wang, Han, et al.
Published: (2024)
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
by: Zhang, Zhong, et al.
Published: (2025)
Cross-Lingual Generalization and Compression: From Language-Specific to Shared Neurons
by: Riemenschneider, Frederick, et al.
Published: (2025)
Toward a Dialogue System Using a Large Language Model to Recognize User Emotions with a Camera
by: Tanioka, Hiroki, et al.
Published: (2024)
A Pluggable Common Sense-Enhanced Framework for Knowledge Graph Completion
by: Niu, Guanglin, et al.
Published: (2024)
Evaluating Perspectival Biases in Cross-Modal Retrieval
by: Saengsukhiran, Teerapol, et al.
Published: (2025)
Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models
by: Portelance, Eva, et al.
Published: (2024)
SAGE: A Strategy-Aware Graph-Enhanced Generation Framework For Online Counseling
by: Aharon, Eliya Naomi, et al.
Published: (2026)
MIRAGE: Scaling Test-Time Inference with Parallel Graph-Retrieval-Augmented Reasoning Chains
by: Wei, Kaiwen, et al.
Published: (2025)
Incremental Bootstrapping and Classification of Structured Scenes in a Fuzzy Ontology
by: Buoncompagni, Luca, et al.
Published: (2024)
Automated Circuit Interpretation via Probe Prompting
by: Birardi, Giuseppe
Published: (2025)
ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing
by: Bucher, Martin JJ., et al.
Published: (2025)
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models
by: Balasubramanian, Sriram, et al.
Published: (2025)
From Benchmarking to Reasoning: A Dual-Aspect, Large-Scale Evaluation of LLMs on Vietnamese Legal Text
by: Le, Van-Truong
Published: (2026)
VidNum-1.4K: A Comprehensive Benchmark for Video-based Numerical Reasoning
by: Cui, Shaoyang, et al.
Published: (2026)
Defending against Backdoor Attacks via Module Switching
by: Li, Weijun, et al.
Published: (2025)
Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
by: Anh, Duy Le Dinh, et al.
Published: (2024)
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
by: Yang, Dingyi, et al.
Published: (2024)
nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction
by: Mlodzian, Leon, et al.
Published: (2023)