| Main Authors: | Bohus, Dan, Andrist, Sean, Bao, Yuwei, Horvitz, Eric, Paradiso, Ann |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.10525 |
Similar Items
SigmaCollab: An Application-Driven Dataset for Physically Situated Collaboration
by: Bohus, Dan, et al.
Published: (2025)
SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research
by: Bohus, Dan, et al.
Published: (2024)
Memory-Centric Embodied Question Answering
by: Zhai, Mingliang, et al.
Published: (2025)
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
by: He, Zheqi, et al.
Published: (2024)
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
by: Zhang, Haoyu, et al.
Published: (2024)
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
by: Zhang, Hanlei, et al.
Published: (2025)
Towards Better Text-to-Image Generation Alignment via Attention Modulation
by: Wu, Yihang, et al.
Published: (2024)
OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
by: Yan, Qianqi, et al.
Published: (2026)
Multi-Domain Audio Question Answering Benchmark Toward Acoustic Content Reasoning
by: Yang, Chao-Han Huck, et al.
Published: (2025)
PediatricsMQA: a Multi-modal Pediatrics Question Answering Benchmark
by: Bahaj, Adil, et al.
Published: (2025)
MultiMedEdit: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA
by: Wen, Shengtao, et al.
Published: (2025)
PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment
by: Song, Shezheng, et al.
Published: (2024)
Knowledge-Guided Dynamic Modality Attention Fusion Framework for Multimodal Sentiment Analysis
by: Feng, Xinyu, et al.
Published: (2024)
FineFake: A Knowledge-Enriched Dataset for Fine-Grained Multi-Domain Fake News Detection
by: Zhou, Ziyi, et al.
Published: (2024)
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities
by: Chen, Lichang, et al.
Published: (2024)
Enhancing Multimodal Affective Analysis with Learned Live Comment Features
by: Deng, Zhaoyuan, et al.
Published: (2024)
SemEval-2024 Task 3: Multimodal Emotion Cause Analysis in Conversations
by: Wang, Fanfan, et al.
Published: (2024)
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding
by: Lu, Jinghui, et al.
Published: (2024)
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances
by: Zhang, Hanlei, et al.
Published: (2024)
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction
by: Luo, Wen, et al.
Published: (2024)
Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?
by: Liu, Shuo, et al.
Published: (2025)
LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition
by: Zhou, Qianrui, et al.
Published: (2025)
SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
by: Zeng, Wenzheng, et al.
Published: (2025)
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature
by: Ren, Yiming, et al.
Published: (2026)
Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum
by: Yang, Xinglong, et al.
Published: (2025)
Cardiverse: Harnessing LLMs for Novel Card Game Prototyping
by: Li, Danrui, et al.
Published: (2025)
Retrieval-Augmented Generation for Electrocardiogram-Language Models
by: Song, Xiaoyu, et al.
Published: (2025)
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
by: Cheng, Fenghua, et al.
Published: (2025)
A Survey on Image-text Multimodal Models
by: Guo, Ruifeng, et al.
Published: (2023)
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
by: Lu, Jinghui, et al.
Published: (2025)
Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing
by: Wu, Zichen, et al.
Published: (2025)
A Survey of Generative Categories and Techniques in Multimodal Generative Models
by: Han, Longzhen, et al.
Published: (2025)
History-Guided Iterative Visual Reasoning with Self-Correction
by: Yang, Xinglong, et al.
Published: (2026)
MultimodalHugs: Enabling Sign Language Processing in Hugging Face
by: Sant, Gerard, et al.
Published: (2025)
Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis
by: Meng, Chunlei, et al.
Published: (2026)
A Multimodal Framework for Explainable Evaluation of Soft Skills in Educational Environments
by: Guerrero-Sosa, Jared D. T., et al.
Published: (2025)
A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task
by: Toyooka, Mashiro, et al.
Published: (2025)
Interpretable Multimodal Misinformation Detection with Logic Reasoning
by: Liu, Hui, et al.
Published: (2023)
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond
by: Wei, Tianxin, et al.
Published: (2024)
Seeing Culture: A Benchmark for Visual Reasoning and Grounding
by: Satar, Burak, et al.
Published: (2025)