| Main Authors: | Jang, Jihyoung, Bae, Minwook, Kim, Minji, Hakkani-Tur, Dilek, Kim, Hyounghun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.00421 |
Similar Items
AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
by: Jang, Jihyoung, et al.
Published: (2026)
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
by: Dongre, Vardhan, et al.
Published: (2025)
Mixed-Session Conversation with Egocentric Memory
by: Jang, Jihyoung, et al.
Published: (2024)
Collective Critics for Creative Story Generation
by: Bae, Minwook, et al.
Published: (2024)
Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
by: Kim, Donghoon, et al.
Published: (2025)
ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images
by: Kim, Sangwook, et al.
Published: (2025)
Towards Conversational Medical AI with Eyes, Ears and a Voice
by: Shah, Meet, et al.
Published: (2026)
Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
by: Kim, Yunsoo, et al.
Published: (2025)
Simulating User Agents for Embodied Conversational-AI
by: Philipov, Daniel, et al.
Published: (2024)
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)
ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
by: Kim, Minchan, et al.
Published: (2024)
Goal Alignment in LLM-Based User Simulators for Conversational AI
by: Mehri, Shuhaib, et al.
Published: (2025)
Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
by: Dey, Suvodip, et al.
Published: (2025)
Question Generation for Assessing Early Literacy Reading Comprehension
by: Yang, Xiaocheng, et al.
Published: (2025)
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
by: Mishra, Abhijit, et al.
Published: (2025)
Jailbreaking Multimodal Large Language Models using Multi-Clip Video
by: Kang, Choongwon, et al.
Published: (2026)
Revealing the Inherent Instructability of Pre-Trained Language Models
by: An, Seokhyun, et al.
Published: (2024)
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
by: Zhang, Haonan, et al.
Published: (2025)
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)
Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns
by: Kim, Yunsoo, et al.
Published: (2024)
LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)
CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
by: Schneider, Florian, et al.
Published: (2025)
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
by: Huang, Haoyu, et al.
Published: (2026)
Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer
by: Kim, Jaeyoung, et al.
Published: (2025)
Do LLMs Encode Functional Importance of Reasoning Tokens?
by: Singh, Janvijay, et al.
Published: (2026)
Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
by: Agarwal, Ishika, et al.
Published: (2025)
Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue
by: Dongre, Vardhan, et al.
Published: (2026)
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
by: Kim, Minji, et al.
Published: (2025)
Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
by: Yuan, Yifei, et al.
Published: (2024)
Toward Multimodal Conversational AI for Age-Related Macular Degeneration
by: Gu, Ran, et al.
Published: (2026)
Dialog Flow Induction for Constrainable LLM-Based Chatbots
by: Agrawal, Stuti, et al.
Published: (2024)
From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems
by: Rabbani, Parisa, et al.
Published: (2025)
ReIn: Conversational Error Recovery with Reasoning Inception
by: Kim, Takyoung, et al.
Published: (2026)
Evaluating Multimodal Generative AI with Korean Educational Standards
by: Park, Sanghee, et al.
Published: (2025)
VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
by: Waheed, Abdul, et al.
Published: (2025)
LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)
User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
by: Hao, Yuren, et al.
Published: (2026)
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)
Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)
DepthFocus: Controllable Depth Estimation for See-Through Scenes
by: Min, Junhong, et al.
Published: (2025)