Library Catalog

Bibliographic Details
Main Authors:	Jang, Jihyoung; Bae, Minwook; Kim, Minji; Hakkani-Tur, Dilek; Kim, Hyounghun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.00421

Similar Items

AQuA: Toward Strategic Response Generation for Ambiguous Visual Questions
by: Jang, Jihyoung, et al.
Published: (2026)

MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
by: Dongre, Vardhan, et al.
Published: (2025)

Mixed-Session Conversation with Egocentric Memory
by: Jang, Jihyoung, et al.
Published: (2024)

Collective Critics for Creative Story Generation
by: Bae, Minwook, et al.
Published: (2024)

Visually Guided Decoding: Gradient-Free Hard Prompt Inversion with Language Models
by: Kim, Donghoon, et al.
Published: (2025)

ChatEXAONEPath: An Expert-level Multimodal Large Language Model for Histopathology Using Whole Slide Images
by: Kim, Sangwook, et al.
Published: (2025)

Towards Conversational Medical AI with Eyes, Ears and a Voice
by: Shah, Meet, et al.
Published: (2026)

Look & Mark: Leveraging Radiologist Eye Fixations and Bounding boxes in Multimodal Large Language Models for Chest X-ray Report Generation
by: Kim, Yunsoo, et al.
Published: (2025)

Simulating User Agents for Embodied Conversational-AI
by: Philipov, Daniel, et al.
Published: (2024)

Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
by: Kim, Jungeun, et al.
Published: (2024)

ESREAL: Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
by: Kim, Minchan, et al.
Published: (2024)

Goal Alignment in LLM-Based User Simulators for Conversational AI
by: Mehri, Shuhaib, et al.
Published: (2025)

Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
by: Dey, Suvodip, et al.
Published: (2025)

Question Generation for Assessing Early Literacy Reading Comprehension
by: Yang, Xiaocheng, et al.
Published: (2025)

ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting
by: Mishra, Abhijit, et al.
Published: (2025)

Jailbreaking Multimodal Large Language Models using Multi-Clip Video
by: Kang, Choongwon, et al.
Published: (2026)

Revealing the Inherent Instructability of Pre-Trained Language Models
by: An, Seokhyun, et al.
Published: (2024)

OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction
by: Zhang, Haonan, et al.
Published: (2025)

v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
by: Chung, Jiwan, et al.
Published: (2025)

Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns
by: Kim, Yunsoo, et al.
Published: (2024)

LITTA: Late-Interaction and Test-Time Alignment for Visually-Grounded Multimodal Retrieval
by: Kim, Seonok
Published: (2026)

CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections
by: Schneider, Florian, et al.
Published: (2025)

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning
by: Huang, Haoyu, et al.
Published: (2026)

Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer
by: Kim, Jaeyoung, et al.
Published: (2025)

Do LLMs Encode Functional Importance of Reasoning Tokens?
by: Singh, Janvijay, et al.
Published: (2026)

Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
by: Agarwal, Ishika, et al.
Published: (2025)

Embodied Multi-Agent Coordination by Aligning World Models Through Dialogue
by: Dongre, Vardhan, et al.
Published: (2026)

Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
by: Kim, Minji, et al.
Published: (2025)

Asking Multimodal Clarifying Questions in Mixed-Initiative Conversational Search
by: Yuan, Yifei, et al.
Published: (2024)

Toward Multimodal Conversational AI for Age-Related Macular Degeneration
by: Gu, Ran, et al.
Published: (2026)

Dialog Flow Induction for Constrainable LLM-Based Chatbots
by: Agrawal, Stuti, et al.
Published: (2024)

From Fact to Judgment: Investigating the Impact of Task Framing on LLM Conviction in Dialogue Systems
by: Rabbani, Parisa, et al.
Published: (2025)

ReIn: Conversational Error Recovery with Reasoning Inception
by: Kim, Takyoung, et al.
Published: (2026)

Evaluating Multimodal Generative AI with Korean Educational Standards
by: Park, Sanghee, et al.
Published: (2025)

VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding
by: Waheed, Abdul, et al.
Published: (2025)

LLMs Behind the Scenes: Enabling Narrative Scene Illustration
by: Roemmele, Melissa, et al.
Published: (2025)

User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction
by: Hao, Yuren, et al.
Published: (2026)

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
by: Guo, Minghao, et al.
Published: (2026)

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)

DepthFocus: Controllable Depth Estimation for See-Through Scenes
by: Min, Junhong, et al.
Published: (2025)