| Main Authors: | Chiyah-Garcia, Javier; Suglia, Alessandro; Eshghi, Arash |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.14247 |
Similar Items
Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World
by: Wu, Guande, et al.
Published: (2024)
"Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews
by: Wan, Ruyuan, et al.
Published: (2026)
Low-code LLM: Graphical User Interface over Large Language Models
by: Cai, Yuzhe, et al.
Published: (2023)
VisEval: A Benchmark for Data Visualization in the Era of Large Language Models
by: Chen, Nan, et al.
Published: (2024)
EmoHarbor: Evaluating Personalized Emotional Support by Simulating the User's Internal World
by: Ye, Jing, et al.
Published: (2026)
Evaluation of a Sign Language Avatar on Comprehensibility, User Experience & Acceptability
by: Wasserroth, Fenya, et al.
Published: (2025)
Word Synchronization Challenge: A Benchmark for Word Association Responses for Large Language Models
by: Cazalets, Tanguy, et al.
Published: (2025)
Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study
by: Adhikary, Prottay Kumar, et al.
Published: (2024)
SAUCE: Synchronous and Asynchronous User-Customizable Environment for Multi-Agent LLM Interaction
by: Neuberger, Shlomo, et al.
Published: (2024)
AEQ-Bench: Measuring Empathy of Omni-Modal Large Models
by: Luo, Xuan, et al.
Published: (2026)
Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks
by: Yunusov, Sarfaroz, et al.
Published: (2025)
One Agent Too Many: User Perspectives on Approaches to Multi-agent Conversational AI
by: Clarke, Christopher, et al.
Published: (2024)
Understand User Opinions of Large Language Models via LLM-Powered In-the-Moment User Experience Interviews
by: Liu, Mengqiao, et al.
Published: (2025)
K-QA: A Real-World Medical Q&A Benchmark
by: Manes, Itay, et al.
Published: (2024)
Many Ways to Be Fake: Benchmarking Fake News Detection Under Strategy-Driven AI Generation
by: Wang, Xinyu, et al.
Published: (2026)
Learning in Blocks: A Multi Agent Debate Assisted Personalized Adaptive Learning Framework for Language Learning
by: Scaria, Nicy, et al.
Published: (2026)
Does the Appearance of Autonomous Conversational Robots Affect User Spoken Behaviors in Real-World Conference Interactions?
by: Pang, Zi Haur, et al.
Published: (2025)
ChatGPT Role-play Dataset: Analysis of User Motives and Model Naturalness
by: Tao, Yufei, et al.
Published: (2024)
Thinking with Many Minds: Using Large Language Models for Multi-Perspective Problem-Solving
by: Park, Sanghyun, et al.
Published: (2025)
Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting
by: Baihaqi, Muhammad Yeza, et al.
Published: (2024)
Comparing How a Chatbot References User Utterances from Previous Chatting Sessions: An Investigation of Users' Privacy Concerns and Perceptions
by: Cox, Samuel Rhys, et al.
Published: (2023)
A Survey on LLM-based Conversational User Simulation
by: Ni, Bo, et al.
Published: (2026)
Taxonomy of User Needs and Actions
by: Shelby, Renee, et al.
Published: (2025)
PleaSQLarify: Visual Pragmatic Repair for Natural Language Database Querying
by: Chan, Robin Shing Moon, et al.
Published: (2026)
MultiVox: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
by: Selvakumar, Ramaneswaran, et al.
Published: (2025)
An Analysis of Dialogue Repair in Voice Assistants
by: Galbraith, Matthew
Published: (2023)
ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming
by: Herskovitz, Jaylin, et al.
Published: (2024)
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches
by: Zhu-Tian, Chen, et al.
Published: (2024)
Automated Interpretability and Feature Discovery in Language Models with Agents
by: Marin-Llobet, Arnau, et al.
Published: (2026)
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
by: Kim, Tae Soo, et al.
Published: (2023)
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails
by: Chowdhury, Sankalan Pal, et al.
Published: (2024)
EICAP: Deep Dive in Assessment and Enhancement of Large Language Models in Emotional Intelligence through Multi-Turn Conversations
by: Nazar, Nizi, et al.
Published: (2025)
User Willingness-aware Sales Talk Dataset
by: Hentona, Asahi, et al.
Published: (2024)
Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners
by: Vaccaro Jr, Michael, et al.
Published: (2024)
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback
by: Wu, Jason, et al.
Published: (2024)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
by: Qian, Cheng, et al.
Published: (2024)
Multi-turn Evaluation of Anthropomorphic Behaviours in Large Language Models
by: Ibrahim, Lujain, et al.
Published: (2025)
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models
by: Wang, Junling, et al.
Published: (2025)
ArguMentor: Augmenting User Experiences with Counter-Perspectives
by: Pitre, Priya, et al.
Published: (2024)
Users Mispredict Their Own Preferences for AI Writing Assistance
by: Lai, Vivian, et al.
Published: (2026)