:: Library Catalog

Bibliographic Details
Main Authors:	Elangovan, Aparna; Xu, Lei; Ko, Jongwoo; Elyasi, Mahsa; Liu, Ling; Bodapati, Sravan; Roth, Dan
Format:	Preprint
Published:	2024
Subjects:	Human-Computer Interaction; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.03775

Similar Items

ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
by: Elangovan, Aparna, et al.
Published: (2024)

Human-Centered Design Recommendations for LLM-as-a-Judge
by: Pan, Qian, et al.
Published: (2024)

Generate, Evaluate, Iterate: Synthetic Data for Human-in-the-Loop Refinement of LLM Judges
by: Do, Hyo Jin, et al.
Published: (2025)

EvalAssist: A Human-Centered Tool for LLM-as-a-Judge
by: Ashktorab, Zahra, et al.
Published: (2025)

Can LLMs Synthesize Court-Ready Statistical Evidence? Evaluating AI-Assisted Sentencing Bias Analysis for California Racial Justice Act Claims
by: Komarla, Aparna
Published: (2026)

Human-Augmented Reality Interaction in Rebar Inspection
by: Sanei, Mahsa, et al.
Published: (2026)

Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks
by: Szymanski, Annalisa, et al.
Published: (2024)

MultEval: Supporting Collaborative Alignment for LLM-as-a-Judge Evaluation Criteria
by: Chiang, Charles, et al.
Published: (2026)

Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
by: Jayakumar, Eswari, et al.
Published: (2025)

All the Way There and Back: Inertial-Based, Phone-in-Pocket Indoor Wayfinding and Backtracking Apps for Blind Travelers
by: Tsai, Chia Hsuan, et al.
Published: (2024)

The Impact of Uncertainty Visualization on Trust in Thematic Maps
by: Srivastava, Varun, et al.
Published: (2026)

Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale
by: Li, Weiyue, et al.
Published: (2026)

Not All Uncertainty Is Equal: How Uncertainty Granularity Shapes Human Verification in LLM-Assisted Decision Making
by: Villavicencio, Mauricio, et al.
Published: (2026)

PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
by: Qidwai, Adnan, et al.
Published: (2025)

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
by: Kim, Jane Paik
Published: (2026)

Provocation on Expertise in Social Impact Evaluations of Generative AI (and Beyond)
by: Kahn, Zoe, et al.
Published: (2024)

The Impact of Response Latency and Task Type on Human-LLM Interaction and Perception
by: Tan, Felicia Fang-Yi, et al.
Published: (2026)

AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals
by: Pasch, Stefan
Published: (2025)

Beyond Quantification: Navigating Uncertainty in Professional AI Systems
by: Delacroix, Sylvie, et al.
Published: (2025)

Striking a Balance: Evaluating How Aggregations of Multiple Forecasts Impact Judgment Under Uncertainty
by: Zou, Ruishi, et al.
Published: (2024)

Neural and Cognitive Impacts of AI: The Influence of Task Subjectivity on Human-LLM Collaboration
by: Russell, Matthew, et al.
Published: (2025)

DG Comics: Semi-Automatically Authoring Graph Comics for Dynamic Graphs
by: Kim, Joohee, et al.
Published: (2024)

Playing the Imitation Game: How Perceived Generated Content Shapes Player Experience
by: Bazzaz, Mahsa, et al.
Published: (2026)

Identifying Challenges in Designing, Developing and Evaluating Data Visualizations for Large Displays
by: Hamed, Mahsa Sinaei, et al.
Published: (2024)

Assessing Similarity Measures for the Evaluation of Human-Robot Motion Correspondence
by: Dietzel, Charles, et al.
Published: (2024)

The Impact of Concept Explanations and Interventions on Human-Machine Collaboration
by: Furby, Jack, et al.
Published: (2025)

Beyond the Hype: Mapping Uncertainty and Gratification in AI Assistant Use
by: Joy, Karen, et al.
Published: (2025)

MindCopilot: Towards Formalizing and Evaluating Granular Human-LLM Co-Writing
by: Fang, Youqing, et al.
Published: (2026)

Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
by: Zou, Huiqi, et al.
Published: (2024)

VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
by: Sung, Yoo Yeon, et al.
Published: (2025)

MEGAnno+: A Human-LLM Collaborative Annotation System
by: Kim, Hannah, et al.
Published: (2024)

Analyzing the Impact of the Automatic Ball Strike System in Professional Baseball through a Case Study on KBO League Data
by: Lee, Kichang, et al.
Published: (2024)

How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond
by: Huang, Chen, et al.
Published: (2025)

On Arrival: Challenges and Opportunities Around Early-Stage Resettlement of Refugees in Australia
by: Song, Pinyao, et al.
Published: (2024)

Leveraging Internet of Things Network Metadata for Cost-Effective Automatic Smart Building Visualization
by: Staugaard, Benjamin, et al.
Published: (2024)

Towards Intelligent VR Training: A Physiological Adaptation Framework for Cognitive Load and Stress Detection
by: Nasri, Mahsa
Published: (2025)

LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation
by: Tao, Yiran, et al.
Published: (2025)

Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions
by: Kim, JiWoo, et al.
Published: (2025)

SensPS: Sensing Personal Space Comfortable Distance between Human-Human Using Multimodal Sensors
by: Watanabe, Ko, et al.
Published: (2025)

Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions
by: Schreiter, Tim, et al.
Published: (2025)