| Main Authors: | Elangovan, Aparna, Xu, Lei, Ko, Jongwoo, Elyasi, Mahsa, Liu, Ling, Bodapati, Sravan, Roth, Dan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.03775 |
Similar Items
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
by: Elangovan, Aparna, et al.
Published: (2024)
Human-Centered Design Recommendations for LLM-as-a-Judge
by: Pan, Qian, et al.
Published: (2024)
Generate, Evaluate, Iterate: Synthetic Data for Human-in-the-Loop Refinement of LLM Judges
by: Do, Hyo Jin, et al.
Published: (2025)
EvalAssist: A Human-Centered Tool for LLM-as-a-Judge
by: Ashktorab, Zahra, et al.
Published: (2025)
Can LLMs Synthesize Court-Ready Statistical Evidence? Evaluating AI-Assisted Sentencing Bias Analysis for California Racial Justice Act Claims
by: Komarla, Aparna
Published: (2026)
Human-Augmented Reality Interaction in Rebar Inspection
by: Sanei, Mahsa, et al.
Published: (2026)
Limitations of the LLM-as-a-Judge Approach for Evaluating LLM Outputs in Expert Knowledge Tasks
by: Szymanski, Annalisa, et al.
Published: (2024)
MultEval: Supporting Collaborative Alignment for LLM-as-a-Judge Evaluation Criteria
by: Chiang, Charles, et al.
Published: (2026)
Large Language Model Agent Personality and Response Appropriateness: Evaluation by Human Linguistic Experts, LLM-as-Judge, and Natural Language Processing Model
by: Jayakumar, Eswari, et al.
Published: (2025)
All the Way There and Back: Inertial-Based, Phone-in-Pocket Indoor Wayfinding and Backtracking Apps for Blind Travelers
by: Tsai, Chia Hsuan, et al.
Published: (2024)
The Impact of Uncertainty Visualization on Trust in Thematic Maps
by: Srivastava, Varun, et al.
Published: (2026)
Grading Scale Impact on LLM-as-a-Judge: Human-LLM Alignment Is Highest on 0-5 Grading Scale
by: Li, Weiyue, et al.
Published: (2026)
Not All Uncertainty Is Equal: How Uncertainty Granularity Shapes Human Verification in LLM-Assisted Decision Making
by: Villavicencio, Mauricio, et al.
Published: (2026)
PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
by: Qidwai, Adnan, et al.
Published: (2025)
Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
by: Kim, Jane Paik
Published: (2026)
Provocation on Expertise in Social Impact Evaluations of Generative AI (and Beyond)
by: Kahn, Zoe, et al.
Published: (2024)
The Impact of Response Latency and Task Type on Human-LLM Interaction and Perception
by: Tan, Felicia Fang-Yi, et al.
Published: (2026)
AI vs. Human Judgment of Content Moderation: LLM-as-a-Judge and Ethics-Based Response Refusals
by: Pasch, Stefan
Published: (2025)
Beyond Quantification: Navigating Uncertainty in Professional AI Systems
by: Delacroix, Sylvie, et al.
Published: (2025)
Striking a Balance: Evaluating How Aggregations of Multiple Forecasts Impact Judgment Under Uncertainty
by: Zou, Ruishi, et al.
Published: (2024)
Neural and Cognitive Impacts of AI: The Influence of Task Subjectivity on Human-LLM Collaboration
by: Russell, Matthew, et al.
Published: (2025)
DG Comics: Semi-Automatically Authoring Graph Comics for Dynamic Graphs
by: Kim, Joohee, et al.
Published: (2024)
Playing the Imitation Game: How Perceived Generated Content Shapes Player Experience
by: Bazzaz, Mahsa, et al.
Published: (2026)
Identifying Challenges in Designing, Developing and Evaluating Data Visualizations for Large Displays
by: Hamed, Mahsa Sinaei, et al.
Published: (2024)
Assessing Similarity Measures for the Evaluation of Human-Robot Motion Correspondence
by: Dietzel, Charles, et al.
Published: (2024)
The Impact of Concept Explanations and Interventions on Human-Machine Collaboration
by: Furby, Jack, et al.
Published: (2025)
Beyond the Hype: Mapping Uncertainty and Gratification in AI Assistant Use
by: Joy, Karen, et al.
Published: (2025)
MindCopilot: Towards Formalizing and Evaluating Granular Human-LLM Co-Writing
by: Fang, Youqing, et al.
Published: (2026)
Can LLM "Self-report"?: Evaluating the Validity of Self-report Scales in Measuring Personality Design in LLM-based Chatbots
by: Zou, Huiqi, et al.
Published: (2024)
VeriLA: A Human-Centered Evaluation Framework for Interpretable Verification of LLM Agent Failures
by: Sung, Yoo Yeon, et al.
Published: (2025)
MEGAnno+: A Human-LLM Collaborative Annotation System
by: Kim, Hannah, et al.
Published: (2024)
Analyzing the Impact of the Automatic Ball Strike System in Professional Baseball through a Case Study on KBO League Data
by: Lee, Kichang, et al.
Published: (2024)
How to Enable Effective Cooperation Between Humans and NLP Models: A Survey of Principles, Formalizations, and Beyond
by: Huang, Chen, et al.
Published: (2025)
On Arrival: Challenges and Opportunities Around Early-Stage Resettlement of Refugees in Australia
by: Song, Pinyao, et al.
Published: (2024)
Leveraging Internet of Things Network Metadata for Cost-Effective Automatic Smart Building Visualization
by: Staugaard, Benjamin, et al.
Published: (2024)
Towards Intelligent VR Training: A Physiological Adaptation Framework for Cognitive Load and Stress Detection
by: Nasri, Mahsa
Published: (2025)
LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation
by: Tao, Yiran, et al.
Published: (2025)
Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions
by: Kim, JiWoo, et al.
Published: (2025)
SensPS: Sensing Personal Space Comfortable Distance between Human-Human Using Multimodal Sensors
by: Watanabe, Ko, et al.
Published: (2025)
Evaluating Efficiency and Engagement in Scripted and LLM-Enhanced Human-Robot Interactions
by: Schreiter, Tim, et al.
Published: (2025)