| Main Authors: | Cook, Jonathan, Rocktäschel, Tim, Foerster, Jakob, Aumiller, Dennis, Wang, Alex |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.03608 |
Similar Items
Creative Beam Search: LLM-as-a-Judge For Improving Response Generation
by: Franceschelli, Giorgio, et al.
Published: (2024)
LLM Attributor: Interactive Visual Attribution for LLM Generation
by: Lee, Seongmin, et al.
Published: (2024)
PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
by: Fu, Xiao, et al.
Published: (2025)
DigiData: Training and Evaluating General-Purpose Mobile Control Agents
by: Sun, Yuxuan, et al.
Published: (2025)
Properties and Challenges of LLM-Generated Explanations
by: Kunz, Jenny, et al.
Published: (2024)
Large Language Models for Cancer Communication: Evaluating Linguistic Quality, Safety, and Accessibility in Generative AI
by: Saha, Agnik, et al.
Published: (2025)
Generative UI: LLMs are Effective UI Generators
by: Leviathan, Yaniv, et al.
Published: (2026)
Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems
by: Rajagopal, Shreya, et al.
Published: (2024)
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
by: Kahng, Minsuk, et al.
Published: (2024)
Can LLM-Generated Misinformation Be Detected?
by: Chen, Canyu, et al.
Published: (2023)
Programming by Backprop: An Instruction is Worth 100 Examples When Finetuning LLMs
by: Cook, Jonathan, et al.
Published: (2025)
ABLEIST: Intersectional Disability Bias in LLM-Generated Hiring Scenarios
by: Phutane, Mahika, et al.
Published: (2025)
Building Trust in Mental Health Chatbots: Safety Metrics and LLM-Based Evaluation Tools
by: Park, Jung In, et al.
Published: (2024)
The Behavior Gap: Evaluating Zero-shot LLM Agents in Complex Task-Oriented Dialogs
by: Baidya, Avinash, et al.
Published: (2025)
Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
by: Kim, Jane Paik
Published: (2026)
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
by: Dammu, Preetam Prabhu Srikar, et al.
Published: (2024)
The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas
by: Si, Chenglei, et al.
Published: (2025)
Transformer Explainer: Interactive Learning of Text-Generative Models
by: Cho, Aeree, et al.
Published: (2024)
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
by: Luera, Reuben, et al.
Published: (2024)
Demo: Statistically Significant Results On Biases and Errors of LLMs Do Not Guarantee Generalizable Results
by: Liu, Jonathan, et al.
Published: (2025)
The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind
by: Lupu, Andrei, et al.
Published: (2025)
Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)
DiscoverLLM: From Executing Intents to Discovering Them
by: Kim, Tae Soo, et al.
Published: (2026)
Policy Maps: Tools for Guiding the Unbounded Space of LLM Behaviors
by: Lam, Michelle S., et al.
Published: (2024)
UniAutoML: A Human-Centered Framework for Unified Discriminative and Generative AutoML with Large Language Models
by: Guo, Jiayi, et al.
Published: (2024)
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
by: Zhang, Lechen, et al.
Published: (2024)
Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning
by: Guo, Dongyang, et al.
Published: (2025)
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
by: Wu, Xiaoyuan, et al.
Published: (2025)
Language Models as Zero-Shot Trajectory Generators
by: Kwon, Teyun, et al.
Published: (2023)
Talking with Oompa Loompas: A novel framework for evaluating linguistic acquisition of LLM agents
by: Swain, Sankalp Tattwadarshi, et al.
Published: (2025)
Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages
by: Zhang, Lechen, et al.
Published: (2025)
Call2Instruct: Automated Pipeline for Generating Q&A Datasets from Call Center Recordings for LLM Fine-Tuning
by: Echeverria, Alex, et al.
Published: (2025)
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
by: Lee, Dong Won, et al.
Published: (2024)
Heterogeneous Value Alignment Evaluation for Large Language Models
by: Zhang, Zhaowei, et al.
Published: (2023)
Never Start from Scratch: Expediting On-Device LLM Personalization via Explainable Model Selection
by: Wang, Haoming, et al.
Published: (2025)
Evaluating Large Language Models for Health-related Queries with Presuppositions
by: Kaur, Navreet, et al.
Published: (2023)
PRECISE Framework: GPT-based Text For Improved Readability, Reliability, and Understandability of Radiology Reports For Patient-Centered Care
by: Tripathi, Satvik, et al.
Published: (2024)
Can LLM feedback enhance review quality? A randomized study of 20K reviews at ICLR 2025
by: Thakkar, Nitya, et al.
Published: (2025)
AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
by: Zeng, Qiuhai, et al.
Published: (2025)
EduAgent: Generative Student Agents in Learning
by: Xu, Songlin, et al.
Published: (2024)