:: Library Catalog

Bibliographic Details
Main Authors:	Branch, Boyd; Mirowski, Piotr; Mathewson, Kory; Ppali, Sophia; Covaci, Alexandra
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2405.07111

Similar Items

The Theater Stage as Laboratory: Review of Real-Time Comedy LLM Systems for Live Performance
by: Mirowski, Piotr Wojciech, et al.
Published: (2025)

A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
by: Mirowski, Piotr Wojciech, et al.
Published: (2024)

Divergent Creativity in Humans and Large Language Models
by: Bellemare-Pepin, Antoine, et al.
Published: (2024)

Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
by: Qadri, Rida, et al.
Published: (2024)

Universal Conceptual Structure in Neural Translation: Probing NLLB-200's Multilingual Geometry
by: Mathewson, Kyle Elliott
Published: (2026)

Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)

On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
by: Mendonça, John, et al.
Published: (2024)

Evaluation of Code LLMs on Geospatial Code Generation
by: Gramacki, Piotr, et al.
Published: (2024)

Evaluating the Creativity of LLMs in Persian Literary Text Generation
by: Tourajmehr, Armin, et al.
Published: (2025)

IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs
by: Carichon, F., et al.
Published: (2026)

Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity
by: Li, Zhu, et al.
Published: (2026)

Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs
by: Mendonça, John, et al.
Published: (2024)

Do LLMs Agree on the Creativity Evaluation of Alternative Uses?
by: Rabeyah, Abdullah Al, et al.
Published: (2024)

Sensing Heritage: Exploring Creative Approaches for Capturing, Experiencing and Safeguarding the Sensorial Aspects of Cultural Heritage
by: Ppali, Sophia, et al.
Published: (2024)

An Evaluation of LLMs for Detecting Harmful Computing Terms
by: Jacas, Joshua, et al.
Published: (2025)

Dynamic Evaluation for Oversensitivity in LLMs
by: Pu, Sophia Xiao, et al.
Published: (2025)

VR as a "Drop-In" Well-being Tool for Knowledge Workers
by: Ppali, Sophia, et al.
Published: (2025)

Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
by: Badola, Kartikeya, et al.
Published: (2025)

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
by: Kim, Eunsu, et al.
Published: (2025)

Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)

Are LLMs Robust for Spoken Dialogues?
by: Mousavi, Seyed Mahed, et al.
Published: (2024)

DialogBench: Evaluating LLMs as Human-like Dialogue Systems
by: Ou, Jiao, et al.
Published: (2023)

Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
by: Jin, Keyan, et al.
Published: (2025)

Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations
by: Lu, Li-Chun, et al.
Published: (2025)

CREATE: Testing LLMs for Associative Creativity
by: Wadhwa, Manya, et al.
Published: (2026)

MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
by: Mendonça, John, et al.
Published: (2025)

LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue
by: Kowalyshyn, Katharine, et al.
Published: (2025)

Leveraging LLMs for Dialogue Quality Measurement
by: Jia, Jinghan, et al.
Published: (2024)

Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs
by: Siro, Clemencia, et al.
Published: (2024)

TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
by: Tang, Liyan, et al.
Published: (2024)

Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations
by: Wu, Yihao, et al.
Published: (2025)

Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs
by: Kabir, Mohsinul, et al.
Published: (2025)

CoCo-CoLa: Evaluating and Improving Language Adherence in Multilingual LLMs
by: Rahmati, Elnaz, et al.
Published: (2025)

Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
by: Balepur, Nishant, et al.
Published: (2025)

CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation
by: DeLorenzo, Matthew, et al.
Published: (2024)

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
by: Gerrits, Kyo, et al.
Published: (2026)

Responsible Trauma Research: Designing Effective and Sustainable Virtual Reality Exposure Studies
by: Degenhard, Annalisa, et al.
Published: (2026)

CoDial: Interpretable Task-Oriented Dialogue Systems Through Dialogue Flow Alignment
by: Shayanfar, Radin, et al.
Published: (2025)

A Bolu: A Structured Dataset for the Computational Analysis of Sardinian Improvisational Poetry
by: Calderaro, Silvio, et al.
Published: (2026)

Compass-v3: Scaling Domain-Specific LLMs for Multilingual E-Commerce in Southeast Asia
by: Maria, Sophia
Published: (2025)