| Main Authors: | Branch, Boyd, Mirowski, Piotr, Mathewson, Kory, Ppali, Sophia, Covaci, Alexandra |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.07111 |
Similar Items
The Theater Stage as Laboratory: Review of Real-Time Comedy LLM Systems for Live Performance
by: Mirowski, Piotr Wojciech, et al.
Published: (2025)
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
by: Mirowski, Piotr Wojciech, et al.
Published: (2024)
Divergent Creativity in Humans and Large Language Models
by: Bellemare-Pepin, Antoine, et al.
Published: (2024)
Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
by: Qadri, Rida, et al.
Published: (2024)
Universal Conceptual Structure in Neural Translation: Probing NLLB-200's Multilingual Geometry
by: Mathewson, Kyle Elliott
Published: (2026)
Exploring Safety Alignment Evaluation of LLMs in Chinese Mental Health Dialogues via LLM-as-Judge
by: Cai, Yunna, et al.
Published: (2025)
On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation
by: Mendonça, John, et al.
Published: (2024)
Evaluation of Code LLMs on Geospatial Code Generation
by: Gramacki, Piotr, et al.
Published: (2024)
Evaluating the Creativity of LLMs in Persian Literary Text Generation
by: Tourajmehr, Armin, et al.
Published: (2025)
IDEAFix: Evaluation Framework for Creative Defixation Prompting in LLMs
by: Carichon, F., et al.
Published: (2026)
Lighting Up or Dimming Down? Exploring Dark Patterns of LLMs in Co-Creativity
by: Li, Zhu, et al.
Published: (2026)
Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs
by: Mendonça, John, et al.
Published: (2024)
Do LLMs Agree on the Creativity Evaluation of Alternative Uses?
by: Rabeyah, Abdullah Al, et al.
Published: (2024)
Sensing Heritage: Exploring Creative Approaches for Capturing, Experiencing and Safeguarding the Sensorial Aspects of Cultural Heritage
by: Ppali, Sophia, et al.
Published: (2024)
An Evaluation of LLMs for Detecting Harmful Computing Terms
by: Jacas, Joshua, et al.
Published: (2025)
Dynamic Evaluation for Oversensitivity in LLMs
by: Pu, Sophia Xiao, et al.
Published: (2025)
VR as a "Drop-In" Well-being Tool for Knowledge Workers
by: Ppali, Sophia, et al.
Published: (2025)
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
by: Badola, Kartikeya, et al.
Published: (2025)
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues
by: Kim, Eunsu, et al.
Published: (2025)
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)
Are LLMs Robust for Spoken Dialogues?
by: Mousavi, Seyed Mahed, et al.
Published: (2024)
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
by: Ou, Jiao, et al.
Published: (2023)
Reasoning or Not? A Comprehensive Evaluation of Reasoning LLMs for Dialogue Summarization
by: Jin, Keyan, et al.
Published: (2025)
Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations
by: Lu, Li-Chun, et al.
Published: (2025)
CREATE: Testing LLMs for Associative Creativity
by: Wadhwa, Manya, et al.
Published: (2026)
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Dialogue Evaluators
by: Mendonça, John, et al.
Published: (2025)
LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue
by: Kowalyshyn, Katharine, et al.
Published: (2025)
Leveraging LLMs for Dialogue Quality Measurement
by: Jia, Jinghan, et al.
Published: (2024)
Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs
by: Siro, Clemencia, et al.
Published: (2024)
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
by: Tang, Liyan, et al.
Published: (2024)
Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations
by: Wu, Yihao, et al.
Published: (2025)
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs
by: Kabir, Mohsinul, et al.
Published: (2025)
CoCo-CoLa: Evaluating and Improving Language Adherence in Multilingual LLMs
by: Rahmati, Elnaz, et al.
Published: (2025)
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
by: Balepur, Nishant, et al.
Published: (2025)
CreativEval: Evaluating Creativity of LLM-Based Hardware Code Generation
by: DeLorenzo, Matthew, et al.
Published: (2024)
Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations
by: Gerrits, Kyo, et al.
Published: (2026)
Responsible Trauma Research: Designing Effective and Sustainable Virtual Reality Exposure Studies
by: Degenhard, Annalisa, et al.
Published: (2026)
CoDial: Interpretable Task-Oriented Dialogue Systems Through Dialogue Flow Alignment
by: Shayanfar, Radin, et al.
Published: (2025)
A Bolu: A Structured Dataset for the Computational Analysis of Sardinian Improvisational Poetry
by: Calderaro, Silvio, et al.
Published: (2026)
Compass-v3: Scaling Domain-Specific LLMs for Multilingual E-Commerce in Southeast Asia
by: Maria, Sophia
Published: (2025)