Staff View: Evaluation Framework for AI Creativity: A Case Study Based on Story Generation :: Library Catalog

Bibliographic Details
Main Authors:	Sathya, Pharath; Huang, Yin Jou; Cheng, Fei
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.03698

_version_	1866912806580781056
author	Sathya, Pharath; Huang, Yin Jou; Cheng, Fei
author_facet	Sathya, Pharath; Huang, Yin Jou; Cheng, Fei
contents	Evaluating creative text generation remains a challenge because existing reference-based metrics fail to capture the subjective nature of creativity. We propose a structured evaluation framework for AI story generation comprising four components (Novelty, Value, Adherence, and Resonance) and eleven sub-components. Using controlled story generation via "Spike Prompting" and a crowdsourced study of 115 readers, we examine how different creative components shape both immediate and reflective human creativity judgments. Our findings show that creativity is evaluated hierarchically rather than cumulatively, with different dimensions becoming salient at different stages of judgment, and that reflective evaluation substantially alters both ratings and inter-rater agreement. Together, these results support the effectiveness of our framework in revealing dimensions of creativity that are obscured by reference-based evaluation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_03698
institution	arXiv
publishDate	2026
record_format	arxiv
title	Evaluation Framework for AI Creativity: A Case Study Based on Story Generation
topic	Computation and Language
url	https://arxiv.org/abs/2601.03698
