Bibliographic Details
Main Authors: Sathya, Pharath; Huang, Yin Jou; Cheng, Fei
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.03698
_version_ 1866912806580781056
author Sathya, Pharath
Huang, Yin Jou
Cheng, Fei
contents Evaluating creative text generation remains a challenge because existing reference-based metrics fail to capture the subjective nature of creativity. We propose a structured evaluation framework for AI story generation comprising four components (Novelty, Value, Adherence, and Resonance) and eleven sub-components. Using controlled story generation via "Spike Prompting" and a crowdsourced study of 115 readers, we examine how different creative components shape both immediate and reflective human creativity judgments. Our findings show that creativity is evaluated hierarchically rather than cumulatively, with different dimensions becoming salient at different stages of judgment, and that reflective evaluation substantially alters both ratings and inter-rater agreement. Together, these results support the effectiveness of our framework in revealing dimensions of creativity that are obscured by reference-based evaluation.
format Preprint
id arxiv_https___arxiv_org_abs_2601_03698
institution arXiv
publishDate 2026
record_format arxiv
title Evaluation Framework for AI Creativity: A Case Study Based on Story Generation
topic Computation and Language
url https://arxiv.org/abs/2601.03698