Bibliographic Details
Main Authors:	Chandra, Arjun; Miller, Kevin; Ravichandran, Venkatesh; Papayiannis, Constantinos; Saligrama, Venkatesh
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.13742

_version_	1866908784900702208
author	Chandra, Arjun; Miller, Kevin; Ravichandran, Venkatesh; Papayiannis, Constantinos; Saligrama, Venkatesh
author_facet	Chandra, Arjun; Miller, Kevin; Ravichandran, Venkatesh; Papayiannis, Constantinos; Saligrama, Venkatesh
contents	Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs). In this work, we propose TRACE (Textual Reasoning over Audio Cues for Evaluation), a novel framework that enables LLM judges to reason over audio cues to achieve cost-efficient and human-aligned S2S evaluation. To demonstrate the strength of the framework, we first introduce a Human Chain-of-Thought (HCoT) annotation protocol to improve the diagnostic capability of existing judge benchmarks by separating evaluation into explicit dimensions: content (C), voice quality (VQ), and paralinguistics (P). Using this data, TRACE constructs a textual blueprint of inexpensive audio signals and prompts an LLM to render dimension-wise judgments, fusing them into an overall rating via a deterministic policy. TRACE achieves higher agreement with human raters than ALMs and transcript-only LLM judges while being significantly more cost-effective. We will release the HCoT annotations and the TRACE framework to enable scalable and human-aligned S2S evaluation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_13742
institution	arXiv
publishDate	2026
record_format	arxiv
title	Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation
topic	Computation and Language
url	https://arxiv.org/abs/2601.13742

Similar Items