MARC21: Understanding Hidden Computations in Chain-of-Thought Reasoning :: Library Catalog

Bibliographic Details
Main Author:	Bharadwaj, Aryasomayajula Ram
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2412.04537

_version_	1866913599447891968
author	Bharadwaj, Aryasomayajula Ram
author_facet	Bharadwaj, Aryasomayajula Ram
contents	Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_04537
institution	arXiv
publishDate	2024
record_format	arxiv
title	Understanding Hidden Computations in Chain-of-Thought Reasoning
topic	Computation and Language; Machine Learning
url	https://arxiv.org/abs/2412.04537
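
Note on the abstract: the logit lens method and token rankings it mentions can be illustrated with a minimal sketch. This is not the author's code. It assumes the common formulation in which each layer's hidden state is normalized and projected through the model's unembedding matrix. All function names here are hypothetical, and small toy arrays stand in for a real model's activations and weights.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    """Normalize each hidden vector to zero mean and unit variance.

    Assumption: the learned scale and shift of a real model's final
    layer norm are omitted to keep the sketch self-contained.
    """
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + eps)

def logit_lens(hidden_states, unembed):
    """Project per-layer hidden states into vocabulary space.

    hidden_states: array of shape (n_layers, d_model), taken at one
                   sequence position.
    unembed:       array of shape (d_model, vocab_size).
    Returns logits of shape (n_layers, vocab_size).
    """
    return layer_norm(hidden_states) @ unembed

def token_ranks(logits, token_id):
    """Rank of token_id at each layer, where 0 means the highest logit."""
    # Count how many vocabulary entries score strictly higher than the target.
    return (logits > logits[:, [token_id]]).sum(axis=1)

def top_k_tokens(logits, k=2):
    """Indices of the k highest-scoring tokens at each layer."""
    return np.argsort(-logits, axis=1)[:, :k]

if __name__ == "__main__":
    # Toy stand-ins: 2 layers, d_model = vocab_size = 4, identity unembedding.
    toy_hidden = np.array([[4.0, 3.0, 2.0, 1.0],
                           [1.0, 4.0, 3.0, 2.0]])
    toy_logits = logit_lens(toy_hidden, np.eye(4))
    print(token_ranks(toy_logits, token_id=0))
    print(top_k_tokens(toy_logits, k=2))
```

With a real model, `hidden_states` would come from the residual stream at a filler-token position, and inspecting lower-ranked tokens (not only the top one) is what "examining token rankings" refers to in this sketch.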

Similar Items