Saved in:
Bibliographic Details
Main Authors: Li, Zhiyuan, Liu, Hong, Zhou, Denny, Ma, Tengyu
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2402.12875
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866916404794490880
author Li, Zhiyuan
Liu, Hong
Zhou, Denny
Ma, Tengyu
author_facet Li, Zhiyuan
Liu, Hong
Zhou, Denny
Ma, Tengyu
contents Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
format Preprint
id arxiv_https___arxiv_org_abs_2402_12875
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Li, Zhiyuan
Liu, Hong
Zhou, Denny
Ma, Tengyu
Machine Learning
Computational Complexity
Instructing the model to generate a sequence of intermediate steps, a.k.a., a chain of thought (CoT), is a highly effective method to improve the accuracy of large language models (LLMs) on arithmetics and symbolic reasoning tasks. However, the mechanism behind CoT remains unclear. This work provides a theoretical understanding of the power of CoT for decoder-only transformers through the lens of expressiveness. Conceptually, CoT empowers the model with the ability to perform inherently serial computation, which is otherwise lacking in transformers, especially when depth is low. Given input length $n$, previous works have shown that constant-depth transformers with finite precision $\mathsf{poly}(n)$ embedding size can only solve problems in $\mathsf{TC}^0$ without CoT. We first show an even tighter expressiveness upper bound for constant-depth transformers with constant-bit precision, which can only solve problems in $\mathsf{AC}^0$, a proper subset of $ \mathsf{TC}^0$. However, with $T$ steps of CoT, constant-depth transformers using constant-bit precision and $O(\log n)$ embedding size can solve any problem solvable by boolean circuits of size $T$. Empirically, enabling CoT dramatically improves the accuracy for tasks that are hard for parallel computation, including the composition of permutation groups, iterated squaring, and circuit value problems, especially for low-depth transformers.
title Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
topic Machine Learning
Computational Complexity
url https://arxiv.org/abs/2402.12875