Staff View: Longhorn: State Space Models are Amortized Online Learners :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Bo; Wang, Rui; Wu, Lemeng; Feng, Yihao; Stone, Peter; Liu, Qiang
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.14207

_version_	1866910627935551488
author	Liu, Bo; Wang, Rui; Wu, Lemeng; Feng, Yihao; Stone, Peter; Liu, Qiang
author_facet	Liu, Bo; Wang, Rui; Wu, Lemeng; Feng, Yihao; Stone, Peter; Liu, Qiang
contents	Modern large language models are built on sequence modeling via next-token prediction. While the Transformer remains the dominant architecture for sequence modeling, its quadratic decoding complexity in sequence length poses a major limitation. State-space models (SSMs) present a competitive alternative, offering linear decoding efficiency while maintaining parallelism during training. However, most existing SSMs rely on linear recurrence designs that appear somewhat ad hoc. In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems. This approach links SSM design to formulating precise online learning objectives, with state transition rules derived from solving these objectives. Based on this insight, we introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem. Our experimental results show that Longhorn outperforms state-of-the-art SSMs, including the Mamba model, on standard sequence modeling benchmarks, language modeling, and vision tasks. Specifically, Longhorn achieves a 1.8x improvement in sample efficiency compared to Mamba, and can extrapolate over contexts that are up to 16x longer during inference.
format	Preprint
id	arxiv_https___arxiv_org_abs_2407_14207
institution	arXiv
publishDate	2024
record_format	arxiv
title	Longhorn: State Space Models are Amortized Online Learners
topic	Machine Learning
url	https://arxiv.org/abs/2407.14207

Similar Items