Staff View: LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Hu, Zhiyuan; Liu, Yuliang; Zhao, Jinman; Wang, Suyuchen; Wang, Yan; Shen, Wei; Gu, Qing; Luu, Anh Tuan; Ng, See-Kiong; Jiang, Zhiwei; Hooi, Bryan
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2409.00509

_version_	1866913491370115072
author	Hu, Zhiyuan; Liu, Yuliang; Zhao, Jinman; Wang, Suyuchen; Wang, Yan; Shen, Wei; Gu, Qing; Luu, Anh Tuan; Ng, See-Kiong; Jiang, Zhiwei; Hooi, Bryan
author_facet	Hu, Zhiyuan; Liu, Yuliang; Zhao, Jinman; Wang, Suyuchen; Wang, Yan; Shen, Wei; Gu, Qing; Luu, Anh Tuan; Ng, See-Kiong; Jiang, Zhiwei; Hooi, Bryan
contents	Large language models (LLMs) face significant challenges in handling long-context tasks because of their limited effective context window size during pretraining, which restricts their ability to generalize over extended sequences. Meanwhile, extending the context window in LLMs through post-pretraining is highly resource-intensive. To address this, we introduce LongRecipe, an efficient training strategy for extending the context window of LLMs, including impactful token analysis, position index transformation, and training optimization strategies. It simulates long-sequence inputs while maintaining training efficiency and significantly improves the model's understanding of long-range dependencies. Experiments on three types of LLMs show that LongRecipe can utilize long sequences while requiring only 30% of the target context window size, and reduces computational training resources by over 85% compared to full-sequence training. Furthermore, LongRecipe also preserves the original LLM's capabilities in general tasks. Ultimately, we can extend the effective context window of open-source LLMs from 8k to 128k, achieving performance close to GPT-4 with just one day of dedicated training using a single GPU with 80G memory. Our code is released at https://github.com/zhiyuanhubj/LongRecipe.
format	Preprint
id	arxiv_https___arxiv_org_abs_2409_00509
institution	arXiv
publishDate	2024
record_format	arxiv
title	LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2409.00509

Similar Items