Staff View: Stacking Small Language Models for Generalizability :: Library Catalog

Bibliographic Details
Main Author:	Liang, Laurence
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2410.15570

_version_	1866912079149006848
author	Liang, Laurence
author_facet	Liang, Laurence
contents	Recent advances show that large language models (LLMs) generalize strong performance across different natural language benchmarks. However, the large size of LLMs makes training and inference expensive and impractical to run in resource-limited settings. This paper introduces a new approach called fine-tuning stacks of language models (FSLM), which involves stacking small language models (SLMs) as an alternative to LLMs. By fine-tuning each SLM to perform a specific task, this approach breaks down high-level reasoning into multiple lower-level steps that specific SLMs are responsible for. As a result, FSLM allows for lower training and inference costs, and also improves model interpretability as each SLM communicates with the subsequent one through natural language. By evaluating FSLM on common natural language benchmarks, this paper highlights promising early results toward generalizable performance using FSLM as a cost-effective alternative to LLMs.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_15570
institution	arXiv
publishDate	2024
record_format	arxiv
title	Stacking Small Language Models for Generalizability
topic	Computation and Language; Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2410.15570

Similar Items