Bibliographic Details
Main Author: Liang, Laurence
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2410.15570
author Liang, Laurence
author_facet Liang, Laurence
contents Recent advances show that large language models (LLMs) generalize strong performance across different natural language benchmarks. However, the large size of LLMs makes training and inference expensive and impractical to run in resource-limited settings. This paper introduces a new approach called fine-tuning stacks of language models (FSLM), which involves stacking small language models (SLMs) as an alternative to LLMs. By fine-tuning each SLM to perform a specific task, this approach breaks down high-level reasoning into multiple lower-level steps that specific SLMs are responsible for. As a result, FSLM allows for lower training and inference costs, and also improves model interpretability as each SLM communicates with the subsequent one through natural language. By evaluating FSLM on common natural language benchmarks, this paper highlights promising early results toward generalizable performance using FSLM as a cost-effective alternative to LLMs.
format Preprint
id arxiv_https___arxiv_org_abs_2410_15570
institution arXiv
publishDate 2024
record_format arxiv
title Stacking Small Language Models for Generalizability
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2410.15570