Bibliographic Details
Main Authors: Alam, Md Ibrahim Ibne, Ram, Parikshit, Dan, Soham, Samulowitz, Horst, Kar, Koushik
Format: Preprint
Published: 2024
Subjects: Computation and Language, Machine Learning
Online Access: https://arxiv.org/abs/2406.13720
_version_ 1866913928413446144
author Alam, Md Ibrahim Ibne
Ram, Parikshit
Dan, Soham
Samulowitz, Horst
Kar, Koushik
author_facet Alam, Md Ibrahim Ibne
Ram, Parikshit
Dan, Soham
Samulowitz, Horst
Kar, Koushik
contents Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method achieves accuracy close to that of the single best model. With few-shot problems, this performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.
format Preprint
id arxiv_https___arxiv_org_abs_2406_13720
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems
Alam, Md Ibrahim Ibne
Ram, Parikshit
Dan, Soham
Samulowitz, Horst
Kar, Koushik
Computation and Language
Machine Learning
title On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems
topic Computation and Language
Machine Learning
url https://arxiv.org/abs/2406.13720