| Main Authors: | Alam, Md Ibrahim Ibne; Ram, Parikshit; Dan, Soham; Samulowitz, Horst; Kar, Koushik |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language; Machine Learning |
| Online Access: | https://arxiv.org/abs/2406.13720 |
| _version_ | 1866913928413446144 |
|---|---|
| author | Alam, Md Ibrahim Ibne; Ram, Parikshit; Dan, Soham; Samulowitz, Horst; Kar, Koushik |
| contents | Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method achieves accuracy close to that of the single best model. With few-shot problems, performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2406_13720 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems |
| topic | Computation and Language; Machine Learning |
| url | https://arxiv.org/abs/2406.13720 |