| Main Authors: | Jindal, Ashvini Kumar; Rajpoot, Pawan Kumar; Parikh, Ankur |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2403.02247 |
| _version_ | 1866911788770000896 |
|---|---|
| author | Jindal, Ashvini Kumar; Rajpoot, Pawan Kumar; Parikh, Ankur |
| author_facet | Jindal, Ashvini Kumar; Rajpoot, Pawan Kumar; Parikh, Ankur |
| contents | LLMOps incur significant costs due to hardware requirements, hindering their widespread accessibility. Additionally, a lack of transparency in model training methods and data contributes to the majority of models being non-reproducible. To tackle these challenges, the LLM Efficiency Challenge was introduced at a NeurIPS workshop, aiming to adapt foundation models to a diverse set of tasks via fine-tuning on a single GPU (RTX 4090 or A100 with 40GB) within a 24-hour timeframe. In this system description paper, we introduce Birbal, our Mistral-7B-based winning model, fine-tuned on a single RTX 4090 for 16 hours. Birbal's success lies in curating high-quality instructions covering diverse tasks, resulting in a 35% performance improvement over the second-best submission, which was based on Qwen-14B. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2403_02247 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | Birbal: An efficient 7B instruct-model fine-tuned with curated datasets |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2403.02247 |