Bibliographic Details
Main Authors: Sepulveda, Edison Jair Bejarano; Hector, Nicolai Potes; Montoya, Santiago Pineda; Rodriguez, Felipe Ivan; Orduy, Jaime Enrique; Cabezas, Alec Rosales; Navarrete, Danny Traslaviña; Farfan, Sergio Madrid
Format: Preprint
Published: 2024
Subjects: Machine Learning; Artificial Intelligence
Online Access: https://arxiv.org/abs/2405.08792
author Sepulveda, Edison Jair Bejarano
Hector, Nicolai Potes
Montoya, Santiago Pineda
Rodriguez, Felipe Ivan
Orduy, Jaime Enrique
Cabezas, Alec Rosales
Navarrete, Danny Traslaviña
Farfan, Sergio Madrid
contents This paper explores the potential of large language models (LLMs) to make the Aeronautical Regulations of Colombia (RAC) more accessible. Given the complexity and extensive technicality of the RAC, this study introduces a novel approach to simplifying these regulations for broader understanding. The authors develop the first RAC database, containing 24,478 expertly labeled question-and-answer pairs, fine-tune LLMs specifically for RAC applications, and outline the methodology for dataset assembly, expert-led annotation, and model training. To expedite training, the research uses the Gemma 1.1 2B model together with Unsloth for efficient VRAM usage and flash attention mechanisms. This initiative establishes a foundation to enhance the comprehensibility and accessibility of the RAC, potentially benefiting novices and reducing dependence on expert consultations for navigating the aviation industry's regulatory landscape. The dataset (https://huggingface.co/datasets/somosnlp/ColombiaRAC_FullyCurated) and the model (https://huggingface.co/somosnlp/gemma-1.1-2b-it_ColombiaRAC_FullyCurated_format_chatML_V1) are available on Hugging Face.
format Preprint
id arxiv_https___arxiv_org_abs_2405_08792
institution arXiv
publishDate 2024
record_format arxiv
title Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2405.08792