Bibliographic Details
Main Authors: Khey, Hiba, Lakhder, Amine, Rouichi, Salma, Ghabi, Imane El, Hejjaoui, Kamal, En-nahli, Younes, Kalloubi, Fahd, Amri, Moez
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2506.08897
author Khey, Hiba
Lakhder, Amine
Rouichi, Salma
Ghabi, Imane El
Hejjaoui, Kamal
En-nahli, Younes
Kalloubi, Fahd
Amri, Moez
contents The rapid advancement of transformer-based language models has catalyzed breakthroughs in biomedical and clinical natural language processing; however, plant science remains markedly underserved by such domain-adapted tools. In this work, we present PlantDeBERTa, a high-performance, open-source language model specifically tailored for extracting structured knowledge from plant stress-response literature. Built upon the DeBERTa architecture, known for its disentangled attention and robust contextual encoding, PlantDeBERTa is fine-tuned on a meticulously curated corpus of expert-annotated abstracts, with a primary focus on lentil (Lens culinaris) responses to diverse abiotic and biotic stressors. Our methodology combines transformer-based modeling with rule-enhanced linguistic post-processing and ontology-grounded entity normalization, enabling PlantDeBERTa to capture biologically meaningful relationships with precision and semantic fidelity. The underlying corpus is annotated using a hierarchical schema aligned with the Crop Ontology, encompassing molecular, physiological, biochemical, and agronomic dimensions of plant adaptation. PlantDeBERTa exhibits strong generalization capabilities across entity types and demonstrates the feasibility of robust domain adaptation in low-resource scientific fields. By providing a scalable and reproducible framework for high-resolution entity recognition, PlantDeBERTa bridges a critical gap in agricultural NLP and paves the way for intelligent, data-driven systems in plant genomics, phenomics, and agronomic knowledge discovery. Our model is publicly released to promote transparency and accelerate cross-disciplinary innovation in computational plant science.
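The abstract does not spell out the post-processing rules or the normalization procedure. As a rough illustration only, the sketch below shows one common way a token classifier's output can be turned into normalized entities: token-level BIO tags are collapsed into spans (with a simple repair rule for stray I- tags), and each span is looked up in an ontology lexicon. The BIO scheme, the entity labels (STRESS, TRAIT, CROP), the lexicon, and the "HYPOTHETICAL:" IDs are all assumptions for illustration, not real Crop Ontology terms and not the authors' implementation.

```python
# Hypothetical sketch: BIO span decoding plus dictionary-based normalization.
# Labels, lexicon entries, and ontology IDs are illustrative placeholders.
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, str]  # (entity_type, surface_text)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collapse token-level BIO tags into (type, text) spans."""
    spans: List[Span] = []
    cur_type: Optional[str] = None
    cur_toks: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            # Start a new entity; an I- tag whose type does not continue the
            # open entity is treated as B- (a simple rule-based repair).
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-"):
            cur_toks.append(tok)  # continue the open entity
        else:
            # "O" closes any open entity
            if cur_type:
                spans.append((cur_type, " ".join(cur_toks)))
            cur_type, cur_toks = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_toks)))
    return spans


def normalize(spans: List[Span],
              lexicon: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Attach an ontology ID by case-insensitive exact match (None if unknown)."""
    return [(etype, text, lexicon.get(text.lower())) for etype, text in spans]


if __name__ == "__main__":
    tokens = ["Drought", "stress", "increased", "proline", "accumulation", "in", "lentil"]
    tags = ["B-STRESS", "I-STRESS", "O", "B-TRAIT", "I-TRAIT", "O", "B-CROP"]
    lexicon = {"drought stress": "HYPOTHETICAL:0001", "lentil": "HYPOTHETICAL:0002"}
    for entity in normalize(decode_bio(tokens, tags), lexicon):
        print(entity)
```

Exact-match lookup is the simplest possible normalizer; a real pipeline would likely add synonym lists, lemmatization, or fuzzy matching before falling back to "unknown".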
format Preprint
id arxiv_https___arxiv_org_abs_2506_08897
institution arXiv
publishDate 2025
record_format arxiv
title PlantDeBERTa: An Open Source Language Model for Plant Science
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2506.08897