Bibliographic Details
Main Authors: Xie, Tong; Zhang, Hanzhi; Wang, Shaozhou; Wan, Yuwei; Razzak, Imran; Kit, Chunyu; Zhang, Wenjie; Hoex, Bram
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2411.12000
Abstract:
  • Natural Language Processing (NLP) is widely used to summarize long contexts into structured information. However, extracting structured knowledge from scientific text with NLP models remains a challenge because of the domain-specific nature of such text, the complex data preprocessing it requires, and the granularity of multi-layered device-level information. To address this, we introduce ByteScience, a non-profit, cloud-based, auto fine-tuned Large Language Model (LLM) platform designed to extract structured scientific data and synthesize new scientific knowledge from vast scientific corpora. The platform builds on DARWIN, an open-source, fine-tuned LLM dedicated to natural science. It is built on Amazon Web Services (AWS) and provides an automated, user-friendly workflow for custom model development and data extraction. The platform achieves remarkable accuracy with only a small number of well-annotated articles. This tool streamlines the transition from scientific literature to structured knowledge and data, and benefits advancements in natural informatics.