Saved in:
Bibliographic Details
Main Authors: Chen, Jingxue, Tang, Qingkun, Lu, Qianchun, Fang, Siyuan
Format: Preprint
Published: 2025
Subjects:
Online Access:https://arxiv.org/abs/2505.12043
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866908371014123520
author Chen, Jingxue
Tang, Qingkun
Lu, Qianchun
Fang, Siyuan
author_facet Chen, Jingxue
Tang, Qingkun
Lu, Qianchun
Fang, Siyuan
contents Although large language models (LLMs) perform well in general tasks, domain-specific applications suffer from hallucinations and accuracy limitations. Continual Pre-Training (CPT) approaches encounter two key issues: (1) domain-biased data degrades general language skills, and (2) improper corpus-mixture ratios limit effective adaptation. To address these, we propose a novel framework, Mixture of Losses (MoL), which decouples optimization objectives for domain-specific and general corpora. Specifically, cross-entropy (CE) loss is applied to domain-corpus to ensure knowledge acquisition, while Kullback-Leibler (KL) divergence aligns general-corpus training with the base model's foundational capabilities. This dual-loss architecture preserves universal skills while enhancing domain expertise, avoiding catastrophic forgetting. Empirically, we validate that a 1:1 domain-to-general corpus ratio optimally balances training and overfitting without the need for extensive tuning or resource-intensive experiments. Furthermore, our experiments demonstrate significant performance gains compared to traditional CPT approaches, which often suffer from degradation in general language capabilities; our model achieves 27.9% higher accuracy on the Math-500 benchmark in the non-think reasoning mode, and an impressive 83.3% improvement on the challenging AIME25 subset in the think mode, underscoring the effectiveness of our approach.
format Preprint
id arxiv_https___arxiv_org_abs_2505_12043
institution arXiv
publishDate 2025
record_format arxiv
spellingShingle MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Chen, Jingxue
Tang, Qingkun
Lu, Qianchun
Fang, Siyuan
Computation and Language
Although large language models (LLMs) perform well in general tasks, domain-specific applications suffer from hallucinations and accuracy limitations. Continual Pre-Training (CPT) approaches encounter two key issues: (1) domain-biased data degrades general language skills, and (2) improper corpus-mixture ratios limit effective adaptation. To address these, we propose a novel framework, Mixture of Losses (MoL), which decouples optimization objectives for domain-specific and general corpora. Specifically, cross-entropy (CE) loss is applied to domain-corpus to ensure knowledge acquisition, while Kullback-Leibler (KL) divergence aligns general-corpus training with the base model's foundational capabilities. This dual-loss architecture preserves universal skills while enhancing domain expertise, avoiding catastrophic forgetting. Empirically, we validate that a 1:1 domain-to-general corpus ratio optimally balances training and overfitting without the need for extensive tuning or resource-intensive experiments. Furthermore, our experiments demonstrate significant performance gains compared to traditional CPT approaches, which often suffer from degradation in general language capabilities; our model achieves 27.9% higher accuracy on the Math-500 benchmark in the non-think reasoning mode, and an impressive 83.3% improvement on the challenging AIME25 subset in the think mode, underscoring the effectiveness of our approach.
title MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
topic Computation and Language
url https://arxiv.org/abs/2505.12043