| Main Authors: | Ovadia, Oded, Brief, Meni, Lemberg, Rachel, Sheetrit, Eitam |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.05571 |
Similar Items
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance
by: Brief, Meni, et al.
Published: (2024)
SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
by: Yoash, Noga Ben, et al.
Published: (2025)
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
by: Ovadia, Oded, et al.
Published: (2023)
ReMatch: Retrieval Enhanced Schema Matching with LLMs
by: Sheetrit, Eitam, et al.
Published: (2024)
Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models
by: Ma, Shengjie, et al.
Published: (2025)
CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
by: Touchent, Rian, et al.
Published: (2023)
Pre-training Limited Memory Language Models with Internal and External Knowledge
by: Zhao, Linxi, et al.
Published: (2025)
IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation
by: Zhang, Tianyi, et al.
Published: (2025)
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
by: Cui, Wanyun, et al.
Published: (2023)
Topic Over Source: The Key to Effective Data Mixing for Language Models Pre-training
by: Peng, Jiahui, et al.
Published: (2025)
LuxInstruct: A Cross-Lingual Instruction Tuning Dataset For Luxembourgish
by: Philippy, Fred, et al.
Published: (2025)
Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models
by: Li, Jijie, et al.
Published: (2025)
Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)
LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
by: Ren, Huimin, et al.
Published: (2025)
InstructEdit: Instruction-based Knowledge Editing for Large Language Models
by: Zhang, Ningyu, et al.
Published: (2024)
Instruct-Imagen: Image Generation with Multi-modal Instruction
by: Hu, Hexiang, et al.
Published: (2024)
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
by: Jia, Yiming, et al.
Published: (2025)
Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data
by: Xie, Juncheng, et al.
Published: (2024)
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
by: Hu, Zichao, et al.
Published: (2024)
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
by: Yang, Kailai, et al.
Published: (2025)
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
by: Toshniwal, Shubham, et al.
Published: (2024)
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
by: Liu, Wanlong, et al.
Published: (2024)
Probing Language Models for Pre-training Data Detection
by: Liu, Zhenhua, et al.
Published: (2024)
Scaling Towards the Information Boundary of Instruction Sets: The Infinity Instruct Subject Technical Report
by: Du, Li, et al.
Published: (2025)
BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
by: Tran, Hieu, et al.
Published: (2023)
DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)
Timber: Training-free Instruct Model Refining with Base via Effective Rank
by: Wu, Taiqiang, et al.
Published: (2025)
Efficient Pre-training for Localized Instruction Generation of Videos
by: Batra, Anil, et al.
Published: (2023)
Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
by: Qin, Yiwei, et al.
Published: (2026)
Safer-Instruct: Aligning Language Models with Automated Preference Data
by: Shi, Taiwei, et al.
Published: (2023)
Understanding Data Temporality Impact on Large Language Models Pre-training
by: Pilchen, Hippolyte, et al.
Published: (2026)
RegMix: Data Mixture as Regression for Language Model Pre-training
by: Liu, Qian, et al.
Published: (2024)
MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science
by: Kim, Junho, et al.
Published: (2024)
Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors
by: Ma, Shengkun, et al.
Published: (2024)
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
by: Ou, Yixin, et al.
Published: (2024)
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
by: Siriwardhana, Shamane, et al.
Published: (2024)
InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification
by: Hu, Yujia, et al.
Published: (2024)
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct
by: Wu, Yutong, et al.
Published: (2024)
Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks
by: Zhang, Yichi, et al.
Published: (2024)
Pre-training LLMs using human-like development data corpus
by: Bhardwaj, Khushi, et al.
Published: (2023)