:: Library Catalog

Bibliographic Details
Main Authors:	Ovadia, Oded, Brief, Meni, Lemberg, Rachel, Sheetrit, Eitam
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.05571

Similar Items

Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance
by: Brief, Meni, et al.
Published: (2024)

SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
by: Yoash, Noga Ben, et al.
Published: (2025)

Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
by: Ovadia, Oded, et al.
Published: (2023)

ReMatch: Retrieval Enhanced Schema Matching with LLMs
by: Sheetrit, Eitam, et al.
Published: (2024)

Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models
by: Ma, Shengjie, et al.
Published: (2025)

CamemBERT-bio: Leveraging Continual Pre-training for Cost-Effective Models on French Biomedical Data
by: Touchent, Rian, et al.
Published: (2023)

Pre-training Limited Memory Language Models with Internal and External Knowledge
by: Zhao, Linxi, et al.
Published: (2025)

IKnow: Instruction-Knowledge-Aware Continual Pretraining for Effective Domain Adaptation
by: Zhang, Tianyi, et al.
Published: (2025)

Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
by: Cui, Wanyun, et al.
Published: (2023)

Topic Over Source: The Key to Effective Data Mixing for Language Models Pre-training
by: Peng, Jiahui, et al.
Published: (2025)

LuxInstruct: A Cross-Lingual Instruction Tuning Dataset For Luxembourgish
by: Philippy, Fred, et al.
Published: (2025)

Infinity Instruct: Scaling Instruction Selection and Synthesis to Enhance Language Models
by: Li, Jijie, et al.
Published: (2025)

Learning to Instruct for Visual Instruction Tuning
by: Zhou, Zhihan, et al.
Published: (2025)

LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models
by: Ren, Huimin, et al.
Published: (2025)

InstructEdit: Instruction-based Knowledge Editing for Large Language Models
by: Zhang, Ningyu, et al.
Published: (2024)

Instruct-Imagen: Image Generation with Multi-modal Instruction
by: Hu, Hexiang, et al.
Published: (2024)

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
by: Jia, Yiming, et al.
Published: (2025)

Non-instructional Fine-tuning: Enabling Instruction-Following Capabilities in Pre-trained Language Models without Instruction-Following Data
by: Xie, Juncheng, et al.
Published: (2024)

Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
by: Hu, Zichao, et al.
Published: (2024)

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training
by: Yang, Kailai, et al.
Published: (2025)

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
by: Toshniwal, Shubham, et al.
Published: (2024)

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
by: Liu, Wanlong, et al.
Published: (2024)

Probing Language Models for Pre-training Data Detection
by: Liu, Zhenhua, et al.
Published: (2024)

Scaling Towards the Information Boundary of Instruction Sets: The Infinity Instruct Subject Technical Report
by: Du, Li, et al.
Published: (2025)

BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
by: Tran, Hieu, et al.
Published: (2023)

DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)

Timber: Training-free Instruct Model Refining with Base via Effective Rank
by: Wu, Taiqiang, et al.
Published: (2025)

Efficient Pre-training for Localized Instruction Generation of Videos
by: Batra, Anil, et al.
Published: (2023)

Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
by: Qin, Yiwei, et al.
Published: (2026)

Safer-Instruct: Aligning Language Models with Automated Preference Data
by: Shi, Taiwei, et al.
Published: (2023)

Understanding Data Temporality Impact on Large Language Models Pre-training
by: Pilchen, Hippolyte, et al.
Published: (2026)

RegMix: Data Mixture as Regression for Language Model Pre-training
by: Liu, Qian, et al.
Published: (2024)

MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science
by: Kim, Junho, et al.
Published: (2024)

Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors
by: Ma, Shengkun, et al.
Published: (2024)

EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
by: Ou, Yixin, et al.
Published: (2024)

Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
by: Siriwardhana, Shamane, et al.
Published: (2024)

InstructAV: Instruction Fine-tuning Large Language Models for Authorship Verification
by: Hu, Yujia, et al.
Published: (2024)

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct
by: Wu, Yutong, et al.
Published: (2024)

Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks
by: Zhang, Yichi, et al.
Published: (2024)

Pre-training LLMs using human-like development data corpus
by: Bhardwaj, Khushi, et al.
Published: (2023)