| Main Authors: | Niklaus, Joel, Yamaguchi, Atsuki, Štefánik, Michal, Penedo, Guilherme, Kydlíček, Hynek, Bakouch, Elie, Tunstall, Lewis, Beeching, Edward Emanuel, Frere, Thibaud, Raffel, Colin, von Werra, Leandro, Wolf, Thomas |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.13977 |
Similar Items
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
by: Penedo, Guilherme, et al.
Published: (2024)
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
by: Penedo, Guilherme, et al.
Published: (2025)
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
by: Allal, Loubna Ben, et al.
Published: (2025)
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
by: Hägele, Alexander, et al.
Published: (2024)
FineVision: Open Data Is All You Need
by: Wiedmann, Luis, et al.
Published: (2025)
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
by: Qu, Yuxiao, et al.
Published: (2025)
SmolVLM: Redefining small and efficient multimodal models
by: Marafioti, Andrés, et al.
Published: (2025)
Position: The Most Expensive Part of an LLM should be its Training Data
by: Kandpal, Nikhil, et al.
Published: (2025)
The Sustainability Gap in Robotics: A Large-Scale Survey of Sustainability Awareness in 50,000 Research Articles
by: Skuric, Antun, et al.
Published: (2026)
Efficiently Estimating Data Efficiency for Language Model Fine-tuning
by: Je, Gyung Hyun, et al.
Published: (2025)
How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?
by: Yamaguchi, Atsuki, et al.
Published: (2024)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
by: Patel, Ajay, et al.
Published: (2024)
DABstep: Data Agent Benchmark for Multi-step Reasoning
by: Egg, Alex, et al.
Published: (2025)
QED-Nano: Teaching a Tiny Model to Prove Hard Theorems
by: LM-Provers, et al.
Published: (2026)
Adapting Chat Language Models Using Only Target Unlabeled Language Data
by: Yamaguchi, Atsuki, et al.
Published: (2024)
Uncovering Model Processing Strategies with Non-Negative Per-Example Fisher Factorization
by: Matena, Michael, et al.
Published: (2023)
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
by: Deng, Haikang, et al.
Published: (2023)
Enhancing Training Data Attribution with Representational Optimization
by: Sun, Weiwei, et al.
Published: (2025)
Concept-aware Data Construction Improves In-context Learning of Language Models
by: Štefánik, Michal, et al.
Published: (2024)
An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference
by: Yamaguchi, Atsuki, et al.
Published: (2024)
Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks
by: Yamaguchi, Atsuki, et al.
Published: (2026)
Scaling Data-Constrained Language Models
by: Muennighoff, Niklas, et al.
Published: (2023)
Verulamium Excavations
by: Frere, Sheppard
Published: (2020)
Verulamium Excavations. Volume II
by: Frere, Sheppard
Published: (2021)
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
by: Liu, Fengyuan, et al.
Published: (2024)
Merging by Matching Models in Task Parameter Subspaces
by: Tam, Derek, et al.
Published: (2023)
Soft Merging of Experts with Adaptive Routing
by: Muqeeth, Mohammed, et al.
Published: (2023)
tt386/tuning-selection: Code release for "Tuning spatial distributions of selection pressure to suppress emergence of resistance"
by: Thomas Tunstall
Published: (2026)
How Social Network Structure Impacts the Ability of Zealots to Promote Weak Opinions
by: Tunstall, Thomas
Published: (2024)
Model Merging via Data-Free Covariance Estimation
by: Hameed, Marawan Gamal Abdel, et al.
Published: (2026)
Obsidian : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Obsidian : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Obsidian : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Świeciechów flint : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Obsidian : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Świeciechów flint : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Obsidian : 3D documentation
by: Werra, Dagmara H.
Published: (2026)
Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
by: Yamaguchi, Atsuki, et al.
Published: (2025)
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus
by: Morishita, Terufumi, et al.
Published: (2024)
Circulación de objetos, personas y saberes técnicos en el humedal del río Salado Bonaerense, Argentina
by: María Magdalena Frere
Published: (2022)