Library Catalog

Bibliographic Details
Main Authors:	Subramanyam, Anirudh; Chen, Yuxin; Grossman, Robert L.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2510.03313

Similar Items

Pretraining Scaling Laws for Generative Evaluations of Language Models
by: Schaeffer, Rylan, et al.
Published: (2025)

Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)

Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)

The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
by: Sahoo, Subramanyam
Published: (2025)

BiMix: A Bivariate Data Mixing Law for Language Model Pretraining
by: Ge, Ce, et al.
Published: (2024)

The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
by: Sahoo, Subramanyam
Published: (2025)

Scaling Laws for Mixture Pretraining Under Data Constraints
by: Sedova, Anastasiia, et al.
Published: (2026)

Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
by: Bethune, Louis, et al.
Published: (2025)

Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
by: Li, Houyi, et al.
Published: (2025)

Narrowing the Focus: Learned Optimizers for Pretrained Models
by: Kristiansen, Gus, et al.
Published: (2024)

Parallel Scaling Law for Language Models
by: Chen, Mouxiang, et al.
Published: (2025)

The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining
by: Ponnock, Jesse
Published: (2025)

Scaling Laws for Multilingual Language Models
by: He, Yifei, et al.
Published: (2024)

LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
by: Song, Steven, et al.
Published: (2024)

Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
by: Ali, Mehdi, et al.
Published: (2025)

Multimodal Cancer Modeling in the Age of Foundation Model Embeddings
by: Song, Steven, et al.
Published: (2025)

GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons
by: Song, Steven, et al.
Published: (2025)

Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)

Parcae: Scaling Laws For Stable Looped Language Models
by: Prairie, Hayden, et al.
Published: (2026)

Scaling Laws for Differentially Private Language Models
by: McKenna, Ryan, et al.
Published: (2025)

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
by: Chen, Zhixun, et al.
Published: (2025)

Data Mixing for Large Language Models Pretraining: A Survey and Outlook
by: Chen, Zhuo, et al.
Published: (2026)

Keypoint Aware Masked Image Modelling
by: Krishna, Madhava, et al.
Published: (2024)

To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining
by: Singh, Karan, et al.
Published: (2026)

ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data
by: Song, Steven, et al.
Published: (2026)

Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
by: Portes, Jacob, et al.
Published: (2025)

Scaling Laws for Upcycling Mixture-of-Experts Language Models
by: Liew, Seng Pei, et al.
Published: (2025)

Scaling Laws for Discriminative Classification in Large Language Models
by: Wyatte, Dean, et al.
Published: (2024)

Can Language Models Discover Scaling Laws?
by: Lin, Haowei, et al.
Published: (2025)

QR-LoRA: QR-Based Low-Rank Adaptation for Efficient Fine-Tuning of Large Language Models
by: Liang, Jessica, et al.
Published: (2025)

Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification
by: Pecher, Branislav, et al.
Published: (2026)

Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
by: Sam, Dylan, et al.
Published: (2025)

Procedural Pretraining: Warming Up Language Models with Abstract Data
by: Jiang, Liangze, et al.
Published: (2026)

ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
by: Longpre, Shayne, et al.
Published: (2025)

Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)

Scaling Laws of Global Weather Models
by: Yu, Yuejiang, et al.
Published: (2026)

Scaling Laws for Post Training Quantized Large Language Models
by: Xu, Zifei, et al.
Published: (2024)

Scaling Law for Language Models Training Considering Batch Size
by: Shuai, Xian, et al.
Published: (2024)

Scaling Laws for Downstream Task Performance of Large Language Models
by: Isik, Berivan, et al.
Published: (2024)

The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems
by: Sahoo, Subramanyam, et al.
Published: (2025)