| Main Authors: | Subramanyam, Anirudh, Chen, Yuxin, Grossman, Robert L. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.03313 |
Similar Items
Pretraining Scaling Laws for Generative Evaluations of Language Models
by: Schaeffer, Rylan, et al.
Published: (2025)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)
The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training
by: Sahoo, Subramanyam
Published: (2025)
BiMix: A Bivariate Data Mixing Law for Language Model Pretraining
by: Ge, Ce, et al.
Published: (2024)
The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces
by: Sahoo, Subramanyam
Published: (2025)
Scaling Laws for Mixture Pretraining Under Data Constraints
by: Sedova, Anastasiia, et al.
Published: (2026)
Scaling Laws for Forgetting during Finetuning with Pretraining Data Injection
by: Bethune, Louis, et al.
Published: (2025)
Predictable Scale: Part I, Step Law -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
by: Li, Houyi, et al.
Published: (2025)
Narrowing the Focus: Learned Optimizers for Pretrained Models
by: Kristiansen, Gus, et al.
Published: (2024)
Parallel Scaling Law for Language Models
by: Chen, Mouxiang, et al.
Published: (2025)
The Data Efficiency Frontier of Financial Foundation Models: Scaling Laws from Continued Pretraining
by: Ponnock, Jesse
Published: (2025)
Scaling Laws for Multilingual Language Models
by: He, Yifei, et al.
Published: (2024)
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation
by: Song, Steven, et al.
Published: (2024)
Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
by: Ali, Mehdi, et al.
Published: (2025)
Multimodal Cancer Modeling in the Age of Foundation Model Embeddings
by: Song, Steven, et al.
Published: (2025)
GDC Cohort Copilot: An AI Copilot for Curating Cohorts from the Genomic Data Commons
by: Song, Steven, et al.
Published: (2025)
Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)
Parcae: Scaling Laws For Stable Looped Language Models
by: Prairie, Hayden, et al.
Published: (2026)
Scaling Laws for Differentially Private Language Models
by: McKenna, Ryan, et al.
Published: (2025)
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
by: Chen, Zhixun, et al.
Published: (2025)
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
by: Chen, Zhuo, et al.
Published: (2026)
Keypoint Aware Masked Image Modelling
by: Krishna, Madhava, et al.
Published: (2024)
To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining
by: Singh, Karan, et al.
Published: (2026)
ProtoSSL: Interpretable Prototype Learning from Unlabeled Time-Series Data
by: Song, Steven, et al.
Published: (2026)
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
by: Portes, Jacob, et al.
Published: (2025)
Scaling Laws for Upcycling Mixture-of-Experts Language Models
by: Liew, Seng Pei, et al.
Published: (2025)
Scaling Laws for Discriminative Classification in Large Language Models
by: Wyatte, Dean, et al.
Published: (2024)
Can Language Models Discover Scaling Laws?
by: Lin, Haowei, et al.
Published: (2025)
QR-LoRA: QR-Based Low-Rank Adaptation for Efficient Fine-Tuning of Large Language Models
by: Liang, Jessica, et al.
Published: (2025)
Revisiting Prompt Sensitivity in Large Language Models for Text Classification: The Role of Prompt Underspecification
by: Pecher, Branislav, et al.
Published: (2026)
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining
by: Sam, Dylan, et al.
Published: (2025)
Procedural Pretraining: Warming Up Language Models with Abstract Data
by: Jiang, Liangze, et al.
Published: (2026)
ATLAS: Adaptive Transfer Scaling Laws for Multilingual Pretraining, Finetuning, and Decoding the Curse of Multilinguality
by: Longpre, Shayne, et al.
Published: (2025)
Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)
Scaling Laws of Global Weather Models
by: Yu, Yuejiang, et al.
Published: (2026)
Scaling Laws for Post Training Quantized Large Language Models
by: Xu, Zifei, et al.
Published: (2024)
Scaling Law for Language Models Training Considering Batch Size
by: Shuai, Xian, et al.
Published: (2024)
Scaling Laws for Downstream Task Performance of Large Language Models
by: Isik, Berivan, et al.
Published: (2024)
The Horcrux: Mechanistically Interpretable Task Decomposition for Detecting and Mitigating Reward Hacking in Embodied AI Systems
by: Sahoo, Subramanyam, et al.
Published: (2025)