| Main Author: | Rao, Praveen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.02632 |
Similar Items
Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
by: Sajith, Aryan, et al.
Published: (2024)
Adjusting Pretrained Backbones for Performativity
by: Demirel, Berker, et al.
Published: (2024)
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
by: Li, Qi, et al.
Published: (2025)
Pretraining a Foundation Model for Small-Molecule Natural Products
by: Ding, Yuheng, et al.
Published: (2025)
LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale Optimization
by: Zhang, Zishi, et al.
Published: (2026)
Finetune-Informed Pretraining Boosts Downstream Performance
by: Faysal, Atik, et al.
Published: (2026)
Small Vocabularies, Big Gains: Pretraining and Tokenization in Time Series Models
by: Roger, Alexis, et al.
Published: (2025)
Learning Transferable Sensor Models via Language-Informed Pretraining
by: Chen, Yuliang, et al.
Published: (2026)
How Does Controllability Emerge In Language Models During Pretraining?
by: She, Jianshu, et al.
Published: (2025)
Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
by: Huang, Ruizhe, et al.
Published: (2025)
On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
by: Kim, Sunghwan, et al.
Published: (2026)
Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
by: Tian, Bowei, et al.
Published: (2025)
A Step Toward Federated Pretraining of Multimodal Large Language Models
by: Xiong, Baochen, et al.
Published: (2026)
Modeling and Performance Analysis for Semantic Communications Based on Empirical Results
by: Ma, Shuai, et al.
Published: (2025)
Pretraining Large Language Models with NVFP4
by: NVIDIA, et al.
Published: (2025)
Patent Language Model Pretraining with ModernBERT
by: Yousefiramandi, Amirhossein, et al.
Published: (2025)
What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation
by: Gong, Zhuocheng, et al.
Published: (2024)
Enhancing Microgrid Performance Prediction with Attention-based Deep Learning Models
by: Maddineni, Vinod Kumar, et al.
Published: (2024)
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
by: Sow, Daouda, et al.
Published: (2025)
Predicting LLM Reasoning Performance with Small Proxy Model
by: Koh, Woosung, et al.
Published: (2025)
In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)
Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
by: Bayazit, Deniz, et al.
Published: (2023)
Deep Ensembles Secretly Perform Empirical Bayes
by: Loaiza-Ganem, Gabriel, et al.
Published: (2025)
Small Language Models for Application Interactions: A Case Study
by: Li, Beibin, et al.
Published: (2024)
Learnware of Language Models: Specialized Small Language Models Can Do Big
by: Tan, Zhi-Hao, et al.
Published: (2025)
Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments
by: Alhonainy, Ahmad, et al.
Published: (2025)
Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining
by: Zhou, Xiaofan, et al.
Published: (2025)
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
by: Portes, Jacob, et al.
Published: (2025)
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
by: McLeish, Sean, et al.
Published: (2025)
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
by: Liu, Huihan, et al.
Published: (2026)
Multiple Physics Pretraining for Physical Surrogate Models
by: McCabe, Michael, et al.
Published: (2023)
An Empirical Study of Realized GNN Expressiveness
by: Wang, Yanbo, et al.
Published: (2023)
Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining
by: Yang, Yazheng, et al.
Published: (2024)
Enhancing Two-Player Performance Through Single-Player Knowledge Transfer: An Empirical Study on Atari 2600 Games
by: Saadat, Kimiya, et al.
Published: (2024)
Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts
by: Somayajula, Sai Ashish, et al.
Published: (2024)
Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)
Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
by: Galinkin, Erick, et al.
Published: (2024)
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
by: Li, Melody Zixuan, et al.
Published: (2025)
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)