:: Library Catalog

Bibliographic Details
Main Author:	Rao, Praveen
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.02632

Similar Items

Is Training Data Quality or Quantity More Impactful to Small Language Model Performance?
by: Sajith, Aryan, et al.
Published: (2024)

Adjusting Pretrained Backbones for Performativity
by: Demirel, Berker, et al.
Published: (2024)

Reasoning Language Model Inference Serving Unveiled: An Empirical Study
by: Li, Qi, et al.
Published: (2025)

Pretraining a Foundation Model for Small-Molecule Natural Products
by: Ding, Yuheng, et al.
Published: (2025)

LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale Optimization
by: Zhang, Zishi, et al.
Published: (2026)

Finetune-Informed Pretraining Boosts Downstream Performance
by: Faysal, Atik, et al.
Published: (2026)

Small Vocabularies, Big Gains: Pretraining and Tokenization in Time Series Models
by: Roger, Alexis, et al.
Published: (2025)

Learning Transferable Sensor Models via Language-Informed Pretraining
by: Chen, Yuliang, et al.
Published: (2026)

How Does Controllability Emerge In Language Models During Pretraining?
by: She, Jianshu, et al.
Published: (2025)

Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining
by: Huang, Ruizhe, et al.
Published: (2025)

On Training Large Language Models for Long-Horizon Tasks: An Empirical Study of Horizon Length
by: Kim, Sunghwan, et al.
Published: (2026)

Why Representation Engineering Works: A Theoretical and Empirical Study in Vision-Language Models
by: Tian, Bowei, et al.
Published: (2025)

A Step Toward Federated Pretraining of Multimodal Large Language Models
by: Xiong, Baochen, et al.
Published: (2026)

Modeling and Performance Analysis for Semantic Communications Based on Empirical Results
by: Ma, Shuai, et al.
Published: (2025)

Pretraining Large Language Models with NVFP4
by: NVIDIA, et al.
Published: (2025)

Patent Language Model Pretraining with ModernBERT
by: Yousefiramandi, Amirhossein, et al.
Published: (2025)

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation
by: Gong, Zhuocheng, et al.
Published: (2024)

Enhancing Microgrid Performance Prediction with Attention-based Deep Learning Models
by: Maddineni, Vinod Kumar, et al.
Published: (2024)

Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
by: Sow, Daouda, et al.
Published: (2025)

Predicting LLM Reasoning Performance with Small Proxy Model
by: Koh, Woosung, et al.
Published: (2025)

In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)

Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)

Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
by: Bayazit, Deniz, et al.
Published: (2023)

Deep Ensembles Secretly Perform Empirical Bayes
by: Loaiza-Ganem, Gabriel, et al.
Published: (2025)

Small Language Models for Application Interactions: A Case Study
by: Li, Beibin, et al.
Published: (2024)

Learnware of Language Models: Specialized Small Language Models Can Do Big
by: Tan, Zhi-Hao, et al.
Published: (2025)

Caching Techniques for Reducing the Communication Cost of Federated Learning in IoT Environments
by: Alhonainy, Ahmad, et al.
Published: (2025)

Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining
by: Zhou, Xiaofan, et al.
Published: (2025)

Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
by: Portes, Jacob, et al.
Published: (2025)

Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
by: McLeish, Sean, et al.
Published: (2025)

Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
by: Liu, Huihan, et al.
Published: (2026)

Multiple Physics Pretraining for Physical Surrogate Models
by: McCabe, Michael, et al.
Published: (2023)

An Empirical Study of Realized GNN Expressiveness
by: Wang, Yanbo, et al.
Published: (2023)

Unlock the Potential of Large Language Models for Predictive Tabular Tasks in Data Science with Table-Specific Pretraining
by: Yang, Yazheng, et al.
Published: (2024)

Enhancing Two-Player Performance Through Single-Player Knowledge Transfer: An Empirical Study on Atari 2600 Games
by: Saadat, Kimiya, et al.
Published: (2024)

Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts
by: Somayajula, Sai Ashish, et al.
Published: (2024)

Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models
by: Huang, Yukun, et al.
Published: (2025)

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
by: Galinkin, Erick, et al.
Published: (2024)

Tracing the Representation Geometry of Language Models from Pretraining to Post-training
by: Li, Melody Zixuan, et al.
Published: (2025)

FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models
by: Zhang, Yao, et al.
Published: (2025)