| Main Authors: | Tapaninaho, Joonas, Oussala, Mourad |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.00544 |
Similar Items
DiPaCo: Distributed Path Composition
by: Douillard, Arthur, et al.
Published: (2024)
Model Merging in Pre-training of Large Language Models
by: Li, Yunshui, et al.
Published: (2025)
DEPT: Decoupled Embeddings for Pre-training Language Models
by: Iacob, Alex, et al.
Published: (2024)
Parallel Structures in Pre-training Data Yield In-Context Learning
by: Chen, Yanda, et al.
Published: (2024)
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
by: Chung, Woojin, et al.
Published: (2025)
Making Pre-trained Language Models Great on Tabular Prediction
by: Yan, Jiahuan, et al.
Published: (2024)
Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models
by: Zheng, Junhao, et al.
Published: (2023)
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
by: Thangarasa, Vithursan, et al.
Published: (2024)
Investigating Data Contamination for Pre-training Language Models
by: Jiang, Minhao, et al.
Published: (2024)
Aligning Pre-trained Models for Spoken Language Translation
by: Sedláček, Šimon, et al.
Published: (2024)
Sequence-to-Sequence Spanish Pre-trained Language Models
by: Araujo, Vladimir, et al.
Published: (2023)
Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings
by: Sharma, Kartik, et al.
Published: (2025)
Fine-Tuning Pre-trained Language Models to Detect In-Game Trash Talks
by: Fesalbon, Daniel, et al.
Published: (2024)
Pre-trained Large Language Models Use Fourier Features to Compute Addition
by: Zhou, Tianyi, et al.
Published: (2024)
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
by: Fan, Haozheng, et al.
Published: (2024)
Structural Pruning of Pre-trained Language Models via Neural Architecture Search
by: Klein, Aaron, et al.
Published: (2024)
MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models
by: Zhang, Ying, et al.
Published: (2024)
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
by: Zhong, Zexuan, et al.
Published: (2024)
Integrating Pre-trained Language Model into Neural Machine Translation
by: Hwang, Soon-Jae, et al.
Published: (2023)
Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews
by: Boluki, Ali, et al.
Published: (2023)
A Context-Aware Approach for Enhancing Data Imputation with Pre-trained Language Models
by: Hayat, Ahatsham, et al.
Published: (2024)
Efficient Continual Pre-training of LLMs for Low-resource Languages
by: Nag, Arijit, et al.
Published: (2024)
The Dark Side of the Language: Pre-trained Transformers in the DarkNet
by: Ranaldi, Leonardo, et al.
Published: (2022)
Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers
by: Kadlčík, Marek, et al.
Published: (2025)
Thinking Augmented Pre-training
by: Wang, Liang, et al.
Published: (2025)
CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
by: Gu, Jiawei, et al.
Published: (2024)
Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models
by: Chen, Yuyan, et al.
Published: (2024)
Pre-training Limited Memory Language Models with Internal and External Knowledge
by: Zhao, Linxi, et al.
Published: (2025)
Simple and Scalable Strategies to Continually Pre-train Large Language Models
by: Ibrahim, Adam, et al.
Published: (2024)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
by: Song, Weixi, et al.
Published: (2023)
SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks
by: Zhu, Rui-Jie, et al.
Published: (2023)
Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)
Unmasking Backdoors: An Explainable Defense via Gradient-Attention Anomaly Scoring for Pre-trained Language Models
by: Das, Anindya Sundar, et al.
Published: (2025)
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
by: Li, Guanchen, et al.
Published: (2024)
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector
by: Fu, Wenjie, et al.
Published: (2024)
HiFloat4 Format for Language Model Pre-training on Ascend NPUs
by: Taghian, Mehran, et al.
Published: (2026)
CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model
by: Chiang, Shang-Hsuan, et al.
Published: (2024)
Parallel Scaling Law for Language Models
by: Chen, Mouxiang, et al.
Published: (2025)
Parallel Token Prediction for Language Models
by: Draxler, Felix, et al.
Published: (2025)