| Main Authors: | Yano, Kazuki, Ito, Takumi, Suzuki, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.04151 |
Similar Items
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
by: Yano, Kazuki, et al.
Published: (2026)
Efficient Construction of Model Family through Progressive Training Using Model Expansion
by: Yano, Kazuki, et al.
Published: (2025)
Adapting Text LLMs to Speech via Multimodal Depth Up-Scaling
by: Yano, Kazuki, et al.
Published: (2026)
Spike No More: Stabilizing the Pre-training of Large Language Models
by: Takase, Sho, et al.
Published: (2023)
TimeMachine-bench: A Benchmark for Evaluating Model Capabilities in Repository-Level Migration Tasks
by: Fujii, Ryo, et al.
Published: (2026)
Layerwise Importance Analysis of Feed-Forward Networks in Transformer-based Language Models
by: Ikeda, Wataru, et al.
Published: (2025)
Reference-free Evaluation Metrics for Text Generation: A Survey
by: Ito, Takumi, et al.
Published: (2025)
Suppressing Final Layer Hidden State Jumps in Transformer Pretraining
by: Shibata, Keigo, et al.
Published: (2026)
Efficient Continual Pre-training for Building Domain Specific Large Language Models
by: Xie, Yong, et al.
Published: (2023)
Model Merging in Pre-training of Large Language Models
by: Li, Yunshui, et al.
Published: (2025)
Towards Effective and Efficient Continual Pre-training of Large Language Models
by: Chen, Jie, et al.
Published: (2024)
Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models
by: Chen, Yuyan, et al.
Published: (2024)
TRELM: Towards Robust and Efficient Pre-training for Knowledge-Enhanced Language Models
by: Yan, Junbing, et al.
Published: (2024)
Efficient Knowledge Probing of Large Language Models by Adapting Pre-trained Embeddings
by: Sharma, Kartik, et al.
Published: (2025)
Embedding-to-Prefix: Parameter-Efficient Personalization for Pre-Trained Large Language Models
by: Huber, Bernd, et al.
Published: (2025)
Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish
by: Ruciński, Szymon
Published: (2024)
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
by: Fan, Haozheng, et al.
Published: (2024)
On Entity Identification in Language Models
by: Sakata, Masaki, et al.
Published: (2025)
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
by: Wang, Shaobo, et al.
Published: (2026)
Cross-layer Attention Sharing for Pre-trained Large Language Models
by: Mu, Yongyu, et al.
Published: (2024)
Examining Forgetting in Continual Pre-training of Aligned Large Language Models
by: Li, Chen-An, et al.
Published: (2024)
Pre-trained Large Language Models for Financial Sentiment Analysis
by: Luo, Wei, et al.
Published: (2024)
Linear Representations of Hierarchical Concepts in Language Models
by: Sakata, Masaki, et al.
Published: (2026)
Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models
by: Tu, Zhijun, et al.
Published: (2025)
MediSwift: Efficient Sparse Pre-trained Biomedical Language Models
by: Thangarasa, Vithursan, et al.
Published: (2024)
DocMamba: Efficient Document Pre-training with State Space Model
by: Hu, Pengfei, et al.
Published: (2024)
DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)
Understanding Data Temporality Impact on Large Language Models Pre-training
by: Pilchen, Hippolyte, et al.
Published: (2026)
Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)
SongSage: A Large Musical Language Model with Lyric Generative Pre-training
by: Guo, Jiani, et al.
Published: (2026)
Efficient Data Learning for Open Information Extraction with Pre-trained Language Models
by: Fan, Zhiyuan, et al.
Published: (2023)
Acquiring Bidirectionality via Large and Small Language Models
by: Goto, Takumi, et al.
Published: (2024)
Pruning Multilingual Large Language Models for Multilingual Inference
by: Kim, Hwichan, et al.
Published: (2024)
Metadata Conditioning Accelerates Language Model Pre-training
by: Gao, Tianyu, et al.
Published: (2025)
Evaluating Discourse Cohesion in Pre-trained Language Models
by: He, Jie, et al.
Published: (2025)
Development of Cognitive Intelligence in Pre-trained Language Models
by: Shah, Raj Sanjay, et al.
Published: (2024)
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer
by: Zhu, Yongxin, et al.
Published: (2024)
Pre-trained Large Language Models Use Fourier Features to Compute Addition
by: Zhou, Tianyi, et al.
Published: (2024)
Pre-training Distillation for Large Language Models: A Design Space Exploration
by: Peng, Hao, et al.
Published: (2024)
Fine-tuning Pre-trained Language Models for Few-shot Intent Detection: Supervised Pre-training and Isotropization
by: Zhang, Haode, et al.
Published: (2022)