| Main Authors: | Takase, Sho, Kiyono, Shun, Kobayashi, Sosuke, Suzuki, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2312.16903 |
Similar Items
Pre-training LLM without Learning Rate Decay Enhances Supervised Fine-Tuning
by: Yano, Kazuki, et al.
Published: (2026)
Efficient Construction of Model Family through Progressive Training Using Model Expansion
by: Yano, Kazuki, et al.
Published: (2025)
Self-Translate-Train: Enhancing Cross-Lingual Transfer of Large Language Models via Inherent Capability
by: Ri, Ryokan, et al.
Published: (2024)
Large Vocabulary Size Improves Large Language Models
by: Takase, Sho, et al.
Published: (2024)
Natural Fingerprints of Large Language Models
by: Suzuki, Teppei, et al.
Published: (2025)
Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
by: Kajitsuka, Tokio, et al.
Published: (2026)
Pre-trained Large Language Models for Financial Sentiment Analysis
by: Luo, Wei, et al.
Published: (2024)
Understanding Data Temporality Impact on Large Language Models Pre-training
by: Pilchen, Hippolyte, et al.
Published: (2026)
DataMan: Data Manager for Pre-training Large Language Models
by: Peng, Ru, et al.
Published: (2025)
Pre-training Distillation for Large Language Models: A Design Space Exploration
by: Peng, Hao, et al.
Published: (2024)
Efficient Language Adaptive Pre-training: Extending State-of-the-Art Large Language Models for Polish
by: Ruciński, Szymon
Published: (2024)
SPAFIT: Stratified Progressive Adaptation Fine-tuning for Pre-trained Large Language Models
by: Arora, Samir, et al.
Published: (2024)
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
by: Qian, Chen, et al.
Published: (2024)
Probing Language Models for Pre-training Data Detection
by: Liu, Zhenhua, et al.
Published: (2024)
MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science
by: Kim, Junho, et al.
Published: (2024)
Sparse is Enough in Fine-tuning Pre-trained Large Language Models
by: Song, Weixi, et al.
Published: (2023)
Simple and Scalable Strategies to Continually Pre-train Large Language Models
by: Ibrahim, Adam, et al.
Published: (2024)
Machine Unlearning of Pre-trained Large Language Models
by: Yao, Jin, et al.
Published: (2024)
Synthesize-on-Graph: Knowledgeable Synthetic Data Generation for Continue Pre-training of Large Language Models
by: Ma, Shengjie, et al.
Published: (2025)
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models
by: Ranaldi, Leonardo, et al.
Published: (2023)
Can Pre-trained Language Models Understand Chinese Humor?
by: Chen, Yuyan, et al.
Published: (2024)
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
by: Lv, Kangtao, et al.
Published: (2025)
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
by: Samragh, Mohammad, et al.
Published: (2024)
Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model
by: Xia, Fei, et al.
Published: (2024)
From N-grams to Pre-trained Multilingual Models For Language Identification
by: Sindane, Thapelo, et al.
Published: (2024)
Boosting Explainability through Selective Rationalization in Pre-trained Language Models
by: Yuan, Libing, et al.
Published: (2025)
RegMix: Data Mixture as Regression for Language Model Pre-training
by: Liu, Qian, et al.
Published: (2024)
Blacks is to Anger as Whites is to Joy? Understanding Latent Affective Bias in Large Pre-trained Neural Language Models
by: Kadan, Anoop, et al.
Published: (2023)
DocMamba: Efficient Document Pre-training with State Space Model
by: Hu, Pengfei, et al.
Published: (2024)
More Women, Same Stereotypes: Unpacking the Gender Bias Paradox in Large Language Models
by: Chen, Evan, et al.
Published: (2025)
Sequence-to-Sequence Spanish Pre-trained Language Models
by: Araujo, Vladimir, et al.
Published: (2023)
Investigating Data Contamination for Pre-training Language Models
by: Jiang, Minhao, et al.
Published: (2024)
Aligning Pre-trained Models for Spoken Language Translation
by: Sedláček, Šimon, et al.
Published: (2024)
Efficient Data Learning for Open Information Extraction with Pre-trained Language Models
by: Fan, Zhiyuan, et al.
Published: (2023)
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models
by: Tang, Lei, et al.
Published: (2025)
More is More: Addition Bias in Large Language Models
by: Santagata, Luca, et al.
Published: (2024)
Superpixel Semantics Representation and Pre-training for Vision-Language Task
by: Zhang, Siyu, et al.
Published: (2023)
Zero-Shot Spam Email Classification Using Pre-trained Large Language Models
by: Rojas-Galeano, Sergio
Published: (2024)
Refactoring Programs Using Large Language Models with Few-Shot Examples
by: Shirafuji, Atsushi, et al.
Published: (2023)
Topic Over Source: The Key to Effective Data Mixing for Language Models Pre-training
by: Peng, Jiahui, et al.
Published: (2025)