| Main Author: | Tan, Liling |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.24098 |
Similar Items
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
by: Fan, Haozheng, et al.
Published: (2024)
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models
by: Wang, Huazheng, et al.
Published: (2025)
MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
by: Li, Jiacheng, et al.
Published: (2026)
WISCA: A Lightweight Model Transition Method to Improve LLM Training via Weight Scaling
by: Li, Jiacheng, et al.
Published: (2025)
Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language
by: Kumar, Vinayshekhar Bannihatti, et al.
Published: (2026)
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
by: Stickland, Asa Cooper, et al.
Published: (2024)
TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies
by: Liang, Guang, et al.
Published: (2025)
Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models
by: Tan, Wenhui, et al.
Published: (2026)
Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets
by: Li, Tianjian, et al.
Published: (2024)
Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models
by: Lee, Chungpa, et al.
Published: (2026)
Discovering Latent Knowledge in Language Models Without Supervision
by: Burns, Collin, et al.
Published: (2022)
Baby Scale: Investigating Models Trained on Individual Children's Language Input
by: Feng, Steven Y., et al.
Published: (2026)
Better Prompt Compression Without Multi-Layer Perceptrons
by: Honig, Edouardo, et al.
Published: (2025)
AutoJudge: Judge Decoding Without Manual Annotation
by: Garipov, Roman, et al.
Published: (2025)
Prompt Curriculum Learning for Efficient LLM Post-Training
by: Gao, Zhaolin, et al.
Published: (2025)
Scaling Test-Time Compute Without Verification or RL is Suboptimal
by: Setlur, Amrith, et al.
Published: (2025)
Alexpaca: Learning Factual Clarification Question Generation Without Examples
by: Toles, Matthew, et al.
Published: (2023)
Training Language Models to Reason Efficiently
by: Arora, Daman, et al.
Published: (2025)
Cascade-Aware Training of Language Models
by: Wang, Congchao, et al.
Published: (2024)
On Training Data Influence of GPT Models
by: Chai, Yekun, et al.
Published: (2024)
Enhancing Domain-Specific Encoder Models with LLM-Generated Data: How to Leverage Ontologies, and How to Do Without Them
by: Brinner, Marc, et al.
Published: (2025)
Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home
by: Moskvoretskii, Viktor, et al.
Published: (2025)
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
by: Ahuja, Kabir, et al.
Published: (2024)
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training
by: Guo, Qingyan, et al.
Published: (2024)
Training Superior Sparse Autoencoders for Instruct Models
by: Li, Jiaming, et al.
Published: (2025)
Asynchronous Local-SGD Training for Language Modeling
by: Liu, Bo, et al.
Published: (2024)
Robust Training of Vector Quantized Bottleneck Models
by: Łańcucki, Adrian, et al.
Published: (2020)
End-to-end Planner Training for Language Modeling
by: Cornille, Nathan, et al.
Published: (2024)
Detection Without Correction: A Robust Asymmetry in Activation-Based Hallucination Probing
by: Roy, Dip, et al.
Published: (2026)
Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning
by: Yin, Qingyu, et al.
Published: (2024)
Induced Model Matching: Restricted Models Help Train Full-Featured Models
by: Muneeb, Usama, et al.
Published: (2024)
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
by: Luo, Renjie, et al.
Published: (2025)
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access
by: Su, Jiayuan, et al.
Published: (2024)
Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models
by: Zhou, Hanhan, et al.
Published: (2026)
Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space
by: Rom, Aviad, et al.
Published: (2024)
Order-Independence Without Fine Tuning
by: McIlroy-Young, Reid, et al.
Published: (2024)
Self-Training Large Language Models with Confident Reasoning
by: Jang, Hyosoon, et al.
Published: (2025)
Pre-Trained Policy Discriminators are General Reward Models
by: Dou, Shihan, et al.
Published: (2025)
Unsupervised Data Validation Methods for Efficient Model Training
by: Paniv, Yurii
Published: (2024)
Linear Dynamics in the RLVR Training of Large Language Models
by: Wang, Tianle, et al.
Published: (2026)