| Main Authors: | Guigon, Maxime, Dixon, Lucas, Sander, Michaël E. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.11513 |
Similar Items
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
by: Ghandeharioun, Asma, et al.
Published: (2024)
Pre-training Distillation for Large Language Models: A Design Space Exploration
by: Peng, Hao, et al.
Published: (2024)
Layer by Layer: Uncovering Hidden Representations in Language Models
by: Skean, Oscar, et al.
Published: (2025)
Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model
by: Lin, Chun-Hsien, et al.
Published: (2024)
Embedding-to-Prefix: Parameter-Efficient Personalization for Pre-Trained Large Language Models
by: Huber, Bernd, et al.
Published: (2025)
Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models
by: Byun, Hoyoon, et al.
Published: (2025)
Evolution of Concepts in Language Model Pre-Training
by: Ge, Xuyang, et al.
Published: (2025)
Learning Dynamics in Continual Pre-Training for Large Language Models
by: Wang, Xingjin, et al.
Published: (2025)
Construction of Hyper-Relational Knowledge Graphs Using Pre-Trained Large Language Models
by: Datta, Preetha, et al.
Published: (2024)
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
by: Bhatia, Gagan, et al.
Published: (2025)
Pre-Trained Language Models for Keyphrase Prediction: A Review
by: Umair, Muhammad, et al.
Published: (2024)
Revealing the Inherent Instructability of Pre-Trained Language Models
by: An, Seokhyun, et al.
Published: (2024)
Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
by: Sun, Kaiser, et al.
Published: (2024)
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models
by: Wang, Chengyu, et al.
Published: (2025)
Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
by: Ghilardi, Davide, et al.
Published: (2024)
Pre-Training Curriculum for Multi-Token Prediction in Language Models
by: Aynetdinov, Ansar, et al.
Published: (2025)
Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
by: Abbes, Istabrak, et al.
Published: (2025)
Dual-Space Knowledge Distillation for Large Language Models
by: Zhang, Songming, et al.
Published: (2024)
MiniLLM: On-Policy Distillation of Large Language Models
by: Gu, Yuxian, et al.
Published: (2023)
Black-Box On-Policy Distillation of Large Language Models
by: Ye, Tianzhu, et al.
Published: (2025)
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models
by: Kahng, Minsuk, et al.
Published: (2024)
Crown, Frame, Reverse: Layer-Wise Scaling Variants for LLM Pre-Training
by: Baroian, Andrei, et al.
Published: (2025)
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
by: Park, Jungwoo, et al.
Published: (2025)
MLLM-Microscope: Unlocking Hidden Structure Within Multimodal Large Language Models
by: Mussabayev, Ravil, et al.
Published: (2026)
Gecko: Versatile Text Embeddings Distilled from Large Language Models
by: Lee, Jinhyuk, et al.
Published: (2024)
Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models
by: Yang, Hao, et al.
Published: (2025)
Dual-objective Language Models: Training Efficiency Without Overfitting
by: Samuel, David, et al.
Published: (2025)
CORE: A Conceptual Reasoning Layer for Large Language Models
by: Hegde, Vishwas, et al.
Published: (2025)
Cross-Modal Knowledge Distillation for Speech Large Language Models
by: Wang, Enzhi, et al.
Published: (2025)
ELAD: Explanation-Guided Large Language Models Active Distillation
by: Zhang, Yifei, et al.
Published: (2024)
Distilling Event Sequence Knowledge From Large Language Models
by: Wadhwa, Somin, et al.
Published: (2024)
Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training
by: Xiao, Meng, et al.
Published: (2025)
Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation
by: Wysocka, Magdalena, et al.
Published: (2023)
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
by: Bandarkar, Lucas, et al.
Published: (2024)
Unraveling Emotions with Pre-Trained Models
by: Pajón-Sanmartín, Alejandro, et al.
Published: (2025)
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
by: Weller, Orion, et al.
Published: (2023)
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs
by: Zhang, Rongzhi, et al.
Published: (2024)
Pre-trained Large Language Models for Financial Sentiment Analysis
by: Luo, Wei, et al.
Published: (2024)
Spike No More: Stabilizing the Pre-training of Large Language Models
by: Takase, Sho, et al.
Published: (2023)
Federated Learning with Layer Skipping: Efficient Training of Large Language Models for Healthcare NLP
by: Zhang, Lihong, et al.
Published: (2025)