| Main Authors: | Gelberg, Yoav, Eguchi, Koshi, Akiba, Takuya, Cetin, Edoardo |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.12167 |
Similar Items
Iterative Deployment Improves Planning Skills in LLMs
by: Corrêa, Augusto B., et al.
Published: (2025)
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)
Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)
Doc-to-LoRA: Learning to Instantly Internalize Contexts
by: Charakorn, Rujikorn, et al.
Published: (2026)
Steering at the Source: Style Modulation Heads for Robust Persona Control
by: Izawa, Yoshihiro, et al.
Published: (2026)
Drop Dropout on Single-Epoch Language Model Pretraining
by: Liu, Houjun, et al.
Published: (2025)
LLMs Are In-Context Bandit Reinforcement Learners
by: Monea, Giovanni, et al.
Published: (2024)
CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
by: Zhu, Shiyi, et al.
Published: (2023)
Reinforcement Learning Teachers of Test Time Scaling
by: Cetin, Edoardo, et al.
Published: (2025)
Large Language Models to Diffusion Finetuning
by: Cetin, Edoardo, et al.
Published: (2025)
END: Early Noise Dropping for Efficient and Effective Context Denoising
by: Jin, Hongye, et al.
Published: (2025)
No Mean Feat: Simple, Strong Baselines for Context Compression
by: Feldman, Yair, et al.
Published: (2025)
Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
by: Basmov, Victoria, et al.
Published: (2023)
Agent Skill Acquisition for Large Language Models via CycleQD
by: Kuroki, So, et al.
Published: (2024)
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
by: Wang, Minzheng, et al.
Published: (2024)
An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)
SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass
by: Liu, Yewei, et al.
Published: (2026)
A Surprising Failure? Multimodal LLMs and the NLVR Challenge
by: Wu, Anne, et al.
Published: (2024)
Pretrained LLMs Learn Multiple Types of Uncertainty
by: Cohen, Roi, et al.
Published: (2025)
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
by: Hu, Lanxiang, et al.
Published: (2024)
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
by: Shachar, Or, et al.
Published: (2025)
KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
by: Kuroki, So, et al.
Published: (2025)
Emergent Communication Pretraining for Few-Shot Machine Translation
by: Li, Yaoyiran, et al.
Published: (2020)
Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
by: Shing, Makoto, et al.
Published: (2025)
Output Embedding Centering for Stable LLM Pretraining
by: Stollenwerk, Felix, et al.
Published: (2026)
Sudoku-Bench: Evaluating creative reasoning with Sudoku variants
by: Seely, Jeffrey, et al.
Published: (2025)
Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
by: Hua, Yilun, et al.
Published: (2024)
Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
by: Akter, Syeda Nahida, et al.
Published: (2024)
SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization
by: Sun, Yan, et al.
Published: (2026)
Context-level Language Modeling by Learning Predictive Context Embeddings
by: Dai, Beiya, et al.
Published: (2025)
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
by: Fan, Dongyang, et al.
Published: (2025)
Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
by: Liu, Feilong
Published: (2026)
Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
by: Dickson, Billy, et al.
Published: (2025)
Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)
Evaluating the Sensitivity of LLMs to Prior Context
by: Hankache, Robert, et al.
Published: (2025)
REAL: Response Embedding-based Alignment for LLMs
by: Zhang, Honggen, et al.
Published: (2024)