:: Library Catalog

Bibliographic Details
Main Authors:	Gelberg, Yoav; Eguchi, Koshi; Akiba, Takuya; Cetin, Edoardo
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.12167

Similar Items

Iterative Deployment Improves Planning Skills in LLMs
by: Corrêa, Augusto B., et al.
Published: (2025)

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)

Transformer-Squared: Self-adaptive LLMs
by: Sun, Qi, et al.
Published: (2025)

Doc-to-LoRA: Learning to Instantly Internalize Contexts
by: Charakorn, Rujikorn, et al.
Published: (2026)

Steering at the Source: Style Modulation Heads for Robust Persona Control
by: Izawa, Yoshihiro, et al.
Published: (2026)

Drop Dropout on Single-Epoch Language Model Pretraining
by: Liu, Houjun, et al.
Published: (2025)

LLMs Are In-Context Bandit Reinforcement Learners
by: Monea, Giovanni, et al.
Published: (2024)

CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending
by: Zhu, Shiyi, et al.
Published: (2023)

Reinforcement Learning Teachers of Test Time Scaling
by: Cetin, Edoardo, et al.
Published: (2025)

Large Language Models to Diffusion Finetuning
by: Cetin, Edoardo, et al.
Published: (2025)

END: Early Noise Dropping for Efficient and Effective Context Denoising
by: Jin, Hongye, et al.
Published: (2025)

No Mean Feat: Simple, Strong Baselines for Context Compression
by: Feldman, Yair, et al.
Published: (2025)

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds
by: Basmov, Victoria, et al.
Published: (2023)

Agent Skill Acquisition for Large Language Models via CycleQD
by: Kuroki, So, et al.
Published: (2024)

LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
by: Wang, Minzheng, et al.
Published: (2024)

An Evolved Universal Transformer Memory
by: Cetin, Edoardo, et al.
Published: (2024)

SHINE: A Scalable In-Context Hypernetwork for Mapping Context to LoRA in a Single Pass
by: Liu, Yewei, et al.
Published: (2026)

A Surprising Failure? Multimodal LLMs and the NLVR Challenge
by: Wu, Anne, et al.
Published: (2024)

Pretrained LLMs Learn Multiple Types of Uncertainty
by: Cohen, Roi, et al.
Published: (2025)

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
by: Hu, Lanxiang, et al.
Published: (2024)

NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
by: Shachar, Or, et al.
Published: (2025)

KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI
by: Kuroki, So, et al.
Published: (2025)

Emergent Communication Pretraining for Few-Shot Machine Translation
by: Li, Yaoyiran, et al.
Published: (2020)

Beyond Line-Level Filtering for the Pretraining Corpora of LLMs
by: Park, Chanwoo, et al.
Published: (2025)

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
by: Shing, Makoto, et al.
Published: (2025)

Output Embedding Centering for Stable LLM Pretraining
by: Stollenwerk, Felix, et al.
Published: (2026)

Sudoku-Bench: Evaluating creative reasoning with Sudoku variants
by: Seely, Jeffrey, et al.
Published: (2025)

Talk Less, Interact Better: Evaluating In-context Conversational Adaptation in Multimodal LLMs
by: Hua, Yilun, et al.
Published: (2024)

Preference Curriculum: LLMs Should Always Be Pretrained on Their Preferred Data
by: Zhang, Xuemiao, et al.
Published: (2025)

MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
by: Akter, Syeda Nahida, et al.
Published: (2024)

SimReg: Achieving Higher Performance in the Pretraining via Embedding Similarity Regularization
by: Sun, Yan, et al.
Published: (2026)

Context-level Language Modeling by Learning Predictive Context Embeddings
by: Dai, Beiya, et al.
Published: (2025)

Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)

Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
by: Fan, Dongyang, et al.
Published: (2025)

Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers
by: Liu, Feilong
Published: (2026)

Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
by: Dickson, Billy, et al.
Published: (2025)

Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)

Evaluating the Sensitivity of LLMs to Prior Context
by: Hankache, Robert, et al.
Published: (2025)

REAL: Response Embedding-based Alignment for LLMs
by: Zhang, Honggen, et al.
Published: (2024)