| Main authors: | Ruscio, Valeria; Khedouri, Eli-Shaoul; Thompson, Keiran |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online access: | https://arxiv.org/abs/2605.16600 |
Similar items
The Phenomenology of Hallucinations
by: Ruscio, Valeria, et al.
Published: (2026)
What are you sinking? A geometric approach on attention sink
by: Ruscio, Valeria, et al.
Published: (2025)
TPTT: Transforming Pretrained Transformers into Titans
by: Furfaro, Fabien
Published: (2025)
Beyond Position: the emergence of wavelet-like properties in Transformers
by: Ruscio, Valeria, et al.
Published: (2024)
Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
by: Tice, Cameron, et al.
Published: (2026)
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries
by: Yang, Blair, et al.
Published: (2024)
Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification
by: Mamtani, Sumit, et al.
Published: (2025)
LEAP: Layer-wise Exit-Aware Pretraining for Efficient Transformer Inference
by: Kapadia, Shashank, et al.
Published: (2026)
Reward Models Inherit Value Biases from Pretraining
by: Christian, Brian, et al.
Published: (2026)
Subjective Depth and Timescale Transformers: Learning Where and When to Compute
by: Wieser, Frederico, et al.
Published: (2025)
Where Do Reasoning Models Refuse?
by: Yamaguchi, Kureha, et al.
Published: (2025)
Pretrained Hybrids with MAD Skills
by: Roberts, Nicholas, et al.
Published: (2024)
Knowledge Circuits in Pretrained Transformers
by: Yao, Yunzhi, et al.
Published: (2024)
Memorization Dynamics of Fill-in-the-Middle Pretraining
by: von Arx, Tobias, et al.
Published: (2026)
RLP: Reinforcement as a Pretraining Objective
by: Hatamizadeh, Ali, et al.
Published: (2025)
mini-vec2vec: Scaling Universal Geometry Alignment with Linear Transformations
by: Dar, Guy
Published: (2025)
Where does output diversity collapse in post-training?
by: Karouzos, Constantinos, et al.
Published: (2026)
Fantastic Bugs and Where to Find Them in AI Benchmarks
by: Truong, Sang, et al.
Published: (2025)
Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings
by: Gopalakrishnan, Anand, et al.
Published: (2025)
Output Embedding Centering for Stable LLM Pretraining
by: Stollenwerk, Felix, et al.
Published: (2026)
Pretraining Large Language Models with NVFP4
by: NVIDIA, et al.
Published: (2025)
Patent Language Model Pretraining with ModernBERT
by: Yousefiramandi, Amirhossein, et al.
Published: (2025)
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
by: Lin, Licong, et al.
Published: (2023)
Where Norms and References Collide: Evaluating LLMs on Normative Reasoning
by: Abrams, Mitchell, et al.
Published: (2026)
To Memorize or to Retrieve: Scaling Laws for RAG-Considerate Pretraining
by: Singh, Karan, et al.
Published: (2026)
Can GRPO Help LLMs Transcend Their Pretraining Origin?
by: Ni, Kangqi, et al.
Published: (2025)
In-context Pretraining: Language Modeling Beyond Document Boundaries
by: Shi, Weijia, et al.
Published: (2023)
Revisiting Multilingual Data Mixtures in Language Model Pretraining
by: Foroutan, Negar, et al.
Published: (2025)
Emergent Communication Pretraining for Few-Shot Machine Translation
by: Li, Yaoyiran, et al.
Published: (2020)
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
by: Bayazit, Deniz, et al.
Published: (2023)
Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
by: Levi, Elad, et al.
Published: (2024)
Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining
by: Fan, Dongyang, et al.
Published: (2025)
Does Differential Privacy Impact Bias in Pretrained NLP Models?
by: Islam, Md. Khairul, et al.
Published: (2024)
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
by: McLeish, Sean, et al.
Published: (2025)
Pretraining with hierarchical memories: separating long-tail and common knowledge
by: Pouransari, Hadi, et al.
Published: (2025)
Many-to-English Machine Translation Tools, Data, and Pretrained Models
by: Gowda, Thamme, et al.
Published: (2021)
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
by: Beniwal, Himanshu, et al.
Published: (2026)
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
by: Zhang, Zhenyu, et al.
Published: (2025)
Generating Pretraining Tokens from Organic Data for Data-Bound Scaling
by: Yu, Zichun, et al.
Published: (2026)
Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization
by: Kawakami, Wataru, et al.
Published: (2025)