| Main Authors: | Gloeckle, Fabian, Rammal, Ahmad, Arnal, Charles, Munos, Remi, Cabannes, Vivien, Synnaeve, Gabriel, Hayat, Amaury |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.03071 |
Similar Items
Formalizing Mathematics at Scale
by: Rammal, Ahmad, et al.
Published: (2026)
Touring sampling with pushforward maps
by: Cabannes, Vivien, et al.
Published: (2023)
LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
by: Peyronnet, Antoine, et al.
Published: (2026)
WybeCoder: Verified Imperative Code Generation
by: Gloeckle, Fabian, et al.
Published: (2026)
Provable Benefits of In-Tool Learning for Large Language Models
by: Houliston, Sam, et al.
Published: (2025)
Learning with Hidden Factorial Structure
by: Arnal, Charles, et al.
Published: (2024)
Efficient RL Training for LLMs with Experience Replay
by: Arnal, Charles, et al.
Published: (2026)
Distilling LLM Feedback for Lean Theorem Proving
by: Narozniak, Gaetan, et al.
Published: (2026)
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
by: Arnal, Charles, et al.
Published: (2025)
Iteration Head: A Mechanistic Study of Chain-of-Thought
by: Cabannes, Vivien, et al.
Published: (2024)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
The Galerkin method beats Graph-Based Approaches for Spectral Algorithms
by: Cabannes, Vivien, et al.
Published: (2023)
Mode Estimation with Partial Feedback
by: Arnal, Charles, et al.
Published: (2024)
Learning Associative Memories with Gradient Descent
by: Cabannes, Vivien, et al.
Published: (2024)
Super-Exponential Regret for UCT, AlphaGo and Variants
by: Orseau, Laurent, et al.
Published: (2024)
Scaling Laws for Associative Memories
by: Cabannes, Vivien, et al.
Published: (2023)
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
by: Tang, Yunhao, et al.
Published: (2025)
Short window attention enables long-term memorization
by: Cabannes, Loïc, et al.
Published: (2025)
Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)
Spectral bandits
by: Kocák, Tomáš, et al.
Published: (2026)
Positional Encoding via Token-Aware Phase Attention
by: Wang, Yu, et al.
Published: (2025)
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
by: Gu, Alex, et al.
Published: (2025)
A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)
Correlated Quantization for Faster Nonconvex Distributed Optimization
by: Panferov, Andrei, et al.
Published: (2024)
Temporal Difference Flows
by: Farebrother, Jesse, et al.
Published: (2025)
Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)
The KoLMogorov Test: Compression by Code Generation
by: Yoran, Ori, et al.
Published: (2025)
Towards a Neural Debugger for Python
by: Beck, Maximilian, et al.
Published: (2026)
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)
Safety Alignment of LMs via Non-cooperative Games
by: Paulus, Anselm, et al.
Published: (2025)
Learning Mathematical Rules with Large Language Models
by: Gorceix, Antoine, et al.
Published: (2024)
Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models
by: Abraham, Louis, et al.
Published: (2024)
Reframing Data Value for Large Language Models Through the Lens of Plausibility
by: Rammal, Mohamad Rida, et al.
Published: (2024)
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
by: Donhauser, Konstantin, et al.
Published: (2025)
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
by: Gehring, Jonas, et al.
Published: (2024)
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
by: Cummins, Chris, et al.
Published: (2024)
An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
by: Stefan, Gabriel, et al.
Published: (2026)
Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
by: Zheng, Kunhao, et al.
Published: (2026)
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)