:: Library Catalog

Bibliographic Details
Main Authors:	Gloeckle, Fabian; Rammal, Ahmad; Arnal, Charles; Munos, Remi; Cabannes, Vivien; Synnaeve, Gabriel; Hayat, Amaury
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.03071

Similar Items

Formalizing Mathematics at Scale
by: Rammal, Ahmad, et al.
Published: (2026)

Touring sampling with pushforward maps
by: Cabannes, Vivien, et al.
Published: (2023)

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics
by: Peyronnet, Antoine, et al.
Published: (2026)

WybeCoder: Verified Imperative Code Generation
by: Gloeckle, Fabian, et al.
Published: (2026)

Provable Benefits of In-Tool Learning for Large Language Models
by: Houliston, Sam, et al.
Published: (2025)

Learning with Hidden Factorial Structure
by: Arnal, Charles, et al.
Published: (2024)

Efficient RL Training for LLMs with Experience Replay
by: Arnal, Charles, et al.
Published: (2026)

Distilling LLM Feedback for Lean Theorem Proving
by: Narozniak, Gaetan, et al.
Published: (2026)

Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
by: Arnal, Charles, et al.
Published: (2025)

Iteration Head: A Mechanistic Study of Chain-of-Thought
by: Cabannes, Vivien, et al.
Published: (2024)

Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)

The Galerkin method beats Graph-Based Approaches for Spectral Algorithms
by: Cabannes, Vivien, et al.
Published: (2023)

Mode Estimation with Partial Feedback
by: Arnal, Charles, et al.
Published: (2024)

Learning Associative Memories with Gradient Descent
by: Cabannes, Vivien, et al.
Published: (2024)

Super-Exponential Regret for UCT, AlphaGo and Variants
by: Orseau, Laurent, et al.
Published: (2024)

Scaling Laws for Associative Memories
by: Cabannes, Vivien, et al.
Published: (2023)

Optimizing Language Models for Inference Time Objectives using Reinforcement Learning
by: Tang, Yunhao, et al.
Published: (2025)

Short window attention enables long-term memorization
by: Cabannes, Loïc, et al.
Published: (2025)

Stochastic activations
by: Lomeli, Maria, et al.
Published: (2025)

Spectral bandits
by: Kocák, Tomáš, et al.
Published: (2026)

Positional Encoding via Token-Aware Phase Attention
by: Wang, Yu, et al.
Published: (2025)

ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
by: Gu, Alex, et al.
Published: (2025)

A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)

Correlated Quantization for Faster Nonconvex Distributed Optimization
by: Panferov, Andrei, et al.
Published: (2024)

Temporal Difference Flows
by: Farebrother, Jesse, et al.
Published: (2025)

Better & Faster Large Language Models via Multi-token Prediction
by: Gloeckle, Fabian, et al.
Published: (2024)

The KoLMogorov Test: Compression by Code Generation
by: Yoran, Ori, et al.
Published: (2025)

Towards a Neural Debugger for Python
by: Beck, Maximilian, et al.
Published: (2026)

BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?
by: Chambon, Pierre, et al.
Published: (2025)

Safety Alignment of LMs via Non-cooperative Games
by: Paulus, Anselm, et al.
Published: (2025)

Learning Mathematical Rules with Large Language Models
by: Gorceix, Antoine, et al.
Published: (2024)

Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models
by: Abraham, Louis, et al.
Published: (2024)

Reframing Data Value for Large Language Models Through the Lens of Plausibility
by: Rammal, Mohamad Rida, et al.
Published: (2024)

Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification
by: Donhauser, Konstantin, et al.
Published: (2025)

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
by: Gehring, Jonas, et al.
Published: (2024)

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
by: Cummins, Chris, et al.
Published: (2024)

An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
by: Stefan, Gabriel, et al.
Published: (2026)

Extrapolative Weight Averaging Reveals Correctness-Efficiency Frontiers in Code RL
by: Zheng, Kunhao, et al.
Published: (2026)

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
by: Gu, Alex, et al.
Published: (2024)