| Main Authors: | Jourdan, Marc, Yüce, Gizem, Flammarion, Nicolas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.23557 |
Similar Items
Learning In-context n-grams with Transformers: Sub-n-grams Are Near-stationary Points
by: Varre, Aditya, et al.
Published: (2025)
Transformers Learn Latent Mixture Models In-Context via Mirror Descent
by: D'Angelo, Francesco, et al.
Published: (2026)
Early alignment in two-layer networks training is a two-edged sword
by: Boursier, Etienne, et al.
Published: (2024)
Simplicity bias and optimization threshold in two-layer ReLU networks
by: Boursier, Etienne, et al.
Published: (2024)
Penalising the biases in norm regularisation enforces sparsity
by: Boursier, Etienne, et al.
Published: (2023)
Pareto Set Identification With Posterior Sampling
by: Kone, Cyrille, et al.
Published: (2024)
Learning Algorithms in the Limit
by: Papazov, Hristo, et al.
Published: (2025)
(How) Learning Rates Regulate Catastrophic Overtraining
by: Rofin, Mark, et al.
Published: (2026)
Does Refusal Training in LLMs Generalize to the Past Tense?
by: Andriushchenko, Maksym, et al.
Published: (2024)
Exact Learning of Arithmetic with Differentiable Agents
by: Papazov, Hristo, et al.
Published: (2025)
On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
by: Neuhaus, Yannic, et al.
Published: (2026)
Incremental Learning of Sparse Attention Patterns in Transformers
by: Yüksel, Oğuz Kaan, et al.
Published: (2026)
Why Do We Need Weight Decay in Modern Deep Learning?
by: D'Angelo, Francesco, et al.
Published: (2023)
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
by: Papazov, Hristo, et al.
Published: (2024)
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
by: Varre, Aditya, et al.
Published: (2026)
Optimal Best Arm Identification under Differential Privacy
by: Jourdan, Marc, et al.
Published: (2025)
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
by: Boursier, Etienne, et al.
Published: (2022)
First-order ANIL provably learns representations despite overparametrization
by: Yüksel, Oğuz Kaan, et al.
Published: (2023)
Is In-Context Learning Sufficient for Instruction Following in LLMs?
by: Zhao, Hao, et al.
Published: (2024)
Selective Induction Heads: How Transformers Select Causal Structures In Context
by: D'Angelo, Francesco, et al.
Published: (2025)
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
by: Andriushchenko, Maksym, et al.
Published: (2024)
Implicit Bias of Mirror Flow on Separable Data
by: Pesme, Scott, et al.
Published: (2024)
Long-Context Linear System Identification
by: Yüksel, Oğuz Kaan, et al.
Published: (2024)
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
by: Schlarmann, Christian, et al.
Published: (2025)
An Anytime Algorithm for Good Arm Identification
by: Jourdan, Marc, et al.
Published: (2023)
Best-Arm Identification in Unimodal Bandits
by: Poiani, Riccardo, et al.
Published: (2024)
Contextual Preference Distribution Learning
by: Hudson, Benjamin, et al.
Published: (2026)
SIEVE: Sample-Efficient Parametric Learning from Natural Language
by: Asawa, Parth, et al.
Published: (2026)
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
by: Ghazanfari, Sara, et al.
Published: (2024)
Privacy Assessment of Federated Learning using Private Personalized Layers
by: Jourdan, Théo, et al.
Published: (2021)
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
by: Kuntz, Thomas, et al.
Published: (2025)
Differentially Private Best-Arm Identification
by: Azize, Achraf, et al.
Published: (2024)
Speculative Sampling for Parametric Temporal Point Processes
by: Biloš, Marin, et al.
Published: (2025)
Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning
by: Diwan, Nirav, et al.
Published: (2025)
Mini-Batch Kernel $k$-means
by: Jourdan, Ben, et al.
Published: (2024)
Leveraging Sparsity for Sample-Efficient Preference Learning: A Theoretical Perspective
by: Yao, Yunzhen, et al.
Published: (2025)
Finite Sample Bounds for Non-Parametric Regression: Optimal Sample Efficiency and Space Complexity
by: Maran, Davide, et al.
Published: (2024)
Preference as Reward, Maximum Preference Optimization with Importance Sampling
by: Jiang, Zaifan, et al.
Published: (2023)
HypeMARL: Multi-Agent Reinforcement Learning For High-Dimensional, Parametric, and Distributed Systems
by: Botteghi, Nicolò, et al.
Published: (2025)
Distributed Direct Preference Optimization
by: Jiang, Zhanhong
Published: (2026)