Library Catalog

Bibliographic Details
Main Authors:	Feng, Tony, Jung, Junehyuk, Kim, Sang-hyun, Pagano, Carlo, Gukov, Sergei, Tsai, Chiang-Chiang, Woodruff, David, Javanmard, Adel, Mokhtari, Aryan, Hwang, Dawsen, Chervonyi, Yuri, Lee, Jonathan N., Bingham, Garrett, Trinh, Trieu H., Mirrokni, Vahab, Le, Quoc V., Luong, Thang
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2602.21201

Similar Items

Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems
by: Feng, Tony, et al.
Published: (2026)

Understanding the Role of Training Data in Test-Time Scaling
by: Javanmard, Adel, et al.
Published: (2025)

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
by: Javanmard, Adel, et al.
Published: (2026)

PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
by: Javanmard, Adel, et al.
Published: (2024)

Improving the Variance of Differentially Private Randomized Experiments through Clustering
by: Javanmard, Adel, et al.
Published: (2023)

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
by: Chervonyi, Yuri, et al.
Published: (2025)

Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
by: Javanmard, Adel, et al.
Published: (2025)

Learning Rate Schedules in the Presence of Distribution Shift
by: Fahrbach, Matthew, et al.
Published: (2023)

Optimistic Rates for Learning from Label Proportions
by: Li, Gene, et al.
Published: (2024)

Towards Autonomous Mathematics Research
by: Feng, Tony, et al.
Published: (2026)

Towards Robust Mathematical Reasoning
by: Luong, Thang, et al.
Published: (2025)

Learning from Aggregate responses: Instance Level versus Bag Level Loss Functions
by: Javanmard, Adel, et al.
Published: (2024)

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
by: Wang, Zhecan, et al.
Published: (2024)

Progress on the Courtade-Kumar Conjecture: Optimal High-Noise Entropy Bounds and Generalized Coordinate-wise Mutual Information
by: Javanmard, Adel, et al.
Published: (2026)

DeepCrossAttention: Supercharging Transformer Residual Connections
by: Heddes, Mike, et al.
Published: (2025)

Retraining with Predicted Hard Labels Provably Increases Model Accuracy
by: Das, Rudrajit, et al.
Published: (2024)

Differentially Private Synthetic Data Release for Topics API Outputs
by: Dick, Travis, et al.
Published: (2025)

High-Dimensional Geometric Streaming for Nearly Low Rank Data
by: Esfandiari, Hossein, et al.
Published: (2024)

Optimal Communication for Classic Functions in the Coordinator Model and Beyond
by: Esfandiari, Hossein, et al.
Published: (2024)

Lattice: Learning to Efficiently Compress the Memory
by: Karami, Mahdi, et al.
Published: (2025)

Titans: Learning to Memorize at Test Time
by: Behrouz, Ali, et al.
Published: (2024)

PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
by: Kacham, Praneeth, et al.
Published: (2023)

The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression
by: Hassani, Hamed, et al.
Published: (2022)

Pearson Chi-squared Conditional Randomization Test
by: Javanmard, Adel, et al.
Published: (2021)

Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform
by: Tao, Yuxuan, et al.
Published: (2025)

Load Balancing with Network Latencies via Distributed Gradient Descent
by: Balseiro, Santiago R., et al.
Published: (2025)

Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
by: Ene, Alina, et al.
Published: (2025)

Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon
by: Das, Rudrajit, et al.
Published: (2026)

Approximately Optimal Core Shapes for Tensor Decompositions
by: Ghadiri, Mehrdad, et al.
Published: (2023)

SubGen: Token Generation in Sublinear Time and Memory
by: Zandieh, Amir, et al.
Published: (2024)

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
by: Behrouz, Ali, et al.
Published: (2025)

TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
by: Dhulipala, Laxman, et al.
Published: (2023)

Optimal Approximation–Smoothness Tradeoffs for Soft-Max Functions
by: Epasto, Alessandro, et al.
Published: (2020)

Sampling and Loss Weights in Multi-Domain Training
by: Salmani, Mahdi, et al.
Published: (2025)

ECO: Quantized Training without Full-Precision Master Weights
by: Nikdan, Mahdi, et al.
Published: (2026)

Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)

Nested Learning: The Illusion of Deep Learning Architectures
by: Behrouz, Ali, et al.
Published: (2025)

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
by: Zandieh, Amir, et al.
Published: (2025)

Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond
by: Axiotis, Kyriakos, et al.
Published: (2024)

Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence
by: Jiang, Ruichen, et al.
Published: (2024)