| Main Authors: | Feng, Tony, Jung, Junehyuk, Kim, Sang-hyun, Pagano, Carlo, Gukov, Sergei, Tsai, Chiang-Chiang, Woodruff, David, Javanmard, Adel, Mokhtari, Aryan, Hwang, Dawsen, Chervonyi, Yuri, Lee, Jonathan N., Bingham, Garrett, Trinh, Trieu H., Mirrokni, Vahab, Le, Quoc V., Luong, Thang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.21201 |
Similar Items
Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on the Erdős Problems
by: Feng, Tony, et al.
Published: (2026)
Understanding the Role of Training Data in Test-Time Scaling
by: Javanmard, Adel, et al.
Published: (2025)
Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
by: Javanmard, Adel, et al.
Published: (2026)
PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
by: Javanmard, Adel, et al.
Published: (2024)
Improving the Variance of Differentially Private Randomized Experiments through Clustering
by: Javanmard, Adel, et al.
Published: (2023)
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
by: Chervonyi, Yuri, et al.
Published: (2025)
Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing
by: Javanmard, Adel, et al.
Published: (2025)
Learning Rate Schedules in the Presence of Distribution Shift
by: Fahrbach, Matthew, et al.
Published: (2023)
Optimistic Rates for Learning from Label Proportions
by: Li, Gene, et al.
Published: (2024)
Towards Autonomous Mathematics Research
by: Feng, Tony, et al.
Published: (2026)
Towards Robust Mathematical Reasoning
by: Luong, Thang, et al.
Published: (2025)
Learning from Aggregate responses: Instance Level versus Bag Level Loss Functions
by: Javanmard, Adel, et al.
Published: (2024)
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
by: Wang, Zhecan, et al.
Published: (2024)
Progress on the Courtade-Kumar Conjecture: Optimal High-Noise Entropy Bounds and Generalized Coordinate-wise Mutual Information
by: Javanmard, Adel, et al.
Published: (2026)
DeepCrossAttention: Supercharging Transformer Residual Connections
by: Heddes, Mike, et al.
Published: (2025)
Retraining with Predicted Hard Labels Provably Increases Model Accuracy
by: Das, Rudrajit, et al.
Published: (2024)
Differentially Private Synthetic Data Release for Topics API Outputs
by: Dick, Travis, et al.
Published: (2025)
High-Dimensional Geometric Streaming for Nearly Low Rank Data
by: Esfandiari, Hossein, et al.
Published: (2024)
Optimal Communication for Classic Functions in the Coordinator Model and Beyond
by: Esfandiari, Hossein, et al.
Published: (2024)
Lattice: Learning to Efficiently Compress the Memory
by: Karami, Mahdi, et al.
Published: (2025)
Titans: Learning to Memorize at Test Time
by: Behrouz, Ali, et al.
Published: (2024)
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
by: Kacham, Praneeth, et al.
Published: (2023)
The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression
by: Hassani, Hamed, et al.
Published: (2022)
Pearson Chi-squared Conditional Randomization Test
by: Javanmard, Adel, et al.
Published: (2021)
Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform
by: Tao, Yuxuan, et al.
Published: (2025)
Load Balancing with Network Latencies via Distributed Gradient Descent
by: Balseiro, Santiago R., et al.
Published: (2025)
Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures
by: Ene, Alina, et al.
Published: (2025)
Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon
by: Das, Rudrajit, et al.
Published: (2026)
Approximately Optimal Core Shapes for Tensor Decompositions
by: Ghadiri, Mehrdad, et al.
Published: (2023)
SubGen: Token Generation in Sublinear Time and Memory
by: Zandieh, Amir, et al.
Published: (2024)
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
by: Behrouz, Ali, et al.
Published: (2025)
TeraHAC: Hierarchical Agglomerative Clustering of Trillion-Edge Graphs
by: Dhulipala, Laxman, et al.
Published: (2023)
Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions
by: Epasto, Alessandro, et al.
Published: (2020)
Sampling and Loss Weights in Multi-Domain Training
by: Salmani, Mahdi, et al.
Published: (2025)
ECO: Quantized Training without Full-Precision Master Weights
by: Nikdan, Mahdi, et al.
Published: (2026)
Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)
Nested Learning: The Illusion of Deep Learning Architectures
by: Behrouz, Ali, et al.
Published: (2025)
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
by: Zandieh, Amir, et al.
Published: (2025)
Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond
by: Axiotis, Kyriakos, et al.
Published: (2024)
Online Learning Guided Quasi-Newton Methods with Global Non-Asymptotic Convergence
by: Jiang, Ruichen, et al.
Published: (2024)