| Main Authors: | Dimlioglu, Tolga; Topollai, Kristi; Choromanska, Anna |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.27739 |
Similar Items
Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization
by: Topollai, Kristi, et al.
Published: (2026)
Understanding Quantization of Optimizer States in LLM Pre-training: Dynamics of State Staleness and Effectiveness of State Resets
by: Topollai, Kristi, et al.
Published: (2026)
Task-Level Contrastiveness for Cross-Domain Few-Shot Learning
by: Topollai, Kristi, et al.
Published: (2025)
Adaptive Memory Momentum via a Model-Based Framework for Deep Learning Optimization
by: Topollai, Kristi, et al.
Published: (2025)
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning
by: Dimlioglu, Tolga, et al.
Published: (2025)
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models
by: Dimlioglu, Tolga, et al.
Published: (2024)
Streamlining Industrial Contract Management with Retrieval-Augmented LLMs
by: Topollai, Kristi, et al.
Published: (2025)
OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction
by: Hemadri, Raghu Vamshi, et al.
Published: (2025)
Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems
by: Dimlioglu, Tolga, et al.
Published: (2026)
SGD at the Edge of Stability: The Stochastic Sharpness Gap
by: Liao, Fangshuo, et al.
Published: (2026)
Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization
by: Lei, Yunwen, et al.
Published: (2023)
PromptSplit: Revealing Prompt-Level Disagreement in Generative Models
by: Lotfian, Mehdi, et al.
Published: (2026)
ACE and Diverse Generalization via Selective Disagreement
by: Daniels, Oliver, et al.
Published: (2025)
Mitigating Spurious Correlations via Disagreement Probability
by: Han, Hyeonggeun, et al.
Published: (2024)
STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems
by: Bonagiri, Akash, et al.
Published: (2026)
DIVE: Subgraph Disagreement for Graph Out-of-Distribution Generalization
by: Sun, Xin, et al.
Published: (2024)
Bootstrap SGD: Algorithmic Stability and Robustness
by: Christmann, Andreas, et al.
Published: (2024)
DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding
by: Zou, Mingxi, et al.
Published: (2026)
The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective
by: Krishna, Satyapriya, et al.
Published: (2022)
Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection
by: Zhu, Haoran, et al.
Published: (2025)
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
by: Luo, Haocheng, et al.
Published: (2026)
Anon: Extrapolating Adaptivity Beyond SGD and Adam
by: Zhang, Yiheng, et al.
Published: (2026)
Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?
by: Kim, Jihwan, et al.
Published: (2026)
Accumulative SGD Influence Estimation for Data Attribution
by: Shi, Yunxiao, et al.
Published: (2025)
RQP-SGD: Differential Private Machine Learning through Noisy SGD and Randomized Quantization
by: Feng, Ce, et al.
Published: (2024)
Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing
by: Wagner, Dominik, et al.
Published: (2024)
DC-SGD: Differentially Private SGD with Dynamic Clipping through Gradient Norm Distribution Estimation
by: Wei, Chengkun, et al.
Published: (2025)
StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
by: Yu, Dingzhi, et al.
Published: (2026)
SWAN: SGD with Normalization and Whitening Enables Stateless LLM Training
by: Ma, Chao, et al.
Published: (2024)
EMoE: Training-Free Expert Disagreement for Uncertainty-Aware Text-to-Image Diffusion
by: Berry, Lucas, et al.
Published: (2025)
INO-SGD: Addressing Utility Imbalance under Individualized Differential Privacy
by: Tian, Xiao, et al.
Published: (2026)
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
by: Zhang, Tongcheng, et al.
Published: (2026)
Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
by: Deng, Yuyang, et al.
Published: (2025)
Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care
by: Singh, Prabhjot, et al.
Published: (2026)
Connections between Schedule-Free Optimizers, AdEMAMix, and Accelerated SGD Variants
by: Morwani, Depen, et al.
Published: (2025)
Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight
by: Huang, Tao, et al.
Published: (2024)
APOLLO: SGD-like Memory, AdamW-level Performance
by: Zhu, Hanqing, et al.
Published: (2024)
Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories
by: Xu, Yongzhong
Published: (2026)
Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
by: Mukherjee, Sagnik, et al.
Published: (2026)
Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
by: Jin, Ruinan, et al.
Published: (2026)