| Main Authors: | Xie, Zeke; Xu, Zhiqiang; Zhang, Jingzhao; Sato, Issei; Sugiyama, Masashi |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2011.11152 |
Similar Items
Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)
To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
by: Xu, Kevin, et al.
Published: (2025)
A Formal Comparison Between Chain of Thought and Latent Thought
by: Xu, Kevin, et al.
Published: (2025)
On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
by: Chen, Lesi, et al.
Published: (2023)
Fix Initial Codes and Iteratively Refine Textual Directions Toward Safe Multi-Turn Code Correction
by: Tanaka, Yuto, et al.
Published: (2026)
Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)
Learning Robust Diffusion Models from Imprecise Supervision
by: Wu, Dong-Dong, et al.
Published: (2025)
Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)
VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
by: Cai, Xin-Qiang, et al.
Published: (2026)
Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
by: Xu, Jing, et al.
Published: (2024)
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)
Mano: Restriking Manifold Optimization for LLM Training
by: Gu, Yufei, et al.
Published: (2026)
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
by: Ackermann, Johannes, et al.
Published: (2024)
GradientStabilizer:Fix the Norm, Not the Gradient
by: Huang, Tianjin, et al.
Published: (2025)
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
by: Gao, Chengqian, et al.
Published: (2025)
Weak-to-Strong Diffusion with Reflection
by: Bai, Lichen, et al.
Published: (2025)
On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
by: Nishimori, Soichiro, et al.
Published: (2025)
On the Condition Number Dependency in Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
Towards Scalable Oversight via Partitioned Human Supervision
by: Yin, Ren, et al.
Published: (2025)
Low Rank Gradients and Where to Find Them
by: Sonthalia, Rishi, et al.
Published: (2025)
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
by: Chen, Hao, et al.
Published: (2023)
Reasoning Inconsistencies and How to Mitigate Them in Deep Learning
by: Arakelyan, Erik
Published: (2025)
Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone
by: Hassanpour, Negar, et al.
Published: (2025)
Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate
by: Xu, Huangyu, et al.
Published: (2026)
Sharpness-Aware Black-Box Optimization
by: Ye, Feiyang, et al.
Published: (2024)
The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
by: Xu, Yongzhong
Published: (2026)
From $\log π$ to $π$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight
by: Fu, Xiaoliang, et al.
Published: (2026)
GNN Explanations that do not Explain and How to find Them
by: Azzolin, Steve, et al.
Published: (2026)
How to Square Tensor Networks and Circuits Without Squaring Them
by: Loconte, Lorenzo, et al.
Published: (2025)
Calibrated Language Models and How to Find Them with Label Smoothing
by: Huang, Jerry, et al.
Published: (2025)
Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
by: Nishimori, Soichiro, et al.
Published: (2026)
Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
by: Zhang, Zhen-Yu, et al.
Published: (2024)
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)
Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning
by: Liu, Siyuan, et al.
Published: (2026)
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
by: Xu, Jie, et al.
Published: (2025)
Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
by: Fan, Zhiyuan, et al.
Published: (2025)
Fantastic Copyrighted Beasts and How (Not) to Generate Them
by: He, Luxi, et al.
Published: (2024)
Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)
AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
by: He, Di, et al.
Published: (2025)