Library Catalog

Bibliographic Details
Main Authors:	Xie, Zeke; Xu, Zhiqiang; Zhang, Jingzhao; Sato, Issei; Sugiyama, Masashi
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2011.11152

Similar Items

Understanding Transformer Optimization via Gradient Heterogeneity
by: Tomihari, Akiyoshi, et al.
Published: (2025)

To CoT or To Loop? A Formal Comparison Between Chain-of-Thought and Looped Transformers
by: Xu, Kevin, et al.
Published: (2025)

A Formal Comparison Between Chain of Thought and Latent Thought
by: Xu, Kevin, et al.
Published: (2025)

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
by: Chen, Lesi, et al.
Published: (2023)

Fix Initial Codes and Iteratively Refine Textual Directions Toward Safe Multi-Turn Code Correction
by: Tanaka, Yuto, et al.
Published: (2026)

Decoupled Weight Decay for Any $p$ Norm
by: Outmezguine, Nadav Joseph, et al.
Published: (2024)

Learning Robust Diffusion Models from Imprecise Supervision
by: Wu, Dong-Dong, et al.
Published: (2025)

Gradient Regularization Prevents Reward Hacking in Reinforcement Learning from Human Feedback and Verifiable Rewards
by: Ackermann, Johannes, et al.
Published: (2026)

VI-CuRL: Stabilizing Verifier-Independent RL Reasoning via Confidence-Guided Variance Reduction
by: Cai, Xin-Qiang, et al.
Published: (2026)

Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
by: Xu, Jing, et al.
Published: (2024)

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)

Mano: Restriking Manifold Optimization for LLM Training
by: Gu, Yufei, et al.
Published: (2026)

Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
by: Ackermann, Johannes, et al.
Published: (2024)

GradientStabilizer: Fix the Norm, Not the Gradient
by: Huang, Tianjin, et al.
Published: (2025)

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
by: Gao, Chengqian, et al.
Published: (2025)

Weak-to-Strong Diffusion with Reflection
by: Bai, Lichen, et al.
Published: (2025)

On Symmetric Losses for Robust Policy Optimization with Noisy Preferences
by: Nishimori, Soichiro, et al.
Published: (2025)

On the Condition Number Dependency in Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)

Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)

Towards Scalable Oversight via Partitioned Human Supervision
by: Yin, Ren, et al.
Published: (2025)

Low Rank Gradients and Where to Find Them
by: Sonthalia, Rishi, et al.
Published: (2025)

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
by: Chen, Hao, et al.
Published: (2023)

Reasoning Inconsistencies and How to Mitigate Them in Deep Learning
by: Arakelyan, Erik
Published: (2025)

Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone
by: Hassanpour, Negar, et al.
Published: (2025)

Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate
by: Xu, Huangyu, et al.
Published: (2026)

Sharpness-Aware Black-Box Optimization
by: Ye, Feiyang, et al.
Published: (2024)

The Geometry of Multi-Task Grokking: Transverse Instability, Superposition, and Weight Decay Phase Structure
by: Xu, Yongzhong
Published: (2026)

From $\log \pi$ to $\pi$: Taming Divergence in Soft Clipping via Bilateral Decoupled Decay of Probability Gradient Weight
by: Fu, Xiaoliang, et al.
Published: (2026)

GNN Explanations that do not Explain and How to find Them
by: Azzolin, Steve, et al.
Published: (2026)

How to Square Tensor Networks and Circuits Without Squaring Them
by: Loconte, Lorenzo, et al.
Published: (2025)

Calibrated Language Models and How to Find Them with Label Smoothing
by: Huang, Jerry, et al.
Published: (2025)

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
by: Nishimori, Soichiro, et al.
Published: (2026)

Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
by: Zhang, Zhen-Yu, et al.
Published: (2024)

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)

Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning
by: Liu, Siyuan, et al.
Published: (2026)

Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
by: Xu, Jie, et al.
Published: (2025)

Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
by: Fan, Zhiyuan, et al.
Published: (2025)

Fantastic Copyrighted Beasts and How (Not) to Generate Them
by: He, Luxi, et al.
Published: (2024)

Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward
by: Wen, Xuexiang, et al.
Published: (2026)

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs
by: He, Di, et al.
Published: (2025)