| Main Authors: | Li, Xiaocan, Wu, Shiliang, Shen, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.20402 |
Similar Items
A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
by: Li, Xiaocan, et al.
Published: (2025)
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)
Pretraining large language models with MXFP4 on Native FP4 Hardware
by: Cim, Musa, et al.
Published: (2026)
Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)
Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
by: Ding, Yifu, et al.
Published: (2026)
Recipes for Pre-training LLMs with MXFP8
by: Mishra, Asit, et al.
Published: (2025)
MXNorm: Reusing MXFP block scales for efficient tensor normalisation
by: McLean, Callum, et al.
Published: (2026)
Normalization and effective learning rates in reinforcement learning
by: Lyle, Clare, et al.
Published: (2024)
Causal prompting model-based offline reinforcement learning
by: Yu, Xuehui, et al.
Published: (2024)
Curriculum reinforcement learning for quantum architecture search under hardware errors
by: Patel, Yash J., et al.
Published: (2024)
SDQ: Sparse Decomposed Quantization for LLM Inference
by: Jeong, Geonhwa, et al.
Published: (2024)
Curriculum reinforcement learning with measurable task representation learning
by: Wen, Yongyan, et al.
Published: (2026)
Ultra-short-term solar power forecasting by deep learning and data reconstruction
by: Wang, Jinbao, et al.
Published: (2025)
SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)
Dynamic feature selection in medical predictive monitoring by reinforcement learning
by: Chen, Yutong, et al.
Published: (2024)
OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
by: Boss, Mark, et al.
Published: (2026)
Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
by: Su, Xuerui, et al.
Published: (2025)
An efficient deep reinforcement learning environment for flexible job-shop scheduling
by: Wu, Xinquan, et al.
Published: (2025)
A Causality-Aware Spatiotemporal Model for Multi-Region and Multi-Pollutant Air Quality Forecasting
by: Lu, Junxin, et al.
Published: (2025)
Revealing the Challenges of Sim-to-Real Transfer in Model-Based Reinforcement Learning via Latent Space Modeling
by: Lin, Zhilin, et al.
Published: (2025)
Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
by: He, Austin Yubo, et al.
Published: (2025)
Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)
Delayed homomorphic reinforcement learning for environments with delayed feedback
by: Lee, Jongsoo, et al.
Published: (2026)
Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)
Deep reinforcement learning with time-scale invariant memory
by: Kabir, Md Rysul, et al.
Published: (2024)
Offline reinforcement learning for job-shop scheduling problems
by: Echeverria, Imanol, et al.
Published: (2024)
Bellman operator convergence enhancements in reinforcement learning algorithms
by: Kadurha, David Krame, et al.
Published: (2025)
SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks
by: Kundurthy, Srivatsa, et al.
Published: (2026)
LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components
by: Tsujimura, Hikaru, et al.
Published: (2025)
Estimating unknown parameters in differential equations with a reinforcement learning based PSO method
by: Sun, Wenkui, et al.
Published: (2024)
Multi-hop Upstream Anticipatory Traffic Signal Control with Deep Reinforcement Learning
by: Li, Xiaocan, et al.
Published: (2024)
Not all tokens are needed (NAT): token efficient reinforcement learning
by: Sang, Hejian, et al.
Published: (2026)
Leveraging weights signals -- Predicting and improving generalizability in reinforcement learning
by: Moulin, Olivier, et al.
Published: (2025)
Economic span selection of bridge based on deep reinforcement learning
by: Zhang, Leye, et al.
Published: (2024)
Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis
by: Li, Hao, et al.
Published: (2026)
Survey on reinforcement learning for language processing
by: Uc-Cetina, Victor, et al.
Published: (2021)
Maximum diffusion reinforcement learning
by: Berrueta, Thomas A., et al.
Published: (2023)
Mitigating spectral bias for the multiscale operator learning
by: Liu, Xinliang, et al.
Published: (2022)
TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting
by: Wang, Shiyu, et al.
Published: (2024)