:: Library Catalog

Bibliographic Details
Main Authors:	Li, Xiaocan; Wu, Shiliang; Shen, Zheng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.20402

Similar Items

A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation
by: Li, Xiaocan, et al.
Published: (2025)

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
by: Xu, Zukang, et al.
Published: (2026)

Pretraining large language models with MXFP4 on Native FP4 Hardware
by: Cim, Musa, et al.
Published: (2026)

Oscillation-Reduced MXFP4 Training for Vision Transformers
by: Chen, Yuxiang, et al.
Published: (2025)

Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference
by: Ding, Yifu, et al.
Published: (2026)

Recipes for Pre-training LLMs with MXFP8
by: Mishra, Asit, et al.
Published: (2025)

MXNorm: Reusing MXFP block scales for efficient tensor normalisation
by: McLean, Callum, et al.
Published: (2026)

Normalization and effective learning rates in reinforcement learning
by: Lyle, Clare, et al.
Published: (2024)

Causal prompting model-based offline reinforcement learning
by: Yu, Xuehui, et al.
Published: (2024)

Curriculum reinforcement learning for quantum architecture search under hardware errors
by: Patel, Yash J., et al.
Published: (2024)

SDQ: Sparse Decomposed Quantization for LLM Inference
by: Jeong, Geonhwa, et al.
Published: (2024)

Curriculum reinforcement learning with measurable task representation learning
by: Wen, Yongyan, et al.
Published: (2026)

Ultra-short-term solar power forecasting by deep learning and data reconstruction
by: Wang, Jinbao, et al.
Published: (2025)

SpinQuant: LLM quantization with learned rotations
by: Liu, Zechun, et al.
Published: (2024)

Dynamic feature selection in medical predictive monitoring by reinforcement learning
by: Chen, Yutong, et al.
Published: (2024)

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization
by: Boss, Mark, et al.
Published: (2026)

Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning
by: Su, Xuerui, et al.
Published: (2025)

An efficient deep reinforcement learning environment for flexible job-shop scheduling
by: Wu, Xinquan, et al.
Published: (2025)

A Causality-Aware Spatiotemporal Model for Multi-Region and Multi-Pollutant Air Quality Forecasting
by: Lu, Junxin, et al.
Published: (2025)

Revealing the Challenges of Sim-to-Real Transfer in Model-Based Reinforcement Learning via Latent Space Modeling
by: Lin, Zhilin, et al.
Published: (2025)

Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
by: He, Austin Yubo, et al.
Published: (2025)

Breaking the Blocks: Continuous Low-Rank Decomposed Scaling for Unified LLM Quantization and Adaptation
by: Tang, Pingzhi, et al.
Published: (2026)

Delayed homomorphic reinforcement learning for environments with delayed feedback
by: Lee, Jongsoo, et al.
Published: (2026)

Counterfactual experience augmented off-policy reinforcement learning
by: Lee, Sunbowen, et al.
Published: (2025)

Deep reinforcement learning with time-scale invariant memory
by: Kabir, Md Rysul, et al.
Published: (2024)

Offline reinforcement learning for job-shop scheduling problems
by: Echeverria, Imanol, et al.
Published: (2024)

Bellman operator convergence enhancements in reinforcement learning algorithms
by: Kadurha, David Krame, et al.
Published: (2025)

SpreadsheetArena: Decomposing Preference in LLM Generation of Spreadsheet Workbooks
by: Kundurthy, Srivatsa, et al.
Published: (2026)

LLM Assertiveness can be Mechanistically Decomposed into Emotional and Logical Components
by: Tsujimura, Hikaru, et al.
Published: (2025)

Estimating unknown parameters in differential equations with a reinforcement learning based PSO method
by: Sun, Wenkui, et al.
Published: (2024)

Multi-hop Upstream Anticipatory Traffic Signal Control with Deep Reinforcement Learning
by: Li, Xiaocan, et al.
Published: (2024)

Not all tokens are needed (NAT): token efficient reinforcement learning
by: Sang, Hejian, et al.
Published: (2026)

Leveraging weights signals -- Predicting and improving generalizability in reinforcement learning
by: Moulin, Olivier, et al.
Published: (2025)

Economic span selection of bridge based on deep reinforcement learning
by: Zhang, Leye, et al.
Published: (2024)

Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis
by: Li, Hao, et al.
Published: (2026)

Survey on reinforcement learning for language processing
by: Uc-Cetina, Victor, et al.
Published: (2021)

Maximum diffusion reinforcement learning
by: Berrueta, Thomas A., et al.
Published: (2023)

Mitigating spectral bias for the multiscale operator learning
by: Liu, Xinliang, et al.
Published: (2022)

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting
by: Wang, Shiyu, et al.
Published: (2024)