Library Catalog

Bibliographic Details
Main Authors:	Bai, Sikai; Li, Haoxi; Zhang, Jie; Liu, Yongjiang; Guo, Song
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.08468

Similar Items

CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs
by: Li, Haoxi, et al.
Published: (2025)

Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
by: Liu, Yongjiang, et al.
Published: (2025)

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
by: Bai, Sikai, et al.
Published: (2023)

SelfBC: Self Behavior Cloning for Offline Reinforcement Learning
by: Liu, Shirong, et al.
Published: (2024)

DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
by: Bai, Sikai, et al.
Published: (2024)

Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards
by: Fang, Linghan, et al.
Published: (2026)

Black-box Gradient Attack on Graph Neural Networks: Deeper Insights in Graph-based Attack and Defense
by: Zhan, Haoxi, et al.
Published: (2021)

HAIM-DRL: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous Driving
by: Huang, Zilin, et al.
Published: (2024)

What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
by: Li, Haoxi, et al.
Published: (2026)

Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving
by: Huang, Zilin, et al.
Published: (2024)

Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning
by: Wang, Ru, et al.
Published: (2025)

Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching
by: Li, Zeqiao, et al.
Published: (2026)

Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration
by: Zhang, Chongsheng, et al.
Published: (2026)

Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning
by: Yu, Zhiyin, et al.
Published: (2026)

Gradient Boosting Reinforcement Learning
by: Fuhrer, Benjamin, et al.
Published: (2024)

Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
by: Yang, Rui, et al.
Published: (2024)

Learning to Inject: Automated Prompt Injection via Reinforcement Learning
by: Chen, Xin, et al.
Published: (2026)

VERIRL: Boosting the LLM-based Verilog Code Generation via Reinforcement Learning
by: Teng, Fu, et al.
Published: (2025)

Self Paced Gaussian Contextual Reinforcement Learning
by: Ardakani, Mohsen Sahraei, et al.
Published: (2026)

Advancing Autonomous VLM Agents via Variational Subgoal-Conditioned Reinforcement Learning
by: Wu, Qingyuan, et al.
Published: (2025)

Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning
by: Gu, Yuxuan, et al.
Published: (2025)

Reinforcement Learning via Self-Distillation
by: Hübotter, Jonas, et al.
Published: (2026)

Reinforced Interactive Continual Learning via Real-time Noisy Human Feedback
by: Yang, Yutao, et al.
Published: (2025)

ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning
by: Zhao, Chu, et al.
Published: (2026)

DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
by: Li, Guanghe, et al.
Published: (2024)

Boosting long-term forecasting performance for continuous-time dynamic graph networks via data augmentation
by: Tian, Yuxing, et al.
Published: (2023)

Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control
by: Sheng, Zihao, et al.
Published: (2024)

Boosting Sample Efficiency and Generalization in Multi-agent Reinforcement Learning via Equivariance
by: McClellan, Joshua, et al.
Published: (2024)

Deep Reinforcement Learning Guided Improvement Heuristic for Job Shop Scheduling
by: Zhang, Cong, et al.
Published: (2022)

Meta-Cognitive Reinforcement Learning with Self-Doubt and Recovery
by: Zhang, Zhipeng, et al.
Published: (2026)

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges
by: Qing, Yunpeng, et al.
Published: (2022)

Test-Time Meta-Adaptation with Self-Synthesis
by: Kaya, Zeyneb N., et al.
Published: (2026)

Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards
by: Bai, Bizhe, et al.
Published: (2026)

RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)

Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
by: Yu, En, et al.
Published: (2023)

Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
by: Sokota, Samuel, et al.
Published: (2025)

Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
by: Han, Seungyub, et al.
Published: (2026)

MathMixup: Boosting LLM Mathematical Reasoning with Difficulty-Controllable Data Synthesis and Curriculum Learning
by: Li, Xuchen, et al.
Published: (2026)

KAN v.s. MLP for Offline Reinforcement Learning
by: Guo, Haihong, et al.
Published: (2024)

TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning
by: Yang, Shenzhi, et al.
Published: (2025)