Library Catalog

Bibliographic Details
Main Authors:	Yan, Renye; Gan, Yaozhong; Wu, You; Xing, Junliang; Liang, Ling; Zhu, Yeshang; Cai, Yimao
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2410.04498

Similar Items

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
by: Yan, Renye, et al.
Published: (2024)

Reflective Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)

Transductive Off-policy Proximal Policy Optimization
by: Gan, Yaozhong, et al.
Published: (2024)

MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning
by: Wu, Cuiling, et al.
Published: (2025)

Do Less, Achieve More: Do We Need Every-Step Optimization for RL Fine-tuning of Diffusion Models?
by: Yan, Renye, et al.
Published: (2026)

Memento 2: Learning by Stateful Reflective Memory
by: Wang, Jun
Published: (2025)

A Sanity Check for Multi-In-Domain Face Forgery Detection in the Real World
by: Cheng, Jikang, et al.
Published: (2025)

AdaLomo: Low-memory Optimization with Adaptive Learning Rate
by: Lv, Kai, et al.
Published: (2023)

Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization
by: Gu, Shengda, et al.
Published: (2025)

AdaMuon: Adaptive Muon Optimizer
by: Si, Chongjie, et al.
Published: (2025)

Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance
by: He, Jinmin, et al.
Published: (2025)

AdaCubic: An Adaptive Cubic Regularization Optimizer for Deep Learning
by: Tsingalis, Ioannis, et al.
Published: (2026)

AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies
by: Hu, Xixi, et al.
Published: (2024)

AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
by: Lou, Chenwei, et al.
Published: (2025)

AdaTKG: Adaptive Memory for Temporal Knowledge Graph Reasoning
by: Lee, Seunghan, et al.
Published: (2026)

AdaMuS: Adaptive Multi-view Sparsity Learning for Dimensionally Unbalanced Data
by: Xu, Cai, et al.
Published: (2026)

AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control
by: Bui, Quang-Hung, et al.
Published: (2025)

Memento-Skills: Let Agents Design Agents
by: Zhou, Huichi, et al.
Published: (2026)

BiBLDR: Bidirectional Behavior Learning for Drug Repositioning
by: Zhang, Renye, et al.
Published: (2025)

Inner-Probe: Discovering Copyright-related Data Generation in LLM Architecture
by: Ma, Qichao, et al.
Published: (2024)

Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
by: Zhou, Huichi, et al.
Published: (2025)

AdaCL: Adaptive Continual Learning
by: Yildirim, Elif Ceren Gok, et al.
Published: (2023)

Ada-MSHyper: Adaptive Multi-Scale Hypergraph Transformer for Time Series Forecasting
by: Shang, Zongjiang, et al.
Published: (2024)

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks
by: Zhang, Zhongyue, et al.
Published: (2026)

AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments
by: Cai, Zhijie, et al.
Published: (2026)

AdaCuRL: Adaptive Curriculum Reinforcement Learning with Invalid Sample Mitigation and Historical Revisiting
by: Li, Renda, et al.
Published: (2025)

AdaFair-MARL: Enforcing Adaptive Fairness Constraints in Multi-Agent Reinforcement Learning
by: Ekpo, Promise, et al.
Published: (2025)

Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning
by: Dang, Haoran, et al.
Published: (2026)

BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
by: Xi, Zhiheng, et al.
Published: (2025)

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
by: Zeng, Ziyun, et al.
Published: (2026)

Network Topology Optimization via Deep Reinforcement Learning
by: Li, Zhuoran, et al.
Published: (2022)

AdaFisher: Adaptive Second Order Optimization via Fisher Information
by: Gomes, Damien Martins, et al.
Published: (2024)

Belief-Based Offline Reinforcement Learning for Delay-Robust Policy Optimization
by: Zhan, Simon Sinong, et al.
Published: (2025)

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning
by: Wang, Hongjun, et al.
Published: (2026)

Reinforce-Ada: An Adaptive Sampling Framework under Non-linear RL Objectives
by: Xiong, Wei, et al.
Published: (2025)

AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering
by: Cai, Yuzhu, et al.
Published: (2026)

AdaWorld: Learning Adaptable World Models with Latent Actions
by: Gao, Shenyuan, et al.
Published: (2025)

On the Reuse Bias in Off-Policy Reinforcement Learning
by: Ying, Chengyang, et al.
Published: (2022)

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
by: Refael, Yehonathan, et al.
Published: (2024)

Adaptive Policy Synchronization for Scalable Reinforcement Learning
by: Lafuente-Mercado, Rodney
Published: (2025)