| Main Authors: | Jiang, Zhizheng; Zhao, Kang; Xu, Weikai; Lin, Xinkui; Liu, Wei; Luan, Jian; Shang, Shuo; Han, Peng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.19620 |
Similar Items
DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy
by: Sun, Hongda, et al.
Published: (2023)
Mobile-Bench-v2: A More Realistic and Comprehensive Benchmark for VLM-based Mobile Agents
by: Xu, Weikai, et al.
Published: (2025)
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning
by: Wang, Qibin, et al.
Published: (2024)
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
by: Deng, Shihan, et al.
Published: (2024)
MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding
by: Wu, Qinzhuo, et al.
Published: (2024)
Freshness-Aware Prioritized Experience Replay for LLM/VLM Reinforcement Learning
by: Ma, Weiyu, et al.
Published: (2026)
Reinforcement Learning to Rank Using Coarse-grained Rewards
by: Tu, Yiteng, et al.
Published: (2022)
R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning
by: Zhao, Qingfei, et al.
Published: (2025)
EMO-R3: Reflective Reinforcement Learning for Emotional Reasoning in Multimodal Large Language Models
by: Fang, Yiyang, et al.
Published: (2026)
Rewarded Region Replay (R3) for Policy Learning with Discrete Action Space
by: Li, Bangzheng, et al.
Published: (2024)
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
by: Liu, Yuxuan, et al.
Published: (2026)
DyJR: Preserving Diversity in Reinforcement Learning with Verifiable Rewards via Dynamic Jensen-Shannon Replay
by: Li, Long, et al.
Published: (2026)
RLEP: Reinforcement Learning with Experience Replay for LLM Reasoning
by: Zhang, Hongzhi, et al.
Published: (2025)
End-to-End Optimization of LLM-Driven Multi-Agent Search Systems via Heterogeneous-Group-Based Reinforcement Learning
by: Chen, Guanzhong, et al.
Published: (2025)
PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning
by: Dong, Daize, et al.
Published: (2026)
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
by: Surana, Rohan, et al.
Published: (2026)
MobileIPL: Enhancing Mobile Agents Thinking Process via Iterative Preference Learning
by: Huang, Kun, et al.
Published: (2025)
Reinforcement Learning with Inverse Rewards for World Model Post-training
by: Ye, Yang, et al.
Published: (2025)
R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL
by: Han, Hojae, et al.
Published: (2026)
Revisiting Vulnerability Patch Localization: An Empirical Study and LLM-Based Solution
by: Xu, Haoran, et al.
Published: (2025)
Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
by: Ye, Zhiling, et al.
Published: (2025)
Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning
by: Liu, Jinyi, et al.
Published: (2023)
Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning
by: Xu, Linjie, et al.
Published: (2024)
Sample-efficient LLM Optimization with Reset Replay
by: Liu, Zichuan, et al.
Published: (2025)
Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation
by: Min, Do June, et al.
Published: (2024)
Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning
by: Zhuang, Shengyao, et al.
Published: (2025)
TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning
by: Chen, Yuhui, et al.
Published: (2025)
Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning
by: Qu, Yun, et al.
Published: (2024)
Data-Efficient Learning from Human Interventions for Mobile Robots
by: Peng, Zhenghao, et al.
Published: (2025)
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs
by: Fan, Siqi, et al.
Published: (2025)
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding
by: Yang, Wenkai, et al.
Published: (2025)
Enabling Option Learning in Sparse Rewards with Hindsight Experience Replay
by: Romio, Gabriel, et al.
Published: (2026)
GUI-Shift: Enhancing VLM-Based GUI Agents through Self-supervised Reinforcement Learning
by: Gao, Longxi, et al.
Published: (2025)
Multi-Agent Reinforcement Learning with Submodular Reward
by: Chen, Wenjing, et al.
Published: (2026)
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
by: Li, Jiaqi, et al.
Published: (2023)
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
by: Zhang, Yi-Fan, et al.
Published: (2025)
More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives
by: Zhang, Xiaoqing, et al.
Published: (2025)
Sample Efficient Experience Replay in Non-stationary Environments
by: Duan, Tianyang, et al.
Published: (2025)
Distribution-Aware Reward: Reinforcement Learning over Predictive Distributions for LLM Regression
by: Park, Jungsoo, et al.
Published: (2026)
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
by: Yu, En, et al.
Published: (2025)