| Main Authors: | Zhou, Ruiwen, Liu, Minghuan, Ren, Kan, Luo, Xufang, Zhang, Weinan, Li, Dongsheng |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2307.00547 |
Similar Items
pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
by: Mo, Shentong, et al.
Published: (2026)
LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning
by: Mo, Shentong, et al.
Published: (2024)
DyDiff: Long-Horizon Rollout via Dynamics Diffusion for Offline Reinforcement Learning
by: Zhao, Hanye, et al.
Published: (2024)
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
by: Luo, Xufang, et al.
Published: (2025)
Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
by: Liu, Zeyuan, et al.
Published: (2026)
A Large-scale Medical Visual Task Adaptation Benchmark
by: Mo, Shentong, et al.
Published: (2024)
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization
by: Ding, Shutong, et al.
Published: (2024)
Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty
by: Kim, Jeonghye, et al.
Published: (2026)
Looking Ahead to Avoid Being Late: Solving Hard-Constrained Traveling Salesman Problem
by: Chen, Jingxiao, et al.
Published: (2024)
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
by: Zhang, Lei, et al.
Published: (2023)
Resolving Latency and Inventory Risk in Market Making with Reinforcement Learning
by: Jiang, Junzhe, et al.
Published: (2025)
MADiff: Offline Multi-agent Learning with Diffusion Models
by: Zhu, Zhengbang, et al.
Published: (2023)
Rank Supervised Contrastive Learning for Time Series Classification
by: Ren, Qianying, et al.
Published: (2024)
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
by: Jiang, Huiqiang, et al.
Published: (2023)
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
by: Yang, Zhihe, et al.
Published: (2025)
DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
by: Ma, Xiaoteng, et al.
Published: (2020)
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
by: Zhang, Dake, et al.
Published: (2024)
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?
by: Kim, Jeonghye, et al.
Published: (2026)
Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting
by: Moghimi, Mehrdad, et al.
Published: (2026)
RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
by: Chen, Jingxiao, et al.
Published: (2025)
Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
by: Chen, Yu, et al.
Published: (2024)
Robust Risk-Sensitive Reinforcement Learning with Conditional Value-at-Risk
by: Ni, Xinyi, et al.
Published: (2024)
PerfectDou: Dominating DouDizhu with Perfect Information Distillation
by: Yang, Guan, et al.
Published: (2022)
Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation
by: Zhang, Tonghe, et al.
Published: (2024)
Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning
by: Sun, Mingyang, et al.
Published: (2025)
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
by: Hua, Pu, et al.
Published: (2024)
AlignRec: Aligning and Training in Multimodal Recommendations
by: Liu, Yifan, et al.
Published: (2024)
Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects
by: Wang, Xihuai, et al.
Published: (2022)
SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
by: Zhang, Yiqi, et al.
Published: (2026)
Risk-Sensitive Reinforcement Learning with Exponential Criteria
by: Noorani, Erfaun, et al.
Published: (2022)
Automated Contrastive Learning Strategy Search for Time Series
by: Jing, Baoyu, et al.
Published: (2024)
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
by: Ding, Shutong, et al.
Published: (2025)
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
by: Chen, Zhangquan, et al.
Published: (2025)
Attention-Driven Hierarchical Reinforcement Learning with Particle Filtering for Source Localization in Dynamic Fields
by: Shi, Yiwei, et al.
Published: (2025)
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
by: Li, Guanghe, et al.
Published: (2024)
Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
by: Yu, Yijiong, et al.
Published: (2024)
Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation
by: Shi, Yiwei, et al.
Published: (2024)
Diffusion-DICE: In-Sample Diffusion Guidance for Offline Reinforcement Learning
by: Mao, Liyuan, et al.
Published: (2024)
Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
by: Sun, Mingyang, et al.
Published: (2025)
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models
by: Xiu, Lixin, et al.
Published: (2026)