| Main authors: | Xu, Yifan, Chen, Junren, Chen, Yifan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2605.08817 |
Similar items
Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
by: Xiong, Zidi, et al.
Published: (2026)
What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
by: Li, Haoxi, et al.
Published: (2026)
PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)
Is Depth All You Need? An Exploration of Iterative Reasoning in LLMs
by: Wu, Zongqian, et al.
Published: (2025)
Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)
How Much You Ate? Food Portion Estimation on Spoons
by: Sharma, Aaryam, et al.
Published: (2024)
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
by: Xie, Can, et al.
Published: (2025)
Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
by: Huang, Zhuoxu, et al.
Published: (2026)
Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
by: Miao, Ziqi, et al.
Published: (2026)
Exploitation Is All You Need... for Exploration
by: Rentschler, Micah, et al.
Published: (2025)
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
by: Yang, Zhicheng, et al.
Published: (2025)
Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning
by: Liu, Shuochen, et al.
Published: (2025)
Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
by: Ren, Yanwei, et al.
Published: (2026)
Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
by: Yuan, Wenhao, et al.
Published: (2026)
Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)
Tensor Product Attention Is All You Need
by: Zhang, Yifan, et al.
Published: (2025)
Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
by: Chen, Peter, et al.
Published: (2025)
When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
by: Xu, Yifan, et al.
Published: (2025)
Driving with Regulation: Trustworthy and Interpretable Decision-Making for Autonomous Driving with Retrieval-Augmented Reasoning
by: Cai, Tianhui, et al.
Published: (2024)
Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning
by: Balloch, Jonathan C., et al.
Published: (2024)
Detecting RLVR Training Data via Structural Convergence of Reasoning
by: Zhang, Hongbo, et al.
Published: (2026)
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
by: Huang, Kexin, et al.
Published: (2026)
Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
by: Zhao, Qiannian, et al.
Published: (2026)
Look Before You Leap: Autonomous Exploration for LLM Agents
by: Ye, Ziang, et al.
Published: (2026)
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
by: Qiu, Zhongxi, et al.
Published: (2025)
Reasoning Is All You Need for Urban Planning AI
by: Yang, Sijie, et al.
Published: (2025)
LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
by: Takemoto, Kazuhiro
Published: (2024)
How Many Bytes Can You Take Out Of Brain-To-Text Decoding?
by: Antonello, Richard, et al.
Published: (2024)
Game of Trust: How Trustworthy Does Your Blockchain Think You Are?
by: Drineas, Petros, et al.
Published: (2025)
SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
by: Lee, Chanuk, et al.
Published: (2026)
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
by: Huang, Zeyu, et al.
Published: (2025)
Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
by: Liao, Haicheng, et al.
Published: (2025)
Chasing Better Deep Image Priors between Over- and Under-parameterization
by: Wu, Qiming, et al.
Published: (2024)
Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training
by: Abouzeid, Abdelrahman
Published: (2026)
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR
by: Lee, Chanuk, et al.
Published: (2026)
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2024)
What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions
by: Habibi, Reza, et al.
Published: (2025)
See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
by: Li, Yuejia, et al.
Published: (2026)
Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
by: Yu, Chengzhi, et al.
Published: (2025)