:: Library Catalog

Bibliographic Details
Main authors:	Xu, Yifan; Chen, Junren; Chen, Yifan
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online access:	https://arxiv.org/abs/2605.08817

Similar Items

Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning
by: Xiong, Zidi, et al.
Published: (2026)

What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
by: Li, Haoxi, et al.
Published: (2026)

PrefixMemory-Tuning: Modernizing Prefix-Tuning by Decoupling the Prefix from Attention
by: Wang, Haonan, et al.
Published: (2025)

Is Depth All You Need? An Exploration of Iterative Reasoning in LLMs
by: Wu, Zongqian, et al.
Published: (2025)

Think Before You Lie: How Reasoning Leads to Honesty
by: Yuan, Ann, et al.
Published: (2026)

How Much You Ate? Food Portion Estimation on Spoons
by: Sharma, Aaryam, et al.
Published: (2024)

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
by: Xie, Can, et al.
Published: (2025)

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning
by: Huang, Zhuoxu, et al.
Published: (2026)

Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
by: Miao, Ziqi, et al.
Published: (2026)

Exploitation Is All You Need... for Exploration
by: Rentschler, Micah, et al.
Published: (2025)

Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration
by: Yang, Zhicheng, et al.
Published: (2025)

Look as You Think: Unifying Reasoning and Visual Evidence Attribution for Verifiable Document RAG via Reinforcement Learning
by: Liu, Shuochen, et al.
Published: (2025)

Recycling Failures: Salvaging Exploration in RLVR via Fine-Grained Off-Policy Guidance
by: Ren, Yanwei, et al.
Published: (2026)

Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
by: Yuan, Wenhao, et al.
Published: (2026)

Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
by: Kim, Soeun, et al.
Published: (2026)

Tensor Product Attention Is All You Need
by: Zhang, Yifan, et al.
Published: (2025)

Exploration vs Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
by: Chen, Peter, et al.
Published: (2025)

When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
by: Xu, Yifan, et al.
Published: (2025)

Driving with Regulation: Trustworthy and Interpretable Decision-Making for Autonomous Driving with Retrieval-Augmented Reasoning
by: Cai, Tianhui, et al.
Published: (2024)

Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning
by: Balloch, Jonathan C., et al.
Published: (2024)

Detecting RLVR Training Data via Structural Convergence of Reasoning
by: Zhang, Hongbo, et al.
Published: (2026)

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation
by: Huang, Kexin, et al.
Published: (2026)

Know What You Know: Metacognitive Entropy Calibration for Verifiable RL Reasoning
by: Zhao, Qiannian, et al.
Published: (2026)

Look Before You Leap: Autonomous Exploration for LLM Agents
by: Ye, Ziang, et al.
Published: (2026)

Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
by: Qiu, Zhongxi, et al.
Published: (2025)

Reasoning Is All You Need for Urban Planning AI
by: Yang, Sijie, et al.
Published: (2025)

LensWalk: Agentic Video Understanding by Planning How You See in Videos
by: Li, Keliang, et al.
Published: (2026)

All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
by: Takemoto, Kazuhiro
Published: (2024)

How Many Bytes Can You Take Out Of Brain-To-Text Decoding?
by: Antonello, Richard, et al.
Published: (2024)

Game of Trust: How Trustworthy Does Your Blockchain Think You Are?
by: Drineas, Petros, et al.
Published: (2025)

SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
by: Lee, Chanuk, et al.
Published: (2026)

Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
by: Huang, Zeyu, et al.
Published: (2025)

Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
by: Liao, Haicheng, et al.
Published: (2025)

Chasing Better Deep Image Priors between Over- and Under-parameterization
by: Wu, Qiming, et al.
Published: (2024)

Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training
by: Abouzeid, Abdelrahman
Published: (2026)

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR
by: Lee, Chanuk, et al.
Published: (2026)

Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2024)

What Do You Mean? Exploring How Humans and AI Interact with Symbols and Meanings in Their Interactions
by: Habibi, Reza, et al.
Published: (2025)

See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation
by: Li, Yuejia, et al.
Published: (2026)

Optimizing LVLMs with On-Policy Data for Effective Hallucination Mitigation
by: Yu, Chengzhi, et al.
Published: (2025)