| Main Authors: | Hu, Zhiyuan, Wang, Yibo, Dong, Hanze, Xu, Yuhui, Saha, Amrita, Xiong, Caiming, Hooi, Bryan, Li, Junnan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.10554 |
Similar Items
Reward Models Identify Consistency, Not Causality
by: Xu, Yuhui, et al.
Published: (2025)
Scalable Chain of Thoughts via Elastic Reasoning
by: Xu, Yuhui, et al.
Published: (2025)
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
by: Zhao, Zirui, et al.
Published: (2024)
Fractured Chain-of-Thought Reasoning
by: Liao, Baohao, et al.
Published: (2025)
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
by: Wang, Lei, et al.
Published: (2024)
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
by: Liao, Baohao, et al.
Published: (2025)
ThinK: Thinner Key Cache by Query-Driven Pruning
by: Xu, Yuhui, et al.
Published: (2024)
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
by: Xiong, Wei, et al.
Published: (2025)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
by: Li, Yibo, et al.
Published: (2025)
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
by: Luo, Ziyang, et al.
Published: (2025)
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
by: Pang, Bo, et al.
Published: (2025)
JudgeLRM: Large Reasoning Models as a Judge
by: Chen, Nuo, et al.
Published: (2025)
XForecast: Evaluating Natural Language Explanations for Time Series Forecasting
by: Aksu, Taha, et al.
Published: (2024)
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
by: Wang, Yuquan, et al.
Published: (2025)
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation
by: Sui, Yuan, et al.
Published: (2026)
How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
by: Zhang, Xiangxiang, et al.
Published: (2026)
EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
by: He, Yufei, et al.
Published: (2025)
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
by: Zhao, James Xu, et al.
Published: (2025)
MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models
by: Wang, Xinming, et al.
Published: (2025)
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
by: Xiong, Miao, et al.
Published: (2023)
Towards A Unified View of Answer Calibration for Multi-Step Reasoning
by: Deng, Shumin, et al.
Published: (2023)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
by: Parmar, Mihir, et al.
Published: (2024)
Entropy-Based Block Pruning for Efficient Large Language Models
by: Yang, Liangwei, et al.
Published: (2025)
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
by: Xu, Yiheng, et al.
Published: (2024)
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
by: Zhou, Kaiwen, et al.
Published: (2025)
RLHF Workflow: From Reward Modeling to Online RLHF
by: Dong, Hanze, et al.
Published: (2024)
FiDeLiS: Faithful Reasoning in Large Language Model for Knowledge Graph Question Answering
by: Sui, Yuan, et al.
Published: (2024)
Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
by: Jia, Jinghan, et al.
Published: (2025)
Entity Alignment with Noisy Annotations from Large Language Models
by: Chen, Shengyuan, et al.
Published: (2024)
Enhancing Multi-Agent Debate System Performance via Confidence Expression
by: Lin, Zijie, et al.
Published: (2025)
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
by: Hu, Zhiyuan, et al.
Published: (2025)
JudgeRank: Leveraging Large Language Models for Reasoning-Intensive Reranking
by: Niu, Tong, et al.
Published: (2024)
Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
by: Deng, Ailin, et al.
Published: (2024)
Beyond Language: Format-Agnostic Reasoning Subspaces in Large Language Models
by: Yuan, Aojie, et al.
Published: (2026)
KLong: Training LLM Agent for Extremely Long-horizon Tasks
by: Liu, Yue, et al.
Published: (2026)
Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
by: Sui, Yuan, et al.
Published: (2024)
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models
by: Hagendorff, Thilo, et al.
Published: (2025)
AhaKV: Adaptive Holistic Attention-Driven KV Cache Eviction for Efficient Inference of Large Language Models
by: Gu, Yifeng, et al.
Published: (2025)
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
by: Diao, Shizhe, et al.
Published: (2023)
Safety in Large Reasoning Models: A Survey
by: Wang, Cheng, et al.
Published: (2025)