| Main Authors: | Pu, Xiao; Saxon, Michael; Hua, Wenyue; Wang, William Yang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.13367 |
Similar Items
Mitigating Overthinking through Reasoning Shaping
by: Song, Feifan, et al.
Published: (2025)
Benchmarks as Microscopes: A Call for Model Metrology
by: Saxon, Michael, et al.
Published: (2024)
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
by: Guan, Weixin, et al.
Published: (2026)
Mitigating Overthinking in Large Reasoning Models via Manifold Steering
by: Huang, Yao, et al.
Published: (2025)
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts
by: Sharma, Aditya, et al.
Published: (2024)
Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning
by: Wang, Qianyue, et al.
Published: (2026)
Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
by: Srivastava, Saurabh, et al.
Published: (2025)
Do LLMs Overthink Basic Math Reasoning? Benchmarking the Accuracy-Efficiency Tradeoff in Language Models
by: Srivastava, Gaurav, et al.
Published: (2025)
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios
by: Zhou, Ruiwen, et al.
Published: (2024)
ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention
by: Wang, Xinyan, et al.
Published: (2026)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
by: Sui, Yang, et al.
Published: (2025)
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
by: Fan, Lizhou, et al.
Published: (2023)
Disentangling Memory and Reasoning Ability in Large Language Models
by: Jin, Mingyu, et al.
Published: (2024)
DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models
by: Yan, Kaiwen, et al.
Published: (2025)
Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking
by: Han, Jinyi, et al.
Published: (2025)
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
by: Wang, Xinyi, et al.
Published: (2023)
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
by: Yi, Biao, et al.
Published: (2025)
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by: Feng, Weixi, et al.
Published: (2024)
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation
by: Juneja, Gurusha, et al.
Published: (2025)
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
by: Hassid, Michael, et al.
Published: (2025)
Do You Know About My Nation? Investigating Multilingual Language Models' Cultural Literacy Through Factual Knowledge
by: Tanwar, Eshaan, et al.
Published: (2025)
Think, But Don't Overthink: Reproducing Recursive Language Models
by: Wang, Daren
Published: (2026)
Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis
by: Vamvourellis, Dimitris, et al.
Published: (2025)
Explore Briefly, Then Decide: Mitigating LLM Overthinking via Cumulative Entropy Regulation
by: Bin, Yi, et al.
Published: (2025)
Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization
by: Diao, Xingjian, et al.
Published: (2026)
Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
by: Saxon, Michael, et al.
Published: (2024)
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
by: Fan, Chenrui, et al.
Published: (2025)
The Evolution of Thought: Tracking LLM Overthinking via Reasoning Dynamics Analysis
by: Wei, Zihao, et al.
Published: (2025)
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs
by: Su, Jinyan, et al.
Published: (2025)
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate
by: Amayuelas, Alfonso, et al.
Published: (2024)
Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?
by: Jedidi, Nour, et al.
Published: (2025)
VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs
by: Wu, Qiucheng, et al.
Published: (2024)
Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis
by: Yadav, Anushka, et al.
Published: (2025)
Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)
by: Saxon, Michael, et al.
Published: (2024)
The Impact of Reasoning Step Length on Large Language Models
by: Jin, Mingyu, et al.
Published: (2024)
Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
by: Jiang, Shuyang, et al.
Published: (2025)
InductionBench: LLMs Fail in the Simplest Complexity Class
by: Hua, Wenyue, et al.
Published: (2025)
MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
by: Juneja, Gurusha, et al.
Published: (2025)
REALM: A Dataset of Real-World LLM Use Cases
by: Cheng, Jingwen, et al.
Published: (2025)
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
by: Hua, Wenyue, et al.
Published: (2024)