:: Library Catalog

Bibliographic Details
Main Authors:	Guo, Jiaxing; Yang, Wenjie; Zhang, Shengzhong; Xu, Tongshan; Du, Lun; Zheng, Da; Huang, Zengfeng
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.06877

Similar Items

Your Graph Recommender is Provably a Single-view Graph Contrastive Learning
by: Yang, Wenjie, et al.
Published: (2024)

Re$^2$Math: Benchmarking Theorem Retrieval in Research-Level Mathematics
by: Lyu, Zicheng, et al.
Published: (2026)

Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning
by: Roy, Tiasa Singha, et al.
Published: (2025)

Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models
by: Wang, Binghai, et al.
Published: (2026)

Solving Math Word Problems via Cooperative Reasoning induced Language Models
by: Zhu, Xinyu, et al.
Published: (2022)

Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
by: He, Xuan, et al.
Published: (2024)

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
by: Guan, Xinyu, et al.
Published: (2025)

When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
by: Li, Xiaomin, et al.
Published: (2025)

Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards
by: Han, Tianyang, et al.
Published: (2026)

NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)

StructComp: Substituting Propagation with Structural Compression in Training Graph Contrastive Learning
by: Zhang, Shengzhong, et al.
Published: (2023)

Position: On the Methodological Pitfalls of Evaluating Base LLMs for Reasoning
by: Chan, Jason, et al.
Published: (2025)

SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese
by: Xu, Liang, et al.
Published: (2024)

Can LLMs Solve longer Math Word Problems Better?
by: Xu, Xin, et al.
Published: (2024)

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
by: Zhao, Yilun, et al.
Published: (2023)

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
by: Liu, Zihan, et al.
Published: (2024)

First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning
by: Jain, Kushal, et al.
Published: (2023)

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
by: Huang, Kaixuan, et al.
Published: (2025)

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
by: Wang, Lei, et al.
Published: (2024)

LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
by: Liu, Xiaoran, et al.
Published: (2025)

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
by: Huan, Maggie, et al.
Published: (2025)

STAR-PólyaMath: Multi-Agent Reasoning under Persistent Meta-Strategic Supervision
by: Wu, Jiaao, et al.
Published: (2026)

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
by: Chen, Zui, et al.
Published: (2024)

Beyond Reasoning: Reinforcement Learning Unlocks Parametric Knowledge in LLMs
by: Yang, Wanli, et al.
Published: (2026)

TabularMath: Understanding Math Reasoning over Tables with Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2025)

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
by: Wang, Yiming, et al.
Published: (2025)

Poivre: Self-Refining Visual Pointing with Reinforcement Learning
by: Yang, Wenjie, et al.
Published: (2025)

LLMs as Assessors: Right for the Right Reason?
by: Saha, Sourav, et al.
Published: (2026)

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
by: Sun, Jiaxing, et al.
Published: (2024)

FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)

Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction
by: Song, Yuerong, et al.
Published: (2025)

Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs
by: Barkett, Emilio, et al.
Published: (2025)

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

When Is Thinking Enough? Early Exit via Sufficiency Assessment for Efficient Reasoning
by: Xiang, Yang, et al.
Published: (2026)

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs
by: Jiang, Yuxuan, et al.
Published: (2024)

Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems
by: Miner, Stephen, et al.
Published: (2024)

SASFT: Sparse Autoencoder-guided Supervised Finetuning to Mitigate Unexpected Code-Switching in LLMs
by: Deng, Boyi, et al.
Published: (2025)

KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA
by: Chen, Zhuo, et al.
Published: (2025)

Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs
by: Xu, Yuzhuang, et al.
Published: (2026)

From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision
by: Lin, Qingwen, et al.
Published: (2024)