Library Catalog

Bibliographic Details
Main Authors:	Tseng, Chiung-Yi, Roy, Somshubhra, Thasin, Maisha, Zhang, Danyang, Effiong, Blessing
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2510.25776

Similar Items

Modality-Dependent Memory Mechanisms in Cross-Modal Neuromorphic Computing
by: Blessing, Effiong, et al.
Published: (2025)

Memory-Augmented Spiking Networks: Synergistic Integration of Complementary Mechanisms for Neuromorphic Vision
by: Blessing, Effiong, et al.
Published: (2026)

Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue
by: Yu, Lin, et al.
Published: (2025)

47B Mixture-of-Experts Beats 671B Dense Models on Chinese Medical Examinations
by: Tseng, Chiung-Yi, et al.
Published: (2025)

Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning
by: Roy, Tiasa Singha, et al.
Published: (2025)

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
by: Chen, Zui, et al.
Published: (2024)

Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox
by: Bi, Ziqian, et al.
Published: (2025)

†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
by: Nazi, Zabir Al, et al.
Published: (2026)

MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
by: Wu, Yiran, et al.
Published: (2023)

WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications
by: Li, Xin, et al.
Published: (2025)

LLMs Are Already Good Tutors: Training-Free Prompt Optimization for Pedagogical Math Tutoring
by: Lee, Unggi, et al.
Published: (2026)

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
by: Petrov, Ivo, et al.
Published: (2025)

NFT: Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
by: Chen, Huayu, et al.
Published: (2025)

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
by: Huang, Kaixuan, et al.
Published: (2025)

Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
by: Li, Junsong, et al.
Published: (2025)

Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
by: Wang, Peiyi, et al.
Published: (2023)

MegaMath: Pushing the Limits of Open Math Corpora
by: Zhou, Fan, et al.
Published: (2025)

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)

Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
by: Mahran, Mariam, et al.
Published: (2025)

MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning
by: Li, Chengpeng, et al.
Published: (2023)

MathPile: A Billion-Token-Scale Pretraining Corpus for Math
by: Wang, Zengzhi, et al.
Published: (2023)

ControlMath: Controllable Data Generation Promotes Math Generalist Models
by: Chen, Nuo, et al.
Published: (2024)

Large Language Models for Math Education in Low-Resource Languages: A Study in Sinhala and Tamil
by: Kishanthan, Sukumar, et al.
Published: (2026)

On Pruning State-Space LLMs
by: Ghattas, Tamer, et al.
Published: (2025)

AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
by: Liu, Zihan, et al.
Published: (2024)

Benchmarking Large Language Models for Math Reasoning Tasks
by: Seßler, Kathrin, et al.
Published: (2024)

Interpreting and Mitigating Unwanted Uncertainty in LLMs
by: Roy, Tiasa Singha, et al.
Published: (2025)

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
by: Toshniwal, Shubham, et al.
Published: (2024)

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
by: Toshniwal, Shubham, et al.
Published: (2024)

Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning
by: Kong, Deqian, et al.
Published: (2026)

MathScale: Scaling Instruction Tuning for Mathematical Reasoning
by: Tang, Zhengyang, et al.
Published: (2024)

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
by: Albalak, Alon, et al.
Published: (2025)

Automated Alignment of Math Items to Content Standards in Large-Scale Assessments Using Language Models
by: Xu, Qingshu, et al.
Published: (2025)

A Controlled Study on Long Context Extension and Generalization in LLMs
by: Lu, Yi, et al.
Published: (2024)

Transparent Neighborhood Approximation for Text Classifier Explanation
by: Cai, Yi, et al.
Published: (2024)

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
by: Gallego, Víctor
Published: (2024)

Augmenting Math Word Problems via Iterative Question Composing
by: Liu, Haoxiong, et al.
Published: (2024)

Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams
by: Caraeni, Adriana, et al.
Published: (2024)

Continuous Approximations for Improving Quantization Aware Training of LLMs
by: Li, He, et al.
Published: (2024)