:: Library Catalog

Bibliographic Details
Main Authors:	Raimondi, Bianca; Pivi, Francesco; Evangelista, Davide; Gabbrielli, Maurizio
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.03334

Similar Items

Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
by: Raimondi, Bianca, et al.
Published: (2026)

Exploiting Primacy Effect To Improve Large Language Models
by: Raimondi, Bianca, et al.
Published: (2025)

Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
by: Raimondi, Bianca, et al.
Published: (2025)

Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs
by: Raimondi, Bianca, et al.
Published: (2025)

Improving Diffusion Posterior Samplers with Lagged Temporal Corrections for Image Restoration
by: Evangelista, Davide, et al.
Published: (2026)

On the flow matching interpretability
by: Pivi, Francesco, et al.
Published: (2025)

BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
by: Rondelli, Massimo, et al.
Published: (2026)

Learning Factors in AI-Augmented Education: A Comparative Study of Middle and High School Students
by: Ebli, Gaia, et al.
Published: (2025)

mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
by: Dobler, Konstantin, et al.
Published: (2026)

From Reasoning to Code: GRPO Optimization for Underrepresented Languages
by: Pennino, Federico, et al.
Published: (2025)

CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
by: Wei, Chengwei, et al.
Published: (2024)

MathArena: Evaluating LLMs on Uncontaminated Math Competitions
by: Balunović, Mislav, et al.
Published: (2025)

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
by: Zhao, Yilun, et al.
Published: (2023)

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
by: Guan, Xinyu, et al.
Published: (2025)

SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese
by: Xu, Liang, et al.
Published: (2024)

Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
by: Zhou, Zhuqian, et al.
Published: (2026)

CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
by: Liu, Wentao, et al.
Published: (2024)

Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange
by: Satpute, Ankit, et al.
Published: (2024)

SafeMath: Inference-time Safety improves Math Accuracy
by: Basu, Sagnik, et al.
Published: (2026)

FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
by: Toshniwal, Shubham, et al.
Published: (2024)

StreetMath: Study of LLMs' Approximation Behaviors
by: Tseng, Chiung-Yi, et al.
Published: (2025)

Automate Knowledge Concept Tagging on Math Questions with LLMs
by: Li, Hang, et al.
Published: (2024)

What Makes Math Word Problems Challenging for LLMs?
by: Srivatsa, KV Aditya, et al.
Published: (2024)

Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
by: Li, Junsong, et al.
Published: (2025)

ShoppingComp: Are LLMs Really Ready for Your Shopping Cart?
by: Tou, Huaixiao, et al.
Published: (2025)

Orca-Math: Unlocking the potential of SLMs in Grade School Math
by: Mitra, Arindam, et al.
Published: (2024)

Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)

Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
by: Albalak, Alon, et al.
Published: (2025)

MathBuddy: A Multimodal System for Affective Math Tutoring
by: Kar, Debanjana, et al.
Published: (2025)

MegaMath: Pushing the Limits of Open Math Corpora
by: Zhou, Fan, et al.
Published: (2025)

MathDuels: Evaluating LLMs as Problem Posers and Solvers
by: Xu, Zhiqiu, et al.
Published: (2026)

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
by: Dekoninck, Jasper, et al.
Published: (2026)

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
by: Petrov, Ivo, et al.
Published: (2025)

Can LLMs Solve longer Math Word Problems Better?
by: Xu, Xin, et al.
Published: (2024)

MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
by: Wu, Yiran, et al.
Published: (2023)

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)

GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving
by: Zhang, Yingji, et al.
Published: (2026)

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)

RoMathExam: A Longitudinal Dataset of Romanian Math Exams (1895-2025) with a Seven-Decade Core (1957-2025)
by: Cuclea, Luca-Nicolae, et al.
Published: (2026)