| Main Authors: | Raimondi, Bianca, Pivi, Francesco, Evangelista, Davide, Gabbrielli, Maurizio |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03334 |
Similar Items
Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
by: Raimondi, Bianca, et al.
Published: (2026)
Exploiting Primacy Effect To Improve Large Language Models
by: Raimondi, Bianca, et al.
Published: (2025)
Analysing Moral Bias in Finetuned LLMs through Mechanistic Interpretability
by: Raimondi, Bianca, et al.
Published: (2025)
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs
by: Raimondi, Bianca, et al.
Published: (2025)
Improving Diffusion Posterior Samplers with Lagged Temporal Corrections for Image Restoration
by: Evangelista, Davide, et al.
Published: (2026)
On the flow matching interpretability
by: Pivi, Francesco, et al.
Published: (2025)
BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
by: Rondelli, Massimo, et al.
Published: (2026)
Learning Factors in AI-Augmented Education: A Comparative Study of Middle and High School Students
by: Ebli, Gaia, et al.
Published: (2025)
mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
by: Dobler, Konstantin, et al.
Published: (2026)
From Reasoning to Code: GRPO Optimization for Underrepresented Languages
by: Pennino, Federico, et al.
Published: (2025)
CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
by: Wei, Chengwei, et al.
Published: (2024)
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
by: Balunović, Mislav, et al.
Published: (2025)
DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents
by: Zhao, Yilun, et al.
Published: (2023)
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
by: Guan, Xinyu, et al.
Published: (2025)
SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in Chinese
by: Xu, Liang, et al.
Published: (2024)
Utility-Preserving De-Identification for Math Tutoring: Investigating Numeric Ambiguity in the MathEd-PII Benchmark Dataset
by: Zhou, Zhuqian, et al.
Published: (2026)
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
by: Liu, Wentao, et al.
Published: (2024)
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange
by: Satpute, Ankit, et al.
Published: (2024)
SafeMath: Inference-time Safety improves Math Accuracy
by: Basu, Sagnik, et al.
Published: (2026)
FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains
by: Zhao, Yilun, et al.
Published: (2023)
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
by: Toshniwal, Shubham, et al.
Published: (2024)
StreetMath: Study of LLMs' Approximation Behaviors
by: Tseng, Chiung-Yi, et al.
Published: (2025)
Automate Knowledge Concept Tagging on Math Questions with LLMs
by: Li, Hang, et al.
Published: (2024)
What Makes Math Word Problems Challenging for LLMs?
by: Srivatsa, KV Aditya, et al.
Published: (2024)
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
by: Li, Junsong, et al.
Published: (2025)
ShoppingComp: Are LLMs Really Ready for Your Shopping Cart?
by: Tou, Huaixiao, et al.
Published: (2025)
Orca-Math: Unlocking the potential of SLMs in Grade School Math
by: Mitra, Arindam, et al.
Published: (2024)
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
by: Albalak, Alon, et al.
Published: (2025)
MathBuddy: A Multimodal System for Affective Math Tutoring
by: Kar, Debanjana, et al.
Published: (2025)
MegaMath: Pushing the Limits of Open Math Corpora
by: Zhou, Fan, et al.
Published: (2025)
MathDuels: Evaluating LLMs as Problem Posers and Solvers
by: Xu, Zhiqiu, et al.
Published: (2026)
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs
by: Dekoninck, Jasper, et al.
Published: (2026)
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
by: Petrov, Ivo, et al.
Published: (2025)
Can LLMs Solve longer Math Word Problems Better?
by: Xu, Xin, et al.
Published: (2024)
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
by: Wu, Yiran, et al.
Published: (2023)
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)
GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving
by: Zhang, Yingji, et al.
Published: (2026)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)
RoMathExam: A Longitudinal Dataset of Romanian Math Exams (1895-2025) with a Seven-Decade Core (1957-2025)
by: Cuclea, Luca-Ncolae, et al.
Published: (2026)