| Main Authors: | Yousefzadeh, Roozbeh, Cao, Xuenan, Ospanov, Azim |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.18872 |
Similar Items
Advocate for Complete Benchmarks for Formal Reasoning with Formal/Informal Statements and Formal/Informal Proofs
by: Yousefzadeh, Roozbeh, et al.
Published: (2025)
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
by: Ospanov, Azim, et al.
Published: (2025)
miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward
by: Ospanov, Azim, et al.
Published: (2025)
Towards a Scalable Reference-Free Evaluation of Generative Models
by: Ospanov, Azim, et al.
Published: (2024)
Do Vendi Scores Converge with Finite Samples? Truncated Vendi Score for Finite-Sample Convergence Guarantees
by: Ospanov, Azim, et al.
Published: (2024)
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
by: Feng, Shengyu, et al.
Published: (2024)
MathWriting: A Dataset For Handwritten Mathematical Expression Recognition
by: Gervais, Philippe, et al.
Published: (2024)
FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
by: Ravi, Nikil, et al.
Published: (2026)
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
by: Opedal, Andreas, et al.
Published: (2024)
MathChat: Converse to Tackle Challenging Math Problems with LLM Agents
by: Wu, Yiran, et al.
Published: (2023)
Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error
by: Farnia, Farzan, et al.
Published: (2026)
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
by: Pandit, Shrey, et al.
Published: (2025)
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
by: Toshniwal, Shubham, et al.
Published: (2024)
Nemotron-CC-Math: A 133 Billion-Token-Scale High Quality Math Pretraining Dataset
by: Mahabadi, Rabeeh Karimi, et al.
Published: (2025)
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
by: Albalak, Alon, et al.
Published: (2025)
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
by: Petrov, Ivo, et al.
Published: (2025)
Conditional Vendi Score: An Information-Theoretic Approach to Diversity Evaluation of Prompt-based Generative Models
by: Jalali, Mohammad, et al.
Published: (2024)
MegaMath: Pushing the Limits of Open Math Corpora
by: Zhou, Fan, et al.
Published: (2025)
MathPile: A Billion-Token-Scale Pretraining Corpus for Math
by: Wang, Zengzhi, et al.
Published: (2023)
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation
by: Ouyang, Jialin
Published: (2025)
ControlMath: Controllable Data Generation Promotes Math Generalist Models
by: Chen, Nuo, et al.
Published: (2024)
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning
by: Li, Chengpeng, et al.
Published: (2023)
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
by: Zhang, Renrui, et al.
Published: (2024)
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
by: Huang, Kaixuan, et al.
Published: (2025)
Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
by: He, Xuan, et al.
Published: (2024)
Solving Formal Math Problems by Decomposition and Iterative Reflection
by: Zhou, Yichi, et al.
Published: (2025)
EasyMath: A 0-shot Math Benchmark for SLMs
by: Karki, Drishya, et al.
Published: (2025)
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
by: Liu, Zihan, et al.
Published: (2024)
ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning
by: Kommuru, Kranthi, et al.
Published: (2026)
DOoM: Difficult Olympiads of Math
by: Kuleshov, Ilya, et al.
Published: (2025)
Augmenting Math Word Problems via Iterative Question Composing
by: Liu, Haoxiong, et al.
Published: (2024)
†DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems
by: Nazi, Zabir Al, et al.
Published: (2026)
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
by: Li, Junsong, et al.
Published: (2025)
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
by: Wang, Peiyi, et al.
Published: (2023)
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention
by: Di, Xinhan, et al.
Published: (2025)
SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance
by: Singh, Kunal, et al.
Published: (2025)
A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
by: Rahman, Roussel, et al.
Published: (2025)
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
by: Qin, Tian, et al.
Published: (2025)
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
by: Khan, Zaid, et al.
Published: (2025)