:: Library Catalog

Bibliographic Details
Main Authors:	Wei, Chengwei; Wang, Bin; Kim, Jung-jae; Chen, Nancy F.
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.15000

Similar Items

CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
by: Wei, Chengwei, et al.
Published: (2024)

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)

VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency
by: Han, Vernon Toh Yan, et al.
Published: (2023)

ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models
by: Xue, Boyang, et al.
Published: (2025)

SKYLENAGE Technical Report: Mathematical Reasoning and Contest-Innovation Benchmarks for Multi-Level Math Evaluation
by: Wei, Hu, et al.
Published: (2025)

MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning
by: Sobhani, Mahbub E, et al.
Published: (2025)

MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
by: Yang, Zhen, et al.
Published: (2024)

MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula
by: Hyeon, Sieun, et al.
Published: (2024)

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
by: Zhou, Zihao, et al.
Published: (2024)

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
by: Wang, Lei, et al.
Published: (2024)

RoMath: A Mathematical Reasoning Benchmark in Romanian
by: Cosma, Adrian, et al.
Published: (2024)

Resilience of Large Language Models for Noisy Instructions
by: Wang, Bin, et al.
Published: (2024)

Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
by: Shi, Wenhao, et al.
Published: (2024)

VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)

DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
by: Zou, Chengke, et al.
Published: (2024)

MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
by: Zhan, Shaoxiong, et al.
Published: (2025)

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
by: Fang, Meng, et al.
Published: (2024)

MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
by: Liang, Hao, et al.
Published: (2025)

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
by: Wang, Yiming, et al.
Published: (2025)

MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
by: Peng, Shuai, et al.
Published: (2024)

TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
by: Colle, Vincenzo, et al.
Published: (2025)

MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into LaTeX Formulas for Improved Readability
by: Jung, Kyudan, et al.
Published: (2024)

MathScale: Scaling Instruction Tuning for Mathematical Reasoning
by: Tang, Zhengyang, et al.
Published: (2024)

IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
by: Gao, Yiming, et al.
Published: (2025)

CRAFT: Extracting and Tuning Cultural Instructions from the Wild
by: Wang, Bin, et al.
Published: (2024)

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)

TabularMath: Understanding Math Reasoning over Tables with Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2025)

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)

CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
by: Liu, Wentao, et al.
Published: (2024)

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
by: Wu, Yanan, et al.
Published: (2024)

Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
by: Jiao, Fangkai, et al.
Published: (2024)

WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications
by: Li, Xin, et al.
Published: (2025)

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution
by: Hong, Hanhua, et al.
Published: (2026)

JT-Math: A Multi-Stage Framework for Advanced Mathematical Reasoning in Large Language Models
by: Hao, Yifan, et al.
Published: (2025)

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)

MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)

Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
by: Sun, Yuhong, et al.
Published: (2024)

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
by: Shao, Zhihong, et al.
Published: (2025)

MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
by: O'Brien, Dayyán, et al.
Published: (2025)

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
by: Lu, Zimu, et al.
Published: (2024)