| Main authors: | Wei, Chengwei; Wang, Bin; Kim, Jung-jae; Chen, Nancy F. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2505.15000 |
Similar items
CoinMath: Harnessing the Power of Coding Instruction for Math LLMs
by: Wei, Chengwei, et al.
Published: (2024)
InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
by: Wei, Chengwei, et al.
Published: (2026)
VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency
by: Han, Vernon Toh Yan, et al.
Published: (2023)
ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models
by: Xue, Boyang, et al.
Published: (2025)
SKYLENAGE Technical Report: Mathematical Reasoning and Contest-Innovation Benchmarks for Multi-Level Math Evaluation
by: Wei, Hu, et al.
Published: (2025)
MathMist: A Parallel Multilingual Benchmark Dataset for Mathematical Problem Solving and Reasoning
by: Sobhani, Mahbub E, et al.
Published: (2025)
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
by: Yang, Zhen, et al.
Published: (2024)
MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula
by: Hyeon, Sieun, et al.
Published: (2024)
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
by: Zhou, Zihao, et al.
Published: (2024)
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
by: Wang, Lei, et al.
Published: (2024)
RoMath: A Mathematical Reasoning Benchmark in Romanian
by: Cosma, Adrian, et al.
Published: (2024)
Resilience of Large Language Models for Noisy Instructions
by: Wang, Bin, et al.
Published: (2024)
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
by: Shi, Wenhao, et al.
Published: (2024)
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
by: Ma, Jingkun, et al.
Published: (2024)
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
by: Zou, Chengke, et al.
Published: (2024)
MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy
by: Zhan, Shaoxiong, et al.
Published: (2025)
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
by: Fang, Meng, et al.
Published: (2024)
MathClean: A Benchmark for Synthetic Mathematical Data Cleaning
by: Liang, Hao, et al.
Published: (2025)
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
by: Wang, Yiming, et al.
Published: (2025)
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models
by: Peng, Shuai, et al.
Published: (2024)
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving
by: Colle, Vincenzo, et al.
Published: (2025)
MathBridge: A Large Corpus Dataset for Translating Spoken Mathematical Expressions into $LaTeX$ Formulas for Improved Readability
by: Jung, Kyudan, et al.
Published: (2024)
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
by: Tang, Zhengyang, et al.
Published: (2024)
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
by: Gao, Yiming, et al.
Published: (2025)
CRAFT: Extracting and Tuning Cultural Instructions from the Wild
by: Wang, Bin, et al.
Published: (2024)
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
by: Li, Xiaoyuan, et al.
Published: (2025)
TabularMath: Understanding Math Reasoning over Tables with Large Language Models
by: Tian, Shi-Yu, et al.
Published: (2025)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
by: Liu, Wentao, et al.
Published: (2024)
ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models
by: Wu, Yanan, et al.
Published: (2024)
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
by: Jiao, Fangkai, et al.
Published: (2024)
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications
by: Li, Xin, et al.
Published: (2025)
HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution
by: Hong, Hanhua, et al.
Published: (2026)
JT-Math: A Multi-Stage Framework for Advanced Mathematical Reasoning in Large Language Models
by: Hao, Yifan, et al.
Published: (2025)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
by: Lin, Yi-Cheng, et al.
Published: (2024)
MathEDU: Feedback Generation on Problem-Solving Processes for Mathematical Learning Support
by: Hsu, Wei-Ling, et al.
Published: (2025)
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
by: Sun, Yuhong, et al.
Published: (2024)
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
by: Shao, Zhihong, et al.
Published: (2025)
MatheMagic: Generating Dynamic Mathematics Benchmarks Robust to Memorization
by: O'Brien, Dayyán, et al.
Published: (2025)
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
by: Lu, Zimu, et al.
Published: (2024)