| Main authors: | He, Zhitao; Lyu, Zongwei; Chen, Dazhong; Guo, Dadi; Fung, Yi R. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2506.06034 |
Similar items
RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind
by: He, Zhitao, et al.
Published: (2026)
MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration
by: He, Zhitao, et al.
Published: (2025)
Steamroller Problems: An Evaluation of LLM Reasoning Capability with Automated Theorem Prover Strategies
by: McGinness, Lachlan, et al.
Published: (2024)
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
by: Chen, Luoxin, et al.
Published: (2025)
ClinTutor-R1: Advancing Scalable and Robust One-to-Many Alignment in Clinical Socratic Education
by: He, Zhitao, et al.
Published: (2025)
MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
by: Huang, Junsheng, et al.
Published: (2025)
MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
by: Yang, Haolin, et al.
Published: (2025)
On Stable Long-Form Generation: Benchmarking and Mitigating Length Volatility
by: He, Zhitao, et al.
Published: (2026)
EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving
by: Li, Mukai, et al.
Published: (2025)
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
by: Guo, Dadi, et al.
Published: (2026)
HunyuanProver: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving
by: Li, Yang, et al.
Published: (2024)
Theorem Provers: One Size Fits All?
by: Oates, Harrison, et al.
Published: (2025)
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Critic-Guided Search
by: Wu, Zijian, et al.
Published: (2024)
PhysProver: Advancing Automatic Theorem Proving for Physics
by: Zhang, Hanning, et al.
Published: (2026)
Automated Theorem Provers Help Improve Large Language Model Reasoning
by: McGinness, Lachlan, et al.
Published: (2024)
Reasoning Path Divergence: A New Metric and Curation Strategy to Unlock LLM Diverse Thinking
by: Ju, Feng, et al.
Published: (2025)
Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training
by: Zhou, Xinyuan, et al.
Published: (2025)
CantoASR: Prosody-Aware ASR-LALM Collaboration for Low-Resource Cantonese
by: Chen, Dazhong, et al.
Published: (2025)
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG
by: Peng, Xiangyu, et al.
Published: (2025)
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition
by: Tsoukalas, George, et al.
Published: (2024)
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
by: Chen, Kai, et al.
Published: (2025)
CultureCLIP: Empowering CLIP with Cultural Awareness through Synthetic Images and Contextualized Captions
by: Huang, Yuchen, et al.
Published: (2025)
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
by: Ji, Yifan, et al.
Published: (2026)
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
by: Chen, Jiangjie, et al.
Published: (2025)
MerLean-Prover: A Recursive Looping Harness for Lean 4 Theorem Proving
by: Li, Jinzheng, et al.
Published: (2026)
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning
by: Shen, Ziju, et al.
Published: (2025)
A Comprehensive Survey of the Lean 4 Theorem Prover: Architecture, Applications, and Advances
by: Tang, Xichen
Published: (2025)
Compile to Compress: Boosting Formal Theorem Provers by Compiler Outputs
by: Li, Guchan, et al.
Published: (2026)
Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
by: He, Zhitao, et al.
Published: (2025)
AIDBench: A benchmark for evaluating the authorship identification capability of large language models
by: Wen, Zichen, et al.
Published: (2024)
GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs
by: Rajabi, Navid, et al.
Published: (2024)
MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria
by: Ge, Wentao, et al.
Published: (2023)
Traj-MLLM: Can Multimodal Large Language Models Reform Trajectory Data Mining?
by: Liu, Shuo, et al.
Published: (2025)
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
by: Gao, Timin, et al.
Published: (2024)
Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation
by: Qi, Chengwen, et al.
Published: (2025)
FedMLLM: Federated Fine-tuning MLLM on Multimodal Heterogeneity Data
by: Xu, Binqian, et al.
Published: (2024)
KOCO-BENCH: Can Large Language Models Leverage Domain Knowledge in Software Development?
by: Jiang, Xue, et al.
Published: (2026)
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models
by: Yoshitake, Michiko, et al.
Published: (2024)
NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms
by: Zheng, Jonathan, et al.
Published: (2024)
Prover-Verifier Games improve legibility of LLM outputs
by: Kirchner, Jan Hendrik, et al.
Published: (2024)