| Main Authors: | Gao, Songyang; Gu, Yuzhe; Wu, Zijian; Kong, Lingkai; Zhang, Wenwei; Cai, Zhongrui; Zheng, Fan; Ma, Tianyou; Shen, Junhao; Zhao, Haiteng; Zhang, Duanyang; Zhang, Huilun; Liu, Kuikun; Lyu, Chengqi; Duan, Yanhui; Chen, Chiyu; Ma, Ningsheng; Gao, Jianfei; Lyu, Han; Lin, Dahua; Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.10739 |
Similar Items
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
by: Hua, Zhouqi, et al.
Published: (2025)
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
by: Shen, Junhao, et al.
Published: (2025)
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
by: Zhao, Haiteng, et al.
Published: (2025)
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
by: Gu, Yuzhe, et al.
Published: (2025)
OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification
by: Wu, Zijian, et al.
Published: (2025)
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
by: Lyu, Chengqi, et al.
Published: (2025)
ANAH: Analytical Annotation of Hallucinations in Large Language Models
by: Ji, Ziwei, et al.
Published: (2024)
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
by: Gu, Yuzhe, et al.
Published: (2024)
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
by: Liu, Shudong, et al.
Published: (2025)
Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
by: Song, Zifan, et al.
Published: (2024)
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
by: Zhang, Chuyu, et al.
Published: (2024)
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
by: Zhao, Zhonghan, et al.
Published: (2025)
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
by: Chen, Zehui, et al.
Published: (2023)
Fake Alignment: Are LLMs Really Aligned Well?
by: Wang, Yixu, et al.
Published: (2023)
Training Language Models to Critique With Multi-agent Feedback
by: Lan, Tian, et al.
Published: (2024)
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
by: Chen, Zehui, et al.
Published: (2024)
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
by: Chen, Zehui, et al.
Published: (2024)
Rethinking Verification for LLM Code Generation: From Generation to Testing
by: Ma, Zihan, et al.
Published: (2025)
InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
by: Li, Peiji, et al.
Published: (2025)
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Critic-Guided Search
by: Wu, Zijian, et al.
Published: (2024)
Towards Imperceptible Adversarial Attacks for Time Series Classification with Local Perturbations and Frequency Analysis
by: Gu, Wenwei, et al.
Published: (2025)
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
by: Zhuo, Jingming, et al.
Published: (2024)
Mastering Olympiad-Level Physics with Artificial Intelligence
by: Jian, Dong-Shan, et al.
Published: (2025)
Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
by: Su, Wuqi, et al.
Published: (2026)
OpenCompass: A Universal Evaluation Platform for Large Language Models
by: Cao, Maosong, et al.
Published: (2026)
HUMAN RESOURCE DEVELOPMENT AND SOCIAL EMPOWERMENT: A HOLISTIC FRAMEWORK FOR SUSTAINABLE COMMUNITY GROWTH
by: Amiya Bhaumik, Lyu Wenwei, Nandar Win
Published: (2026)
Collaborative Performance Prediction for Large Language Models
by: Zhang, Qiyuan, et al.
Published: (2024)
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
by: He, Chaoqun, et al.
Published: (2024)
InternLM-Law: An Open Source Chinese Legal Large Language Model
by: Fei, Zhiwei, et al.
Published: (2024)
Recent Progress of Low‐Dimensional Metal‐Organic Frameworks for Aqueous Zinc‐Based Batteries
by: Hanfang Xing, et al.
Published: (2024)
The adaptive EM schemes for McKean-Vlasov SDEs with common noise in finite and infinite horizons
by: Liu, Hu, et al.
Published: (2025)
Exploring the MBTI distribution among Chinese undergraduate physics students: the influence of family income on career trajectories
by: Bai, Songyang, et al.
Published: (2024)
Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
by: Li, Zenan, et al.
Published: (2025)
A Level Set Method with Secant Iterations for the Least-Squares Constrained Nuclear Norm Minimization
by: Ma, Chiyu, et al.
Published: (2026)