:: Library Catalog

Bibliographic Details
Main Authors:	Gao, Songyang, Gu, Yuzhe, Wu, Zijian, Kong, Lingkai, Zhang, Wenwei, Cai, Zhongrui, Zheng, Fan, Ma, Tianyou, Shen, Junhao, Zhao, Haiteng, Zhang, Duanyang, Zhang, Huilun, Liu, Kuikun, Lyu, Chengqi, Duan, Yanhui, Chen, Chiyu, Ma, Ningsheng, Gao, Jianfei, Lyu, Han, Lin, Dahua, Chen, Kai
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.10739

Similar Items

The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
by: Hua, Zhouqi, et al.
Published: (2025)

Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
by: Shen, Junhao, et al.
Published: (2025)

Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning
by: Zhao, Haiteng, et al.
Published: (2025)

Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
by: Gu, Yuzhe, et al.
Published: (2025)

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification
by: Wu, Zijian, et al.
Published: (2025)

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
by: Lyu, Chengqi, et al.
Published: (2025)

ANAH: Analytical Annotation of Hallucinations in Large Language Models
by: Ji, Ziwei, et al.
Published: (2024)

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
by: Gu, Yuzhe, et al.
Published: (2024)

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
by: Liu, Shudong, et al.
Published: (2025)

Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
by: Song, Zifan, et al.
Published: (2024)

CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
by: Zhang, Chuyu, et al.
Published: (2024)

RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
by: Zhao, Zhonghan, et al.
Published: (2025)

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
by: Chen, Zehui, et al.
Published: (2023)

Fake Alignment: Are LLMs Really Aligned Well?
by: Wang, Yixu, et al.
Published: (2023)

Training Language Models to Critique With Multi-agent Feedback
by: Lan, Tian, et al.
Published: (2024)

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
by: Chen, Zehui, et al.
Published: (2024)

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
by: Chen, Zehui, et al.
Published: (2024)

Rethinking Verification for LLM Code Generation: From Generation to Testing
by: Ma, Zihan, et al.
Published: (2025)

InternBootcamp Technical Report: Boosting LLM Reasoning with Verifiable Task Scaling
by: Li, Peiji, et al.
Published: (2025)

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
by: Ying, Huaiyuan, et al.
Published: (2024)

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
by: Li, Rongjie, et al.
Published: (2024)

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
by: Liu, Hongwei, et al.
Published: (2024)

InternLM2.5-StepProver: Advancing Automated Theorem Proving via Critic-Guided Search
by: Wu, Zijian, et al.
Published: (2024)

Towards Imperceptible Adversarial Attacks for Time Series Classification with Local Perturbations and Frequency Analysis
by: Gu, Wenwei, et al.
Published: (2025)

ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
by: Zhuo, Jingming, et al.
Published: (2024)

Mastering Olympiad-Level Physics with Artificial Intelligence
by: Jian, Dong-Shan, et al.
Published: (2025)

Hierarchical Awareness Adapters with Hybrid Pyramid Feature Fusion for Dense Depth Prediction
by: Su, Wuqi, et al.
Published: (2026)

OpenCompass: A Universal Evaluation Platform for Large Language Models
by: Cao, Maosong, et al.
Published: (2026)

HUMAN RESOURCE DEVELOPMENT AND SOCIAL EMPOWERMENT: A HOLISTIC FRAMEWORK FOR SUSTAINABLE COMMUNITY GROWTH
by: Amiya Bhaumik, Lyu Wenwei, Nandar Win
Published: (2026)

Collaborative Performance Prediction for Large Language Models
by: Zhang, Qiyuan, et al.
Published: (2024)

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
by: He, Chaoqun, et al.
Published: (2024)

InternLM-Law: An Open Source Chinese Legal Large Language Model
by: Fei, Zhiwei, et al.
Published: (2024)

Recent Progress of Low‐Dimensional Metal‐Organic Frameworks for Aqueous Zinc‐Based Batteries
by: Hanfang Xing, et al.
Published: (2024)

The adaptive EM schemes for McKean-Vlasov SDEs with common noise in finite and infinite horizons
by: Liu, Hu, et al.
Published: (2025)

Exploring the MBTI distribution among Chinese undergraduate physics students: the influence of family income on career trajectories
by: Bai, Songyang, et al.
Published: (2024)

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks
by: Chen, Sizhou, et al.
Published: (2023)

Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
by: Li, Zenan, et al.
Published: (2025)

A Level Set Method with Secant Iterations for the Least-Squares Constrained Nuclear Norm Minimization
by: Ma, Chiyu, et al.
Published: (2026)