| Main Authors: | Deng, Hexuan, Ke, Xiaopeng, Li, Yichen, Hu, Ruina, Huang, Dehao, Wong, Derek F., Wang, Yue, Liu, Xuebo, Zhang, Min |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.07905 |
Similar Items
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
by: Ke, Xiaopeng, et al.
Published: (2025)
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
by: Deng, Hexuan, et al.
Published: (2024)
RouterKGQA: Specialized–General Model Routing for Constraint-Aware Knowledge Graph Question Answering
by: Yuan, Bo, et al.
Published: (2026)
REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
by: Deng, Hexuan, et al.
Published: (2025)
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
by: Deng, Hexuan, et al.
Published: (2024)
Large Language Model for Multi-Domain Translation: Benchmarking and Domain CoT Fine-tuning
by: Hu, Tianxiang, et al.
Published: (2024)
Dynamic Sampling that Adapts: Self-Aware Iterative Data Persistent Optimization for Mathematical Reasoning
by: Rao, Jun, et al.
Published: (2025)
CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
by: Li, Renhao, et al.
Published: (2024)
Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models
by: Nie, Shuo, et al.
Published: (2026)
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
by: Liu, Liangxin, et al.
Published: (2024)
SGIC: A Self-Guided Iterative Calibration Framework for RAG
by: Chen, Guanhua, et al.
Published: (2025)
CDT: A Comprehensive Capability Framework for Large Language Models Across Cognition, Domain, and Task
by: Mo, Haosi, et al.
Published: (2025)
Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore
by: Wu, Junchao, et al.
Published: (2024)
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?
by: Prandi, Matteo, et al.
Published: (2025)
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
by: Wang, Yutong, et al.
Published: (2024)
CoAct: Co-Active LLM Preference Learning with Human-AI Synergy
by: Xu, Ruiyao, et al.
Published: (2026)
Loong: A Human-Like Long Document Translation Agent with Observe-and-Act Adaptive Context Selection
by: Wang, Yutong, et al.
Published: (2026)
APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
by: Rao, Jun, et al.
Published: (2025)
Investigating CoT Monitorability in Large Reasoning Models
by: Yang, Shu, et al.
Published: (2025)
CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection
by: Chen, Yihan, et al.
Published: (2025)
GuideBench: Benchmarking Domain-Oriented Guideline Following for LLM Agents
by: Diao, Lingxiao, et al.
Published: (2025)
RelevAI-Reviewer: A Benchmark on AI Reviewers for Survey Paper Relevance
by: Couto, Paulo Henrique, et al.
Published: (2024)
ExecRepoBench: Multi-level Executable Code Completion Evaluation
by: Yang, Jian, et al.
Published: (2024)
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
by: Shao, Chenyang, et al.
Published: (2025)
DentalBench: Benchmarking and Advancing LLMs Capability for Bilingual Dentistry Understanding
by: Zhu, Hengchuan, et al.
Published: (2025)
TOD-ProcBench: Benchmarking Complex Instruction-Following in Task-Oriented Dialogues
by: Ghazarian, Sarik, et al.
Published: (2025)
PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing
by: Żurawicki, Krzysztof, et al.
Published: (2026)
OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking
by: Yang, Heng, et al.
Published: (2025)
CharacterBench: Benchmarking Character Customization of Large Language Models
by: Zhou, Jinfeng, et al.
Published: (2024)
DataSciBench: An LLM Agent Benchmark for Data Science
by: Zhang, Dan, et al.
Published: (2025)
Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents
by: Shen, Yiting, et al.
Published: (2026)
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
by: Wen, Bosi, et al.
Published: (2026)
BenchBench: Benchmarking Automated Benchmark Generation
by: Zheng, Yandan, et al.
Published: (2026)
MHRC-Bench: A Multilingual Hardware Repository-Level Code Completion benchmark
by: Zou, Qingyun, et al.
Published: (2026)
CoCoP: Enhancing Text Classification with LLM through Code Completion Prompt
by: Mohajeri, Mohammad Mahdi, et al.
Published: (2024)
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
by: Lin, Zicheng, et al.
Published: (2024)
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
by: Deng, Yuntian, et al.
Published: (2024)
PodBench: A Comprehensive Benchmark for Instruction-Aware Audio-Oriented Podcast Script Generation
by: Xu, Chenning, et al.
Published: (2026)
Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review
by: Yu, Sungduk, et al.
Published: (2025)
ClawBench: Can AI Agents Complete Everyday Online Tasks?
by: Zhang, Yuxuan, et al.
Published: (2026)