| Main Authors: | Shi, Jiajun, Yang, Jian, Liu, Jiaheng, Bu, Xingyuan, Chen, Jiangjie, Zhou, Junting, Ma, Kaijing, Wen, Zhoufutu, Wang, Bingli, He, Yancheng, Song, Liang, Zhu, Hualei, Li, Shilong, Wang, Xingjian, Zhang, Wei, Yuan, Ruibin, Yao, Yifan, Yang, Wenjun, Wang, Yunli, Fang, Siyuan, Yuan, Siyu, He, Qianyu, Tang, Xiangru, Tan, Yingshui, Zhou, Wangchunshu, Zhang, Zhaoxiang, Li, Zhoujun, Huang, Wenhao, Zhang, Ge |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.14552 |
Similar Items
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
by: He, Yancheng, et al.
Published: (2025)
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
by: Zhang, Alexander, et al.
Published: (2025)
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models
by: He, Qianyu, et al.
Published: (2025)
IFEvalCode: Controlled Code Generation
by: Yang, Jian, et al.
Published: (2025)
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
by: Li, Shilong, et al.
Published: (2025)
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
by: Li, Shilong, et al.
Published: (2024)
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
by: Liu, Jianyu, et al.
Published: (2025)
A Comprehensive Survey on Long Context Language Modeling
by: Liu, Jiaheng, et al.
Published: (2025)
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
by: Ma, Kaijing, et al.
Published: (2024)
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
by: Que, Haoran, et al.
Published: (2024)
Think-J: Learning to Think for Generative LLM-as-a-Judge
by: Huang, Hui, et al.
Published: (2025)
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models
by: He, Yancheng, et al.
Published: (2024)
Skillful High-Resolution Ensemble Precipitation Forecasting with an Integrated Deep Learning Framework
by: He, Shuangshuang, et al.
Published: (2025)
OProver: A Unified Framework for Agentic Formal Theorem Proving
by: Ma, David, et al.
Published: (2026)
Optimal Task and Motion Planning for Autonomous Systems Using Petri Nets
by: He, Zhou, et al.
Published: (2025)
FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver
by: He, Shuangshuang, et al.
Published: (2025)
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
by: Li, Shilong, et al.
Published: (2024)
Time–jerk optimal trajectory planning for industrial robots with coupled interpolation function selection
by: Wang, Shilong, et al.
Published: (2024)
MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise
by: Deng, Chunyuan, et al.
Published: (2024)
Challenging the Law of Energy Conservation Through Superposed Waves Based on Spatial Symmetry of Two RF Sources: Theoretical Derivation and Experimental Verification
by: Jiao, Bingli, et al.
Published: (2025)
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
by: Gu, Jihao, et al.
Published: (2025)
Chinese SafetyQA: A Safety Short-form Factuality Benchmark for Large Language Models
by: Tan, Yingshui, et al.
Published: (2024)
MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
by: Huang, Hui, et al.
Published: (2025)
AIR: Complex Instruction Generation via Automatic Iterative Refinement
by: Liu, Wei, et al.
Published: (2025)
MemEvolve: Meta-Evolution of Agent Memory Systems
by: Zhang, Guibin, et al.
Published: (2025)
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
by: Tang, Xiangru, et al.
Published: (2025)
RoomCraft: Controllable and Complete 3D Indoor Scene Generation
by: Zhou, Mengqi, et al.
Published: (2025)
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
by: Zhang, Yikai, et al.
Published: (2024)
CNCast: Leveraging 3D Swin Transformer and DiT for Enhanced Regional Weather Forecasting
by: Liang, Hongli, et al.
Published: (2025)
Polarforming for Wireless Communications: Modeling and Performance Analysis
by: Zhou, Zijian, et al.
Published: (2024)
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
by: Chen, Jiangjie, et al.
Published: (2025)
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
by: Bai, Ge, et al.
Published: (2024)
Learnable Graph Matching: A Practical Paradigm for Data Association
by: He, Jiawei, et al.
Published: (2023)
Weakly Supervised 3D Object Detection with Multi-Stage Generalization
by: He, Jiawei, et al.
Published: (2023)
MIO: A Foundation Model on Multimodal Tokens
by: Wang, Zekun, et al.
Published: (2024)
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
by: Tan, Yingshui, et al.
Published: (2025)
Efficient Agents: Building Effective Agents While Reducing Cost
by: Wang, Ningning, et al.
Published: (2025)
AgenTracer: Who Is Inducing Failure in the LLM Agentic Systems?
by: Zhang, Guibin, et al.
Published: (2025)
M3TQA: Massively Multilingual Multitask Table Question Answering
by: Shu, Daixin, et al.
Published: (2025)
ROIC-DM: Robust Text Inference and Classification via Diffusion Model
by: Yuan, Shilong, et al.
Published: (2024)