| Main Authors: | Zhu, Kunlun, Liu, Zijia, Li, Bingxuan, Tian, Muxin, Yang, Yingxuan, Zhang, Jiaxun, Han, Pengrui, Xie, Qipeng, Cui, Fuyang, Zhang, Weijia, Ma, Xiaoteng, Yu, Xiaodong, Ramesh, Gowtham, Wu, Jialian, Liu, Zicheng, Lu, Pan, Zou, James, You, Jiaxuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.25370 |
Similar Items
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
by: Zhu, Kunlun, et al.
Published: (2025)
SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs
by: Zhang, Weijia, et al.
Published: (2025)
AcademicEval: Live Long-Context LLM Benchmark
by: Zhang, Haozhen, et al.
Published: (2025)
TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents
by: Yu, Haofei, et al.
Published: (2025)
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)
Thought-Retriever: Don't Just Retrieve Raw Data, Retrieve Thoughts for Memory-Augmented Agentic Systems
by: Feng, Tao, et al.
Published: (2026)
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
by: Lin, Guanyu, et al.
Published: (2024)
Instella: Fully Open Language Models with Stellar Performance
by: Liu, Jiang, et al.
Published: (2025)
In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
by: Han, Pengrui, et al.
Published: (2024)
Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models
by: Liang, Chumeng, et al.
Published: (2024)
Which LLM Multi-Agent Protocol to Choose?
by: Du, Hongyi, et al.
Published: (2025)
Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)
When Verification Fails: How Compositionally Infeasible Claims Escape Rejection
by: Liu, Muxin, et al.
Published: (2026)
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
by: Liu, Zijia, et al.
Published: (2025)
SWE-Bench Mobile: Can Large Language Model Agents Develop Industry-Level Mobile Applications?
by: Tian, Muxin, et al.
Published: (2026)
Mistake Notebook Learning: Batch-Clustered Failures for Training-Free Agent Adaptation
by: Su, Xuanbo, et al.
Published: (2025)
Large Language Model Reasoning Failures
by: Song, Peiyang, et al.
Published: (2026)
DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)
Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
by: Xu, Zhikun, et al.
Published: (2026)
VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
by: Lin, Jingyang, et al.
Published: (2026)
Learning from Online Videos at Inference Time for Computer-Use Agents
by: Liu, Yujian, et al.
Published: (2025)
Reasoning Fails Where Step Flow Breaks
by: Xu, Xiaoyu, et al.
Published: (2026)
FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data
by: Feng, Tao, et al.
Published: (2025)
Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)
Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)
Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)
ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)
Self-Taught Agentic Long Context Understanding
by: Zhuang, Yufan, et al.
Published: (2025)
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)
How Far Are We From AGI: Are LLMs All We Need?
by: Feng, Tao, et al.
Published: (2024)
Uncovering Singularities in Feynman Integrals via Machine Learning
by: Liu, Yuanche, et al.
Published: (2025)
“Everyone's Struggling:” Coping With Institutionalized Hierarchies of Competence Through Emotional Resonance
by: Zhang, Muxin, et al.
Published: (2025)
Augmenting Interface Usability Heuristics for Reliable Computer-Use Agents
by: Liu, Jiateng, et al.
Published: (2026)
CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
by: Liang, Yihao, et al.
Published: (2026)
MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)
Where LLM Annotators Fail: Label-Free Learning on Graphs with LLMs
by: Thapaliya, Safal, et al.
Published: (2026)
SonicSense: Object Perception from In-Hand Acoustic Vibration
by: Liu, Jiaxun, et al.
Published: (2024)
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach
by: Xiaoteng, et al.
Published: (2024)