:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Kunlun; Liu, Zijia; Li, Bingxuan; Tian, Muxin; Yang, Yingxuan; Zhang, Jiaxun; Han, Pengrui; Xie, Qipeng; Cui, Fuyang; Zhang, Weijia; Ma, Xiaoteng; Yu, Xiaodong; Ramesh, Gowtham; Wu, Jialian; Liu, Zicheng; Lu, Pan; Zou, James; You, Jiaxuan
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.25370

Similar Items

SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
by: Zhu, Kunlun, et al.
Published: (2025)

SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs
by: Zhang, Weijia, et al.
Published: (2025)

AcademicEval: Live Long-Context LLM Benchmark
by: Zhang, Haozhen, et al.
Published: (2025)

TinyScientist: An Interactive, Extensible, and Controllable Framework for Building Research Agents
by: Yu, Haofei, et al.
Published: (2025)

TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
by: Mishra, Prakamya, et al.
Published: (2025)

Thought-Retriever: Don't Just Retrieve Raw Data, Retrieve Thoughts for Memory-Augmented Agentic Systems
by: Feng, Tao, et al.
Published: (2026)

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
by: Lin, Guanyu, et al.
Published: (2024)

Instella: Fully Open Language Models with Stellar Performance
by: Liu, Jiang, et al.
Published: (2025)

In-Context Learning May Not Elicit Trustworthy Reasoning: A-Not-B Errors in Pretrained Language Models
by: Han, Pengrui, et al.
Published: (2024)

Real-World Benchmarks Make Membership Inference Attacks Fail on Diffusion Models
by: Liang, Chumeng, et al.
Published: (2024)

Which LLM Multi-Agent Protocol to Choose?
by: Du, Hongyi, et al.
Published: (2025)

Stabilizing Efficient Reasoning with Step-Level Advantage Selection
by: Wang, Han, et al.
Published: (2026)

When Verification Fails: How Compositionally Infeasible Claims Escape Rejection
by: Liu, Muxin, et al.
Published: (2026)

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
by: Liu, Zijia, et al.
Published: (2025)

SWE-Bench Mobile: Can Large Language Model Agents Develop Industry-Level Mobile Applications?
by: Tian, Muxin, et al.
Published: (2026)

Mistake Notebook Learning: Batch-Clustered Failures for Training-Free Agent Adaptation
by: Su, Xuanbo, et al.
Published: (2025)

Large Language Model Reasoning Failures
by: Song, Peiyang, et al.
Published: (2026)

DRIFT: Transferring Reasoning Priors for Efficient MLLM Fine-Tuning
by: Huang, Chao, et al.
Published: (2025)

APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation
by: Zhou, Yuzhen, et al.
Published: (2025)

Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning
by: Xu, Zhikun, et al.
Published: (2026)

VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
by: Lin, Jingyang, et al.
Published: (2026)

Learning from Online Videos at Inference Time for Computer-Use Agents
by: Liu, Yujian, et al.
Published: (2025)

Reasoning Fails Where Step Flow Breaks
by: Xu, Xiaoyu, et al.
Published: (2026)

FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data
by: Feng, Tao, et al.
Published: (2025)

Agent Laboratory: Using LLM Agents as Research Assistants
by: Schmidgall, Samuel, et al.
Published: (2025)

Latent Visual Reasoning
by: Li, Bangzheng, et al.
Published: (2025)

XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
by: Wang, Xingrui, et al.
Published: (2025)

Instella-T2I: Pushing the Limits of 1D Discrete Latent Space Image Generation
by: Wang, Ze, et al.
Published: (2025)

ImageDoctor: Diagnosing Text-to-Image Generation via Grounded Image Reasoning
by: Guo, Yuxiang, et al.
Published: (2025)

Self-Taught Agentic Long Context Understanding
by: Zhuang, Yufan, et al.
Published: (2025)

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)

How Far Are We From AGI: Are LLMs All We Need?
by: Feng, Tao, et al.
Published: (2024)

Uncovering Singularities in Feynman Integrals via Machine Learning
by: Liu, Yuanche, et al.
Published: (2025)

“Everyone's Struggling:” Coping With Institutionalized Hierarchies of Competence Through Emotional Resonance
by: Zhang, Muxin, et al.
Published: (2025)

Augmenting Interface Usability Heuristics for Reliable Computer-Use Agents
by: Liu, Jiateng, et al.
Published: (2026)

CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models
by: Liang, Yihao, et al.
Published: (2026)

MOVi: Training-free Text-conditioned Multi-Object Video Generation
by: Rahman, Aimon, et al.
Published: (2025)

Where LLM Annotators Fail: Label-Free Learning on Graphs with LLMs
by: Thapaliya, Safal, et al.
Published: (2026)

SonicSense: Object Perception from In-Hand Acoustic Vibration
by: Liu, Jiaxun, et al.
Published: (2024)

Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach
by: Xiaoteng, et al.
Published: (2024)