| Main Authors: | Wang, Zehao, Jin, Shilong, Cao, Zhao, Wang, Lanjun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.23414 |
Similar Items
Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
by: Wang, Zehao, et al.
Published: (2026)
Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
by: Wang, Zehong, et al.
Published: (2026)
SE-GA: Memory-Augmented Self-Evolution for GUI Agents
by: Jin, Shilong, et al.
Published: (2026)
When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents
by: Hadeliya, Tsimur, et al.
Published: (2025)
Gradient-Based Model Fingerprinting for LLM Similarity Detection and Family Classification
by: Wu, Zehao, et al.
Published: (2025)
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
by: Xiao, Boyu, et al.
Published: (2026)
Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System
by: Zhao, Yang, et al.
Published: (2024)
When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE
by: Erol, Melihcan, et al.
Published: (2026)
MemFail: Stress-Testing Failure Modes of LLM Memory Systems
by: Garg, Ishir, et al.
Published: (2026)
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
by: Wang, Youting, et al.
Published: (2026)
PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement
by: Zhang, Tuo, et al.
Published: (2026)
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
by: Zeng, Yifan, et al.
Published: (2026)
Frequency Matters: When Time Series Foundation Models Fail Under Spectral Shift
by: Wang, Tianze, et al.
Published: (2025)
Agentic Unlearning: When LLM Agent Meets Machine Unlearning
by: Wang, Bin, et al.
Published: (2026)
NK-GAD: Neighbor Knowledge-Enhanced Unsupervised Graph Anomaly Detection
by: Wang, Zehao, et al.
Published: (2026)
When LLM Judge Scores Look Good but Best-of-N Decisions Fail
by: Landesberg, Eddie
Published: (2026)
Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity
by: Yang, Yingxuan, et al.
Published: (2026)
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
by: Yang, Huchen, et al.
Published: (2026)
Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning
by: Liu, Zehao, et al.
Published: (2026)
When Counterfactual Reasoning Fails: Chaos and Real-World Complexity
by: Aalaila, Yahya, et al.
Published: (2025)
On the Importance of Task Complexity in Evaluating LLM-Based Multi-Agent Systems
by: Tang, Bohan, et al.
Published: (2025)
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
by: Wang, Zhao, et al.
Published: (2025)
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution
by: Shao, Minghao, et al.
Published: (2025)
STeCa: Step-level Trajectory Calibration for LLM Agent Learning
by: Wang, Hanlin, et al.
Published: (2025)
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
by: Yang, Jiaxi, et al.
Published: (2025)
MPO: Boosting LLM Agents with Meta Plan Optimization
by: Xiong, Weimin, et al.
Published: (2025)
Agent-Oriented Planning in Multi-Agent Systems
by: Li, Ao, et al.
Published: (2024)
When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents
by: Hu, Qisheng, et al.
Published: (2026)
When Will It Fail?: Anomaly to Prompt for Forecasting Future Anomalies in Time Series
by: Park, Min-Yeong, et al.
Published: (2025)
When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems
by: Su, Junwei, et al.
Published: (2026)
Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents
by: Dobrovsky, Aline, et al.
Published: (2025)
Universe Routing: Why Self-Evolving Agents Need Epistemic Control
by: Wang, Zhaohui Geoffrey
Published: (2026)
When Validation Fails: Cross-Institutional Blood Pressure Prediction and the Limits of Electronic Health Record-Based Models
by: Azam, Md Basit, et al.
Published: (2025)
Synthetic Error Injection Fails to Elicit Self-Correction In Language Models
by: Wu, David X., et al.
Published: (2025)
MAVEN: Multi-Agent Verification-Elaboration Network with In-Step Epistemic Auditing
by: Yao, Yinsheng, et al.
Published: (2026)
DiFR: Inference Verification Despite Nondeterminism
by: Karvonen, Adam, et al.
Published: (2025)
TruthFlow: Truthful LLM Generation via Representation Flow Correction
by: Wang, Hanyu, et al.
Published: (2025)
When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift
by: Vogt-Lowell, Kevin, et al.
Published: (2026)
When Chain-of-Thought Fails, the Solution Hides in the Hidden States
by: Mehrafarin, Houman, et al.
Published: (2026)
Epistemic Deep Learning: Enabling Machine Learning Models to Know When They Do Not Know
by: Manchingal, Shireen Kudukkil
Published: (2025)