| Main Authors: | You, Zhiwen, Chen, Xi, Vashishtha, Aniket, Du, Simo, Erion-Barner, Gabriel, Mei, Hongyuan, Peng, Hao, Guo, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.27820 |
Similar Items
Executable Counterfactuals: Improving LLMs' Causal Reasoning Through Code
by: Vashishtha, Aniket, et al.
Published: (2025)
Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent
by: Li, Bingxuan, et al.
Published: (2026)
MedConceal: A Benchmark for Clinical Hidden-Concern Reasoning Under Partial Observability
by: Han, Yikun, et al.
Published: (2026)
PlainQAFact: Retrieval-augmented Factual Consistency Evaluation Metric for Biomedical Plain Language Summarization
by: You, Zhiwen, et al.
Published: (2025)
Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning
by: Aghzal, Mohamed, et al.
Published: (2023)
ClinicalAgent: Clinical Trial Multi-Agent System with Large Language Model-based Reasoning
by: Yue, Ling, et al.
Published: (2024)
Teaching Transformers Causal Reasoning through Axiomatic Training
by: Vashishtha, Aniket, et al.
Published: (2024)
Generalization of RLVR Using Causal Reasoning as a Testbed
by: Lu, Brian, et al.
Published: (2025)
Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis?
by: Yuan, Grace Chang, et al.
Published: (2026)
Differentially-private text generation degrades output language quality
by: Çano, Erion, et al.
Published: (2025)
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists
by: Yang, Junlin, et al.
Published: (2026)
AlbNews: A Corpus of Headlines for Topic Modeling in Albanian
by: Çano, Erion, et al.
Published: (2024)
Evaluating Vision-Language Models as Evaluators in Path Planning
by: Aghzal, Mohamed, et al.
Published: (2024)
Towards Self-Improving Error Diagnosis in Multi-Agent Systems
by: Li, Jiazheng, et al.
Published: (2026)
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
by: Elazar, Yanai, et al.
Published: (2023)
CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models
by: Chen, Yuefei, et al.
Published: (2025)
Causal Order: The Key to Leveraging Imperfect Experts in Causal Inference
by: Vashishtha, Aniket, et al.
Published: (2023)
Roles with Rails: Contract-Preserving Role Evolution in Multi-Agent Structured Reasoning
by: Ge, Ling-Yue, et al.
Published: (2026)
Counterfactual Graph for Multi-Agent LLM Calibration
by: Huang, Jiatan, et al.
Published: (2026)
Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis
by: Wang, Haochun, et al.
Published: (2024)
MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning
by: Shen, Yuhao, et al.
Published: (2026)
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks
by: Ruan, Zhiwen, et al.
Published: (2025)
ADVOSYNTH: A Synthetic Multi-Advocate Dataset for Speaker Identification in Courtroom Scenarios
by: Deroy, Aniket
Published: (2026)
Sparks of Cooperative Reasoning: LLMs as Strategic Hanabi Agents
by: Ramesh, Mahesh, et al.
Published: (2026)
Synthesizing the Virtual Advocate: A Multi-Persona Speech Generation Framework for Diverse Linguistic Jurisdictions in Indic Languages
by: Deroy, Aniket
Published: (2026)
Can LLMs Simulate Personas with Reversed Performance? A Systematic Investigation for Counterfactual Instruction Following in Math Reasoning Context
by: Kumar, Sai Adith Senthil, et al.
Published: (2025)
From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning
by: Tahmasbi, Amir, et al.
Published: (2025)
MIRIX: Multi-Agent Memory System for LLM-Based Agents
by: Wang, Yu, et al.
Published: (2025)
Exploring Reasoning Reward Model for Agents
by: Fan, Kaixuan, et al.
Published: (2026)
Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
by: Turk, Matt
Published: (2026)
Causal-Counterfactual RAG: The Integration of Causal-Counterfactual Reasoning into RAG
by: Khadilkar, Harshad, et al.
Published: (2025)
Dissecting Failure Dynamics in Large Language Model Reasoning
by: Zhu, Wei, et al.
Published: (2026)
Statler: State-Maintaining Language Models for Embodied Reasoning
by: Yoneda, Takuma, et al.
Published: (2023)
Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models
by: Lu, Hongyuan, et al.
Published: (2024)
KnowGuard: Knowledge-Driven Abstention for Multi-Round Clinical Reasoning
by: Dang, Xilin, et al.
Published: (2025)
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
by: Chen, Justin Chih-Yao, et al.
Published: (2024)
ClinicalAgents: Multi-Agent Orchestration for Clinical Decision Making with Dual-Memory
by: Ge, Zhuohan, et al.
Published: (2026)
STRIVE: A Think & Improve Approach with Iterative Refinement for Enhancing Question Quality Estimation
by: Deroy, Aniket, et al.
Published: (2025)
Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
by: Ruan, Zhiwen, et al.
Published: (2025)
A Survey on Large Language Models for Automated Planning
by: Aghzal, Mohamed, et al.
Published: (2025)