| Main Authors: | Han, Yikun; Chan, Joey; Chen, Jingyuan; Ai, Mengting; Du, Simo; Guo, Yue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.08788 |
Similar Items
Joint Optimization of Reasoning and Dual-Memory for Self-Learning Diagnostic Agent
by: Li, Bingxuan, et al.
Published: (2026)
Belief Memory: Agent Memory Under Partial Observability
by: Liao, Junfeng, et al.
Published: (2026)
Improving Clinical Diagnosis with Counterfactual Multi-Agent Reasoning
by: You, Zhiwen, et al.
Published: (2026)
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
by: Chan, Joey, et al.
Published: (2026)
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
by: Jin, Congyun, et al.
Published: (2024)
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
by: Yu, Suhao, et al.
Published: (2025)
MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts
by: Iwase, Naoto, et al.
Published: (2025)
Temporal Predictors of Outcome in Reasoning Language Models
by: David, Joey
Published: (2025)
MedHopQA: A Disease-Centered Multi-Hop Reasoning Benchmark and Evaluation Framework for LLM-Based Biomedical Question Answering
by: Islamaj, Rezarta, et al.
Published: (2026)
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
by: Li, Xiaomin, et al.
Published: (2025)
MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models
by: Ai, Mengting, et al.
Published: (2023)
GeoReasoner: Reasoning On Geospatially Grounded Context For Natural Language Understanding
by: Yan, Yibo, et al.
Published: (2024)
ORBIT -- Open Recommendation Benchmark for Reproducible Research with Hidden Tests
by: He, Jingyuan, et al.
Published: (2025)
HiMed: Incentivizing Hindi Reasoning in Medical LLMs
by: Jiang, Dingfeng, et al.
Published: (2026)
Iterative Formalization and Planning in Partially Observable Environments
by: Gong, Liancheng, et al.
Published: (2025)
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
by: Zuo, Yuxin, et al.
Published: (2025)
Draw2Think: Harnessing Geometry Reasoning through Constraint Engine Interaction
by: Hu, Juncheng, et al.
Published: (2026)
MedPRMBench: A Fine-grained Benchmark for Process Reward Models in Medical Reasoning
by: Wu, Lingyan, et al.
Published: (2026)
Computational Reasoning of Large Language Models
by: Wu, Haitao, et al.
Published: (2025)
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
by: Tang, Xiangru, et al.
Published: (2025)
AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning
by: Zou, Jiaru, et al.
Published: (2025)
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning
by: Zhou, Yue, et al.
Published: (2025)
When Abundance Conceals Weakness: Knowledge Conflict in Multilingual Models
by: Zhao, Jiaqi, et al.
Published: (2026)
Concealment of Intent: A Game-Theoretic Analysis
by: Wu, Xinbo, et al.
Published: (2025)
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
by: Zhang, Ming, et al.
Published: (2025)
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios
by: Zhan, Weidong, et al.
Published: (2025)
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
by: Ling, Zhan, et al.
Published: (2025)
MedQA-CS: Objective Structured Clinical Examination (OSCE)-Style Benchmark for Evaluating LLM Clinical Skills
by: Yao, Zonghai, et al.
Published: (2024)
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning
by: She, Jingyuan Selena, et al.
Published: (2023)
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
by: Wang, Yikun, et al.
Published: (2023)
ChronoMedKG: A Temporally-Grounded Biomedical Knowledge Graph and Benchmark for Clinical Reasoning
by: Ahmed, Md Shamim, et al.
Published: (2026)
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
by: Wang, Jingyuan, et al.
Published: (2025)
Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation
by: Tian, Yijun, et al.
Published: (2024)
When Evidence Conflicts: Uncertainty and Order Effects in Retrieval-Augmented Biomedical Question Answering
by: Han, Yikun, et al.
Published: (2026)
Recommending Clinical Trials for Online Patient Cases using Artificial Intelligence
by: Chan, Joey, et al.
Published: (2025)
MedFactEval and MedAgentBrief: A Framework and Workflow for Generating and Evaluating Factual Clinical Summaries
by: Grolleau, François, et al.
Published: (2025)
Information Seeking for Robust Decision Making under Partial Observability
by: Fang, Djengo Cyun-Jyun, et al.
Published: (2025)
Med-CoReasoner: Reducing Language Disparities in Medical Reasoning via Language-Informed Co-Reasoning
by: Gao, Fan, et al.
Published: (2026)
ER-Reason: A Benchmark Dataset for LLM Clinical Reasoning in the Emergency Room
by: Mehandru, Nikita, et al.
Published: (2025)
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
by: Chen, Shan, et al.
Published: (2025)