| Main Authors: | Li, Xiaomin, Gao, Mingye, Hao, Yuexing, Li, Taoran, Wan, Guangya, Wang, Zihan, Wang, Yijun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.11613 |
Similar Items
Data-adaptive Safety Rules for Training Reward Models
by: Li, Xiaomin, et al.
Published: (2025)
Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning
by: Wan, Guangya, et al.
Published: (2024)
Large Language Models for Causal Discovery: Current Landscape and Future Directions
by: Wan, Guangya, et al.
Published: (2024)
ENCORE: Entropy-guided Reward Composition for Multi-head Safety Reward Models
by: Li, Xiaomin, et al.
Published: (2025)
Self-Adaptive Cognitive Debiasing for Large Language Models in Decision-Making
by: Lyu, Yougang, et al.
Published: (2025)
CliMedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models in Clinical Scenarios
by: Ouyang, Zetian, et al.
Published: (2024)
Selection of LLM Fine-Tuning Data based on Orthogonal Rules
by: Li, Xiaomin, et al.
Published: (2024)
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
by: Khoshnoodi, Mahsa, et al.
Published: (2024)
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making
by: Huang, Jen-tse, et al.
Published: (2026)
ClinicalAgents: Multi-Agent Orchestration for Clinical Decision Making with Dual-Memory
by: Ge, Zhuohan, et al.
Published: (2026)
CliBench: A Multifaceted and Multigranular Evaluation of Large Language Models for Clinical Decision Making
by: Ma, Mingyu Derek, et al.
Published: (2024)
EEG-MedRAG: Enhancing EEG-based Clinical Decision-Making via Hierarchical Hypergraph Retrieval-Augmented Generation
by: Wang, Yi, et al.
Published: (2025)
Beyond MedQA: Towards Real-world Clinical Decision Making in the Era of LLMs
by: Xiao, Yunpeng, et al.
Published: (2025)
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models
by: Pan, Jiabao, et al.
Published: (2024)
Reasoning Aware Self-Consistency: Leveraging Reasoning Paths for Efficient LLM Sampling
by: Wan, Guangya, et al.
Published: (2024)
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models
by: Ding, Meidan, et al.
Published: (2025)
High-Fidelity Pruning for Large Language Models
by: Zhu, Yijun, et al.
Published: (2026)
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
by: Yu, Guangya, et al.
Published: (2025)
Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making
by: Zhang, Qingyuan, et al.
Published: (2025)
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
by: Liu, Mianxin, et al.
Published: (2024)
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
by: M3 Team, et al.
Published: (2026)
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
by: Wang, Yuxuan, et al.
Published: (2024)
GAIN: A Benchmark for Goal-Aligned Decision-Making of Large Language Models under Imperfect Norms
by: Kawarada, Masayuki, et al.
Published: (2026)
A Large-Scale Simulation on Large Language Models for Decision-Making in Political Science
by: Yu, Chenxiao, et al.
Published: (2024)
MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making
by: Tam, Zhi Rui, et al.
Published: (2025)
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks
by: Gallifant, Jack, et al.
Published: (2024)
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
by: Qin, Zhanyue, et al.
Published: (2024)
Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent
by: Wang, Xiaofeng, et al.
Published: (2025)
Efficient Sequential Decision Making with Large Language Models
by: Chen, Dingyang, et al.
Published: (2024)
CLIMB: A Benchmark of Clinical Bias in Large Language Models
by: Zhang, Yubo, et al.
Published: (2024)
A Survey on Large Language Model Benchmarks
by: Ni, Shiwen, et al.
Published: (2025)
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
by: Pandit, Shrey, et al.
Published: (2025)
Large Language Models in the Clinic: A Comprehensive Benchmark
by: Liu, Fenglin, et al.
Published: (2024)
MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering
by: Alonso, Iñigo, et al.
Published: (2024)
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
by: Tang, Zecheng, et al.
Published: (2026)
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks
by: Daoud, Mouath Abu, et al.
Published: (2025)
Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice Guidelines
by: Oniani, David, et al.
Published: (2024)
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
by: Li, Manling, et al.
Published: (2024)
Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language Models
by: Palnitkar, Aadi, et al.
Published: (2026)
Counterfactual Probing for Hallucination Detection and Mitigation in Large Language Models
by: Feng, Yijun
Published: (2025)