| Main Authors: | Li, Yahan; Yao, Jifan; Bunyi, John Bosco S.; Frank, Adam C.; Hwang, Angel Hsing-Chi; Liu, Ruishan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.08584 |
Similar Items
CounselReflect: A Toolkit for Auditing Mental-Health Dialogues
by: Li, Yahan, et al.
Published: (2026)
MED-COPILOT: A Medical Assistant Powered by GraphRAG and Similar Patient Case Retrieval
by: Chen, Shuheng, et al.
Published: (2026)
MedAraBench: Large-Scale Arabic Medical Question Answering Dataset and Benchmark
by: Abu-Daoud, Mouath, et al.
Published: (2026)
A Large-Scale Benchmark for Evaluating Large Language Models on Medical Question Answering in Romanian
by: Rogoz, Ana-Cristina, et al.
Published: (2025)
On the Calibration of Multilingual Question Answering LLMs
by: Yang, Yahan, et al.
Published: (2023)
"It was 80% me, 20% AI": Seeking Authenticity in Co-Writing with Large Language Models
by: Hwang, Angel Hsing-Chi, et al.
Published: (2024)
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models
by: Li, Linyi, et al.
Published: (2024)
Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study
by: Adhikary, Prottay Kumar, et al.
Published: (2024)
ChineseVideoBench: Benchmarking Multi-modal Large Models for Chinese Video Question Answering
by: Nie, Yuxiang, et al.
Published: (2025)
MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering
by: Guan, Shaowei, et al.
Published: (2026)
TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
by: Xiong, Zixin, et al.
Published: (2026)
Evaluating Search Engines and Large Language Models for Answering Health Questions
by: Fernández-Pichel, Marcos, et al.
Published: (2024)
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
by: Badawi, Abeer, et al.
Published: (2025)
An Expert Schema for Evaluating Large Language Model Errors in Scholarly Question-Answering Systems
by: Martin-Boyle, Anna, et al.
Published: (2026)
PsychCounsel-Bench: Evaluating the Psychology Intelligence of Large Language Models
by: Zeng, Min
Published: (2025)
MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments
by: Gao, Yicheng, et al.
Published: (2026)
Do Large Language Models Align with Core Mental Health Counseling Competencies?
by: Nguyen, Viet Cuong, et al.
Published: (2024)
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2023)
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
by: Choi, Chanyeol, et al.
Published: (2025)
CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological Counseling
by: Wang, Bichen, et al.
Published: (2025)
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models
by: Zhu, Andrew, et al.
Published: (2024)
Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents
by: Wu, Shiwei, et al.
Published: (2024)
Comprehensive Evaluation for a Large Scale Knowledge Graph Question Answering Service
by: Potdar, Saloni, et al.
Published: (2025)
RECAP: Resistance Capture in Text-based Mental Health Counseling with Large Language Models
by: Li, Anqi, et al.
Published: (2026)
MHGraphBench: Knowledge Graph-Grounded Benchmarking of Mental Health Knowledge in Large Language Models
by: Liu, Weixin, et al.
Published: (2026)
RephQA: Evaluating Readability of Large Language Models in Public Health Question Answering
by: Qiu, Weikang, et al.
Published: (2025)
Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports
by: Li, Haopeng, et al.
Published: (2024)
NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance
by: Eyasir, Abrar, et al.
Published: (2026)
Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling
by: Hu, He, et al.
Published: (2025)
CBT-LLM: A Chinese Large Language Model for Cognitive Behavioral Therapy-based Mental Health Question Answering
by: Na, Hongbin
Published: (2024)
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
by: Wu, Xianjie, et al.
Published: (2024)
Policy or Community?: Supporting Individual Model Creators' Open Model Development in Model Marketplaces
by: Kang, Eun Jeong, et al.
Published: (2026)
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
by: Tu, Yahan, et al.
Published: (2024)
Cancer-Myth: Evaluating Large Language Models on Patient Questions with False Presuppositions
by: Zhu, Wang Bill, et al.
Published: (2025)
Evaluating Monolingual and Multilingual Large Language Models for Greek Question Answering: The DemosQA Benchmark
by: Mastrokostas, Charalampos, et al.
Published: (2026)
OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift
by: Li, Lin, et al.
Published: (2023)
PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics
by: Shen, Yaling, et al.
Published: (2026)
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
by: Chen, Hanjie, et al.
Published: (2024)
HalluScore: Large Language Model Hallucination Question Answering Benchmark
by: Alansari, Aisha, et al.
Published: (2026)
Beyond Idealized Patients: Evaluating LLMs under Challenging Patient Behaviors in Medical Consultations
by: Li, Yahan, et al.
Published: (2026)