| Main Authors: | Li, Yahan, Harrigian, Keith, Zirikly, Ayah, Dredze, Mark |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.05845 |
Similar Items
Give me Some Hard Questions: Synthetic Data Generation for Clinical QA
by: Bai, Fan, et al.
Published: (2024)
CARMA: Comprehensive Automatically-annotated Reddit Mental Health Dataset for Arabic
by: Mankarious, Saad, et al.
Published: (2025)
Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework
by: Drenkow, Nathan, et al.
Published: (2025)
Style Transfer as Bias Mitigation: Diffusion Models for Synthetic Mental Health Text for Arabic
by: Mankarious, Saad, et al.
Published: (2026)
MindSET: Advancing Mental Health Benchmarking through Large-Scale Social Media Data
by: Mankarious, Saad, et al.
Published: (2025)
LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition
by: Bai, Fan, et al.
Published: (2025)
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
by: Wanner, Miriam, et al.
Published: (2024)
Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models
by: Sun, Kaiser, et al.
Published: (2024)
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
by: An, Bang, et al.
Published: (2025)
Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making
by: Huang, Jen-tse, et al.
Published: (2026)
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
by: DeLucia, Alexandra, et al.
Published: (2025)
Evaluating the Evaluators: Are readability metrics good measures of readability?
by: Cachola, Isabel, et al.
Published: (2025)
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
by: Chen, Hanjie, et al.
Published: (2024)
MedExAgent: Training LLM Agents to Ask, Examine, and Diagnose in Noisy Clinical Environments
by: Gao, Yicheng, et al.
Published: (2026)
Evaluating Implicit Biases in LLM Reasoning through Logic Grid Puzzles
by: Jahara, Fatima, et al.
Published: (2025)
Task Matters: Knowledge Requirements Shape LLM Responses to Context-Memory Conflict
by: Sun, Kaiser, et al.
Published: (2025)
On the Failure of Latent State Persistence in Large Language Models
by: Huang, Jen-tse, et al.
Published: (2025)
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
by: Sun, Kaiser, et al.
Published: (2026)
Making FETCH! Happen: Finding Emergent Dog Whistles Through Common Habitats
by: Sasse, Kuleen, et al.
Published: (2024)
MedScore: Generalizable Factuality Evaluation of Free-Form Medical Answers by Domain-adapted Claim Decomposition and Verification
by: Huang, Heyuan, et al.
Published: (2025)
XTRUST: On the Multilingual Trustworthiness of Large Language Models
by: Li, Yahan, et al.
Published: (2024)
Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts
by: Levy, Sharon, et al.
Published: (2024)
Large Language Model Evaluation via Matrix Nuclear-Norm
by: Li, Yahan, et al.
Published: (2024)
Mirroring Minds: Asymmetric Linguistic Accommodation and Diagnostic Identity in ADHD and Autism Reddit Communities
by: Mankarious, Saad, et al.
Published: (2026)
A Closer Look at Claim Decomposition
by: Wanner, Miriam, et al.
Published: (2024)
Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation
by: Huang, Jen-tse, et al.
Published: (2026)
A New NMT Model for Translating Clinical Texts from English to Spanish
by: Li, Rumeng, et al.
Published: (2025)
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
by: Tu, Yahan, et al.
Published: (2024)
Revisiting Anthropomorphic Reflection Markers in Large Language Model Reasoning
by: Yu, Yahan, et al.
Published: (2026)
Schema-Driven Information Extraction from Heterogeneous Tables
by: Bai, Fan, et al.
Published: (2023)
L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification
by: Kondadadi, Rishik, et al.
Published: (2026)
Challenges in Explaining Pretrained Clinical Text Classifiers
by: Miok, Kristian, et al.
Published: (2026)
Evaluating Biases in Context-Dependent Health Questions
by: Levy, Sharon, et al.
Published: (2024)
Weird Generalization is Weirdly Brittle
by: Wanner, Miriam, et al.
Published: (2026)
ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes
by: Yang, Zhichao, et al.
Published: (2024)
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL
by: Shen, Yifei, et al.
Published: (2026)
Knowing But Not Doing: Convergent Morality and Divergent Action in LLMs
by: Huang, Jen-tse, et al.
Published: (2026)
Bridging Electronic Health Records and Clinical Texts: Contrastive Learning for Enhanced Clinical Tasks
by: Ketabi, Sara, et al.
Published: (2025)
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
by: Zhang, Shiyue, et al.
Published: (2023)
Extracting Patient History from Clinical Text: A Comparative Study of Clinical Large Language Models
by: Nghiem, Hieu, et al.
Published: (2025)