Bibliographic details
Main author: SOVEREIGN Research Kernel
Format: Digital resource
Language: English
Published: Zenodo 2026
Online access: https://doi.org/10.5281/zenodo.20433629
Contents:
  • <p>Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including</p><p><strong>Research goal:</strong> What is the inference efficiency tradeoff between SMoES and hard-routing MoE approaches when evaluated on language model reasoning tasks across varying input modalities?</p><p><em>Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.</em></p>