| Main Authors: | Busch, Kiran, Leopold, Henrik |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.03255 |
Similar Items
xSemAD: Explainable Semantic Anomaly Detection in Event Logs Using Sequence-to-Sequence Models
by: Busch, Kiran, et al.
Published: (2024)
A Comprehensive Evaluation of Large Language Models on Benchmark Biomedical Text Processing Tasks
by: Jahan, Israt, et al.
Published: (2023)
TaskBench: Benchmarking Large Language Models for Task Automation
by: Shen, Yongliang, et al.
Published: (2023)
Large Language Model Benchmarks in Medical Tasks
by: Yan, Lawrence K. Q., et al.
Published: (2024)
Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting
by: Jiang, Jiyue, et al.
Published: (2025)
Benchmarking Open-Source Large Language Models on Healthcare Text Classification Tasks
by: Guo, Yuting, et al.
Published: (2025)
Toward a Benchmark for Controllable Simulation of Imperfect Students with Large Language Models
by: Apartsin, Alexander, et al.
Published: (2026)
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
by: KJ, Sankalp, et al.
Published: (2025)
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts
by: Lee, Rhui Dih, et al.
Published: (2024)
LTLBench: Towards Benchmarks for Evaluating Temporal Reasoning in Large Language Models
by: Tang, Weizhi, et al.
Published: (2024)
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
by: Liu, Junlin, et al.
Published: (2026)
Contextual Attention Modulation: Towards Efficient Multi-Task Adaptation in Large Language Models
by: Pan, Dayan, et al.
Published: (2025)
Codenames as a Benchmark for Large Language Models
by: Stephenson, Matthew, et al.
Published: (2024)
MedArabiQ: Benchmarking Large Language Models on Arabic Medical Tasks
by: Daoud, Mouath Abu, et al.
Published: (2025)
MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
by: Tang, Zecheng, et al.
Published: (2026)
Large Language Models aren't all that you need
by: Holla, Kiran Voderhobli, et al.
Published: (2024)
Evaluating Large Language Models on the GMAT: Implications for the Future of Business Education
by: Ashrafimoghari, Vahid, et al.
Published: (2024)
Benchmarking the Medical Understanding and Reasoning of Large Language Models in Arabic Healthcare Tasks
by: AlDahoul, Nouar, et al.
Published: (2025)
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
by: Zhao, Justin, et al.
Published: (2024)
Harnessing Business and Media Insights with Large Language Models
by: Bao, Yujia, et al.
Published: (2024)
Scan-do Attitude: Towards Autonomous CT Protocol Management using a Large Language Model Agent
by: Kang, Xingjian, et al.
Published: (2025)
Benchmarking Distributional Alignment of Large Language Models
by: Meister, Nicole, et al.
Published: (2024)
Benchmarking the Pedagogical Knowledge of Large Language Models
by: Lelièvre, Maxime, et al.
Published: (2025)
BELL: Benchmarking the Explainability of Large Language Models
by: Ahmed, Syed Quiser, et al.
Published: (2025)
Task-Aligned Tool Recommendation for Large Language Models
by: Gao, Hang, et al.
Published: (2024)
Reasoning Capabilities of Large Language Models on Dynamic Tasks
by: Wong, Annie, et al.
Published: (2025)
Evaluating Ill-Defined Tasks in Large Language Models
by: Zhou, Yi, et al.
Published: (2026)
Impact of Task Phrasing on Presumptions in Large Language Models
by: Ong, Kenneth J. K.
Published: (2026)
Towards Atoms of Large Language Models
by: Hu, Chenhui, et al.
Published: (2025)
Parrot Mind: Towards Explaining the Complex Task Reasoning of Pretrained Large Language Models with Template-Content Structure
by: Yang, Haotong, et al.
Published: (2023)
Towards a Personal Health Large Language Model
by: Cosentino, Justin, et al.
Published: (2024)
An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks
by: Suresh, Varsha, et al.
Published: (2024)
Large Language Models for Extrapolative Modeling of Manufacturing Processes
by: Khanghah, Kiarash Naghavi, et al.
Published: (2025)
Benchmarking Benchmark Leakage in Large Language Models
by: Xu, Ruijie, et al.
Published: (2024)
E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task
by: Liu, Jingyao, et al.
Published: (2025)
Large Language Models in the Clinic: A Comprehensive Benchmark
by: Liu, Fenglin, et al.
Published: (2024)
BeHonest: Benchmarking Honesty in Large Language Models
by: Chern, Steffi, et al.
Published: (2024)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
Do Large Language Models Mirror Cognitive Language Processing?
by: Ren, Yuqi, et al.
Published: (2024)
Evaluating the Performance of Large Language Models on GAOKAO Benchmark
by: Zhang, Xiaotian, et al.
Published: (2023)