Library Catalog

Bibliographic Details
Main Authors:	Demir, M. Mikail, Canbaz, M. Abdullah
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.17691

Similar Items

LegalGuardian: A Privacy-Preserving Framework for Secure Integration of Large Language Models in Legal Practice
by: Demir, M. Mikail, et al.
Published: (2025)

Heuristics and Biases in AI Decision-Making: Implications for Responsible AGI
by: Saeedi, Payam, et al.
Published: (2024)

LLM-Assisted Crisis Management: Building Advanced LLM Platforms for Effective Emergency Response and Public Collaboration
by: Otal, Hakan T., et al.
Published: (2024)

LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems
by: Otal, Hakan T., et al.
Published: (2024)

Precedent-Informed Reasoning: Mitigating Overthinking in Large Reasoning Models via Test-Time Precedent Learning
by: Wang, Qianyue, et al.
Published: (2026)

Flick: Few Labels Text Classification using K-Aware Intermediate Learning in Multi-Task Low-Resource Languages
by: Almutairi, Ali, et al.
Published: (2025)

In-Context Learning for Extreme Multi-Label Classification
by: D'Oosterlinck, Karel, et al.
Published: (2024)

Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels
by: Jia, Zixia, et al.
Published: (2024)

Is Your LLM Really Mastering the Concept? A Multi-Agent Benchmark
by: Xu, Shuhang, et al.
Published: (2025)

Multi-Label Clinical Text Eligibility Classification and Summarization System
by: Yerramsetty, Surya Tejaswi, et al.
Published: (2025)

The Right Model for the Job: An Evaluation of Legal Multi-Label Classification Baselines
by: Forster, Martina, et al.
Published: (2024)

Are Your LLMs Capable of Stable Reasoning?
by: Liu, Junnan, et al.
Published: (2024)

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
by: Zhang, Ming, et al.
Published: (2025)

Modeling Bias Evolution in Fashion Recommender Systems: A System Dynamics Approach
by: Goodarzi, Mahsa, et al.
Published: (2025)

Belief in Authority: Impact of Authority in Multi-Agent Evaluation Framework
by: Choi, Junhyuk, et al.
Published: (2026)

Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim $\rightarrow$ Evidence Reasoning
by: Javaji, Shashidhar Reddy, et al.
Published: (2025)

Hierarchical Multi-Label Classification of Online Vaccine Concerns
by: Zhu, Chloe Qinyu, et al.
Published: (2024)

Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
by: Han, Xueran, et al.
Published: (2025)

PsychiatryBench: A Multi-Task Benchmark for LLMs in Psychiatry
by: Fouda, Aya E., et al.
Published: (2025)

Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
by: Mavi, John, et al.
Published: (2024)

Your AI, Not Your View: The Bias of LLMs in Investment Analysis
by: Lee, Hoyoung, et al.
Published: (2025)

Do LLMs Truly Understand When a Precedent Is Overruled?
by: Zhang, Li, et al.
Published: (2025)

Benchmarking LLMs for Pairwise Causal Discovery in Biomedical and Multi-Domain Contexts
by: Anuyah, Sydney, et al.
Published: (2026)

Instances and Labels: Hierarchy-aware Joint Supervised Contrastive Learning for Hierarchical Multi-Label Text Classification
by: Yu, Simon, et al.
Published: (2023)

Do LLMs Recognize Your Latent Preferences? A Benchmark for Latent Information Discovery in Personalized Interaction
by: Tsaknakis, Ioannis, et al.
Published: (2025)

MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
by: Sirdeshmukh, Ved, et al.
Published: (2025)

DKEC: Domain Knowledge Enhanced Multi-Label Classification for Diagnosis Prediction
by: Ge, Xueren, et al.
Published: (2023)

Protecting Your LLMs with Information Bottleneck
by: Liu, Zichuan, et al.
Published: (2024)

MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs
by: Fabbri, Alexander R., et al.
Published: (2025)

Syntriever: How to Train Your Retriever with Synthetic Data from LLMs
by: Kim, Minsang, et al.
Published: (2025)

This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs
by: Wolf, Lorenz, et al.
Published: (2025)

One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification
by: Van Nooten, Jens, et al.
Published: (2025)

Label Distribution Learning-Enhanced Dual-KNN for Text Classification
by: Yuan, Bo, et al.
Published: (2025)

Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs
by: Zi, Xing, et al.
Published: (2026)

GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving
by: Zhang, Jiaxin, et al.
Published: (2024)

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA
by: Wang, Minzheng, et al.
Published: (2024)

TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
by: Wei, Shaohang, et al.
Published: (2025)

Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs
by: Arabelly, Abhinav, et al.
Published: (2025)

Do LLMs Agree on the Creativity Evaluation of Alternative Uses?
by: Rabeyah, Abdullah Al, et al.
Published: (2024)

XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
by: Kabir, Mohsinul, et al.
Published: (2026)