Library Catalog

Bibliographic Details
Main Authors:	Wullschleger, Pascal; Zarharan, Majid; Daly, Donnacha; Pouly, Marc; Foster, Jennifer
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.11470

Similar Items

FoodTaxo: Generating Food Taxonomies with Large Language Models
by: Wullschleger, Pascal, et al.
Published: (2025)

Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models
by: Zarharan, Majid, et al.
Published: (2024)

FarExStance: Explainable Stance Detection for Farsi
by: Zarharan, Majid, et al.
Published: (2024)
Estimating Text Similarity based on Semantic Concept Embeddings
by: vor der Brück, Tim, et al.
Published: (2024)

Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
by: Gigant, Théo, et al.
Published: (2024)

Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
by: Vasselli, Justin, et al.
Published: (2025)

A Taxonomy for Design and Evaluation of Prompt-Based Natural Language Explanations
by: Nejadgholi, Isar, et al.
Published: (2025)

TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness
by: Zheng, Danna, et al.
Published: (2024)

A Dual-Axis Taxonomy of Knowledge Editing for LLMs: From Mechanisms to Functions
by: Salehoof, Amir Mohammad, et al.
Published: (2025)

Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics
by: Hua, Yilun, et al.
Published: (2026)

BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization
by: Rafid, Ahmed, et al.
Published: (2026)

SCORE: Specificity, Context Utilization, Robustness, and Relevance for Reference-Free LLM Evaluation
by: Shomee, Homaira Huda, et al.
Published: (2026)

Evaluation Revisited: A Taxonomy of Evaluation Concerns in Natural Language Processing
by: Dhar, Ruchira, et al.
Published: (2026)

CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization
by: Gong, Ziwei, et al.
Published: (2024)

LITE: LLM-Impelled efficient Taxonomy Evaluation
by: Zhang, Lin, et al.
Published: (2025)

Monotonic Reference-Free Refinement for Autoformalization
by: Zhang, Lan, et al.
Published: (2026)

PREF: Reference-Free Evaluation of Personalised Text Generation in LLMs
by: Fu, Xiao, et al.
Published: (2025)

Taxonomy-based CheckList for Large Language Model Evaluation
by: Zhang, Damin
Published: (2023)

An Online Reference-Free Evaluation Framework for Flowchart Image-to-Code Generation
by: Nguyen, Giang Son, et al.
Published: (2026)

Evaluating Nuanced Bias in Large Language Model Free Response Answers
by: Healey, Jennifer, et al.
Published: (2024)

An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics
by: Ahmadi, Saba, et al.
Published: (2023)

The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure
by: Cook, David A.
Published: (2026)

Tutor Move Taxonomy: A Theory-Aligned Framework for Analyzing Instructional Moves in Tutoring
by: Zhou, Zhuqian, et al.
Published: (2026)

Generating Leakage-Free Benchmarks for Robust RAG Evaluation
by: Liu, Jiayi, et al.
Published: (2026)

LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
by: Schlangen, David
Published: (2024)

ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization
by: Bae, Suyoung, et al.
Published: (2026)

SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
by: He, Hangfeng, et al.
Published: (2023)

Evaluating Optimal Reference Translations
by: Zouhar, Vilém, et al.
Published: (2023)

Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)

MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation
by: Srun, Nalin, et al.
Published: (2026)

Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
by: Maurya, Kaushal Kumar, et al.
Published: (2024)

Cobra Effect in Reference-Free Image Captioning Metrics
by: Ma, Zheng, et al.
Published: (2024)

References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation
by: Casola, Silvia, et al.
Published: (2025)

Evaluating the performance of state-of-the-art esg domain-specific pre-trained large language models in text classification against existing models and traditional machine learning techniques
by: Chung, Tin Yuet, et al.
Published: (2024)

Knowledge in Triples for LLMs: Enhancing Table QA Accuracy with Semantic Extraction
by: Sholehrasa, Hossein, et al.
Published: (2024)

References Indeed Matter? Reference-Free Preference Optimization for Conversational Query Reformulation
by: Kim, Doyoung, et al.
Published: (2025)

From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility
by: Levinson, Gavin, et al.
Published: (2026)

Defining Cultural Capabilities for AI Evaluation: A Taxonomy Grounded in Intercultural Communication Theory
by: Nejadgholi, Isar, et al.
Published: (2026)

Can Deep Research Agents Retrieve and Organize? Evaluating the Synthesis Gap with Expert Taxonomies
by: Zhang, Ming, et al.
Published: (2026)

Taming LLMs with Negative Samples: A Reference-Free Framework to Evaluate Presentation Content with Actionable Feedback
by: Muppidi, Ananth, et al.
Published: (2025)