:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Yujie; Gruhlke, Niklas; Lauscher, Anne
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.10539

Similar Items

How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination
by: Islam, Saad Obaid ul, et al.
Published: (2025)

GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German
by: Mewes, Fabian, et al.
Published: (2026)

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place
by: Holtermann, Carolin, et al.
Published: (2025)

Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment
by: Vida, Karina, et al.
Published: (2024)

Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language
by: Kaiser, Jan, et al.
Published: (2024)

Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback
by: Purkayastha, Sukannya, et al.
Published: (2026)

Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German
by: Lardelli, Manuel, et al.
Published: (2024)

The Echoes of Multilinguality: Tracing Cultural Value Shifts during LM Fine-tuning
by: Choenni, Rochelle, et al.
Published: (2024)

Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
by: Bui, Minh Duc, et al.
Published: (2024)

AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
by: Belouadi, Jonas, et al.
Published: (2023)

SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation
by: Holtermann, Carolin, et al.
Published: (2026)

TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
by: Holtermann, Carolin, et al.
Published: (2026)

Local Contrastive Editing of Gender Stereotypes
by: Lutz, Marlene, et al.
Published: (2024)

GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
by: Schneider, Florian, et al.
Published: (2025)

Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue
by: Purkayastha, Sukannya, et al.
Published: (2025)

The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
by: Waldis, Andreas, et al.
Published: (2024)

Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
by: Gautam, Vagrant, et al.
Published: (2024)

LLM Hallucination Detection: HSAD
by: Li, JinXin, et al.
Published: (2025)

The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
by: Islam, Saad Obaid ul, et al.
Published: (2025)

What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition
by: Holtermann, Carolin, et al.
Published: (2024)

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
by: Holtermann, Carolin, et al.
Published: (2024)

Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting
by: Beck, Tilman, et al.
Published: (2023)

Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
by: Gautam, Vagrant, et al.
Published: (2024)

Aligned Probing: Relating Toxic Behavior and Model Internals
by: Waldis, Andreas, et al.
Published: (2025)

LazyReview A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews
by: Purkayastha, Sukannya, et al.
Published: (2025)

Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
by: Lee, Jae Hee, et al.
Published: (2025)

Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations
by: van Liemt, Erin MacMurray, et al.
Published: (2026)

Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment
by: Atasoy, I. F., et al.
Published: (2026)

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
by: Hussain, Khizar, et al.
Published: (2026)

Span-Level Hallucination Detection for LLM-Generated Answers
by: Elchafei, Passant, et al.
Published: (2025)

WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case
by: Gautam, Vagrant, et al.
Published: (2024)

Steer LLM Latents for Hallucination Detection
by: Park, Seongheon, et al.
Published: (2025)

PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations
by: Wu, Yuhe, et al.
Published: (2026)

ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
by: Frohmann, Markus, et al.
Published: (2023)

HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
by: Urlana, Ashok, et al.
Published: (2025)

Large Language Models Discriminate Against Speakers of German Dialects
by: Bui, Minh Duc, et al.
Published: (2025)

InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers
by: Yehuda, Yakir, et al.
Published: (2024)

Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
by: Karamolegkou, Antonia, et al.
Published: (2024)

HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
by: Du, Xuefeng, et al.
Published: (2024)

Hallucination Detection and Hallucination Mitigation: An Investigation
by: Luo, Junliang, et al.
Published: (2024)