| Main Authors: | Ren, Yujie, Gruhlke, Niklas, Lauscher, Anne |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.10539 |
Similar Items
How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination
by: Islam, Saad Obaid ul, et al.
Published: (2025)
GRUFF: LLM Pronoun Fidelity, Reasoning, and Biases in German
by: Mewes, Fabian, et al.
Published: (2026)
Around the World in 24 Hours: Probing LLM Knowledge of Time and Place
by: Holtermann, Carolin, et al.
Published: (2025)
Decoding Multilingual Moral Preferences: Unveiling LLM's Biases Through the Moral Machine Experiment
by: Vida, Karina, et al.
Published: (2024)
Large Language Models for Human-Machine Collaborative Particle Accelerator Tuning through Natural Language
by: Kaiser, Jan, et al.
Published: (2024)
Reviewing the Reviewer: Elevating Peer Review Quality through LLM-Guided Feedback
by: Purkayastha, Sukannya, et al.
Published: (2026)
Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German
by: Lardelli, Manuel, et al.
Published: (2024)
The Echoes of Multilinguality: Tracing Cultural Value Shifts during LM Fine-tuning
by: Choenni, Rochelle, et al.
Published: (2024)
Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models
by: Bui, Minh Duc, et al.
Published: (2024)
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
by: Belouadi, Jonas, et al.
Published: (2023)
SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation
by: Holtermann, Carolin, et al.
Published: (2026)
TempViz: On the Evaluation of Temporal Knowledge in Text-to-Image Models
by: Holtermann, Carolin, et al.
Published: (2026)
Local Contrastive Editing of Gender Stereotypes
by: Lutz, Marlene, et al.
Published: (2024)
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking
by: Schneider, Florian, et al.
Published: (2025)
Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue
by: Purkayastha, Sukannya, et al.
Published: (2025)
The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
by: Waldis, Andreas, et al.
Published: (2024)
Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
by: Gautam, Vagrant, et al.
Published: (2024)
LLM Hallucination Detection: HSAD
by: Li, JinXin, et al.
Published: (2025)
The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
by: Islam, Saad Obaid ul, et al.
Published: (2025)
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition
by: Holtermann, Carolin, et al.
Published: (2024)
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
by: Holtermann, Carolin, et al.
Published: (2024)
Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting
by: Beck, Tilman, et al.
Published: (2023)
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
by: Gautam, Vagrant, et al.
Published: (2024)
Aligned Probing: Relating Toxic Behavior and Model Internals
by: Waldis, Andreas, et al.
Published: (2025)
LazyReview A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews
by: Purkayastha, Sukannya, et al.
Published: (2025)
Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective
by: Lee, Jae Hee, et al.
Published: (2025)
Cultural Authenticity: Comparing LLM Cultural Representations to Native Human Expectations
by: van Liemt, Erin MacMurray, et al.
Published: (2026)
Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment
by: Atasoy, I. F., et al.
Published: (2026)
Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
by: Hussain, Khizar, et al.
Published: (2026)
Span-Level Hallucination Detection for LLM-Generated Answers
by: Elchafei, Passant, et al.
Published: (2025)
WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case
by: Gautam, Vagrant, et al.
Published: (2024)
Steer LLM Latents for Hallucination Detection
by: Park, Seongheon, et al.
Published: (2025)
PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations
by: Wu, Yuhe, et al.
Published: (2026)
ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
by: Frohmann, Markus, et al.
Published: (2023)
HalluCounter: Reference-free LLM Hallucination Detection in the Wild!
by: Urlana, Ashok, et al.
Published: (2025)
Large Language Models Discriminate Against Speakers of German Dialects
by: Bui, Minh Duc, et al.
Published: (2025)
InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers
by: Yehuda, Yakir, et al.
Published: (2024)
Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
by: Karamolegkou, Antonia, et al.
Published: (2024)
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
by: Du, Xuefeng, et al.
Published: (2024)
Hallucination Detection and Hallucination Mitigation: An Investigation
by: Luo, Junliang, et al.
Published: (2024)