| Main Authors: | Kunilovskaya, Maria; Bhatia, Gagan; Albertelli, Lisa Sophie; Chen, Yanran; Greisinger, Christian; Kiefer, Lotta; Leiter, Christoph; Roy, Subhadeep; Achamaleh, Tewodros; Manzoor, Muhammad Arslan; Pohl, Sebastian; Hou, Yufang; Eger, Steffen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2606.02255 |
Similar Items
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
by: Roy, Subhadeep, et al.
Published: (2026)
GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark
by: Kiefer, Lotta, et al.
Published: (2026)
TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
by: Greisinger, Christian, et al.
Published: (2026)
DeepSeek-R1 vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?
by: Larionov, Daniil, et al.
Published: (2025)
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
by: Leiter, Christoph, et al.
Published: (2024)
Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks
by: Chen, Yanran, et al.
Published: (2025)
BMX: Boosting Natural Language Generation Metrics with Explainability
by: Leiter, Christoph, et al.
Published: (2022)
NLLG Quarterly arXiv Report 09/24: What are the most influential current AI Papers?
by: Leiter, Christoph, et al.
Published: (2024)
Is there really a Citation Age Bias in NLP?
by: Nguyen, Hoa, et al.
Published: (2024)
CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks
by: Leiter, Christoph, et al.
Published: (2025)
Evaluating Diversity in Automatic Poetry Generation
by: Chen, Yanran, et al.
Published: (2024)
Translationese as a Rational Response to Translation Task Difficulty
by: Kunilovskaya, Maria
Published: (2026)
Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection
by: Chen, Yanran, et al.
Published: (2025)
Towards Explainable Evaluation Metrics for Machine Translation
by: Leiter, Christoph, et al.
Published: (2023)
ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs
by: Wang, Zhipin, et al.
Published: (2026)
EPIC-EuroParl-UdS: Information-Theoretic Perspectives on Translation and Interpreting
by: Kunilovskaya, Maria, et al.
Published: (2026)
Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
by: Chen, Yanran, et al.
Published: (2024)
ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
by: Belouadi, Jonas, et al.
Published: (2022)
PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics
by: Larionov, Daniil, et al.
Published: (2024)
USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
by: Belouadi, Jonas, et al.
Published: (2022)
LLM-based multi-agent poetry generation in non-cooperative environments
by: Zhang, Ran, et al.
Published: (2024)
BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression
by: Larionov, Daniil, et al.
Published: (2025)
Argument Summarization and its Evaluation in the Era of Large Language Models
by: Altemeyer, Moritz, et al.
Published: (2025)
Leveraging Vision-Language Pre-training for Human Activity Recognition in Still Images
by: Mahanta, Cristina, et al.
Published: (2025)
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation
by: Eger, Steffen, et al.
Published: (2025)
Who and What? Using Linguistic Features and Annotator Characteristics to Analyze Annotation Variation
by: Maurer, Maximilian, et al.
Published: (2026)
Annotator-Centric Active Learning for Subjective NLP Tasks
by: van der Meer, Michiel, et al.
Published: (2024)
The Nature of NLP: Analyzing Contributions in NLP Papers
by: Pramanick, Aniket, et al.
Published: (2024)
Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP
by: Xu, Yinuo, et al.
Published: (2026)
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)
Says Who? Effective Zero-Shot Annotation of Focalization
by: Hicke, Rebecca M. M., et al.
Published: (2024)
Blind Spots and Biases: Exploring the Role of Annotator Cognitive Biases in NLP
by: Gautam, Sanjana, et al.
Published: (2024)
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
by: Zhang, Ran, et al.
Published: (2023)
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by: Zhang, Ran, et al.
Published: (2024)
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
by: Belouadi, Jonas, et al.
Published: (2023)
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
by: Bhatia, Gagan, et al.
Published: (2025)
Disability-First AI Dataset Annotation: Co-designing Stuttered Speech Annotation Guidelines with People Who Stutter
by: Tang, Xinru, et al.
Published: (2026)
A Multi-View Media Profiling Suite: Resources, Evaluation, and Analysis
by: Manzoor, Muhammad Arslan, et al.
Published: (2026)
Las actitudes sexistas ambivalentes de entrenadores y entrenadoras y su relación la percepción del clima motivacional que crean en sus equipos deportivos
by: Ezequiel Leiter
Published: (2021)
Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph
by: Nechakhin, Vladyslav, et al.
Published: (2024)