:: Library Catalog

Bibliographic Details
Main Authors:	Belouadi, Jonas; Eger, Steffen
Format:	Preprint
Published:	2022
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2202.10062

Similar Items

ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
by: Belouadi, Jonas, et al.
Published: (2022)

AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
by: Belouadi, Jonas, et al.
Published: (2023)

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
by: Belouadi, Jonas, et al.
Published: (2024)

Towards Explainable Evaluation Metrics for Machine Translation
by: Leiter, Christoph, et al.
Published: (2023)

BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression
by: Larionov, Daniil, et al.
Published: (2025)

PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
by: Leiter, Christoph, et al.
Published: (2024)

PromptOptMe: Error-Aware Prompt Compression for LLM-based MT Evaluation Metrics
by: Larionov, Daniil, et al.
Published: (2024)

How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by: Zhang, Ran, et al.
Published: (2024)

NLLG Quarterly arXiv Report 09/24: What are the most influential current AI Papers?
by: Leiter, Christoph, et al.
Published: (2024)

TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
by: Belouadi, Jonas, et al.
Published: (2025)

BMX: Boosting Natural Language Generation Metrics with Explainability
by: Leiter, Christoph, et al.
Published: (2022)

Do Emotions Really Affect Argument Convincingness? A Dynamic Approach with LLM-based Manipulation Checks
by: Chen, Yanran, et al.
Published: (2025)

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
by: Zhang, Leixin, et al.
Published: (2024)

CROC: Evaluating and Training T2I Metrics with Pseudo- and Human-Labeled Contrastive Robustness Checks
by: Leiter, Christoph, et al.
Published: (2025)

xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
by: Larionov, Daniil, et al.
Published: (2024)

MultiMat: Multimodal Program Synthesis for Procedural Materials using Large Multimodal Models
by: Belouadi, Jonas, et al.
Published: (2025)

Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
by: Zhang, Ran, et al.
Published: (2023)

LLM-based multi-agent poetry generation in non-cooperative environments
by: Zhang, Ran, et al.
Published: (2024)

ContrastScore: Towards Higher Quality, Less Biased, More Efficient Evaluation Metrics with Contrastive Evaluation
by: Wang, Xiao, et al.
Published: (2025)

LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering
by: Zhang, Ran, et al.
Published: (2025)

Evaluating Diversity in Automatic Poetry Generation
by: Chen, Yanran, et al.
Published: (2024)

TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning
by: Greisinger, Christian, et al.
Published: (2026)

Is there really a Citation Age Bias in NLP?
by: Nguyen, Hoa, et al.
Published: (2024)

Syntactic Language Change in English and German: Metrics, Parsers, and Convergences
by: Chen, Yanran, et al.
Published: (2024)

Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph
by: Nechakhin, Vladyslav, et al.
Published: (2024)

Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs
by: Mekki, Abdellah El, et al.
Published: (2024)

Evaluating Automatic Metrics with Incremental Machine Translation Systems
by: Wu, Guojun, et al.
Published: (2024)

Beyond Reproduction: A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation
by: Zhang, Ran, et al.
Published: (2026)

Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
by: Perrella, Stefano, et al.
Published: (2024)

Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics
by: Roy, Subhadeep, et al.
Published: (2026)

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language
by: Wang, Shun, et al.
Published: (2024)

Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
by: Perrella, Stefano, et al.
Published: (2024)

Argument Summarization and its Evaluation in the Era of Large Language Models
by: Altemeyer, Moritz, et al.
Published: (2025)

ValueGround: Evaluating Culture-Conditioned Visual Value Grounding in MLLMs
by: Wang, Zhipin, et al.
Published: (2026)

Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection
by: Chen, Yanran, et al.
Published: (2025)

GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark
by: Kiefer, Lotta, et al.
Published: (2026)

Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
by: Yari, Amir Hossein, et al.
Published: (2025)

Ensemble Self-Training for Unsupervised Machine Translation
by: Aharon, Ido, et al.
Published: (2026)

Quality and Quantity of Machine Translation References for Automatic Metrics
by: Zouhar, Vilém, et al.
Published: (2024)

LLM Analysis of 150+ years of German Parliamentary Debates on Migration Reveals Shift from Post-War Solidarity to Anti-Solidarity in the Last Decade
by: Kostikova, Aida, et al.
Published: (2025)