| Main Authors: | Pohl, Sebastian; Ploner, Max; Akbik, Alan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.05937 |
Similar Items
LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
by: Ploner, Max, et al.
Published: (2024)
BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models
by: Wiland, Jacek, et al.
Published: (2024)
From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts
by: Christoph, Daniel, et al.
Published: (2025)
TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks
by: Garbas, Lukas, et al.
Published: (2024)
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data
by: Golde, Jonas, et al.
Published: (2024)
Self-Aware Knowledge Probing: Evaluating Language Models' Relational Knowledge through Confidence Calibration
by: Kissling, Christopher, et al.
Published: (2026)
Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation
by: Rücker, Susanna, et al.
Published: (2025)
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
by: Aynetdinov, Ansar, et al.
Published: (2024)
Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
by: Williams, Tristan, et al.
Published: (2026)
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions
by: Dallabetta, Max, et al.
Published: (2024)
Pre-Training Curriculum for Multi-Token Prediction in Language Models
by: Aynetdinov, Ansar, et al.
Published: (2025)
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
by: Golde, Jonas, et al.
Published: (2025)
Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data
by: Toporkov, Olia, et al.
Published: (2025)
What Matters When Building Universal Multilingual Named Entity Recognition Models?
by: Golde, Jonas, et al.
Published: (2026)
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
by: Haller, Patrick, et al.
Published: (2024)
Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition
by: Golde, Jonas, et al.
Published: (2024)
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements
by: Haller, Patrick, et al.
Published: (2025)
What Matters in Linearizing Language Models? A Comparative Study of Architecture, Scale, and Task Adaptation
by: Haller, Patrick, et al.
Published: (2025)
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
by: Aynetdinov, Ansar, et al.
Published: (2026)
Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning
by: Schulte, David, et al.
Published: (2024)
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition
by: Merdjanovska, Elena, et al.
Published: (2024)
MastermindEval: A Simple But Scalable Reasoning Benchmark
by: Golde, Jonas, et al.
Published: (2025)
Question Decomposition for Retrieval-Augmented Generation
by: Ammann, Paul J. L., et al.
Published: (2025)
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
by: Golde, Jonas, et al.
Published: (2023)
Medical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning
by: Ziletti, Angelo, et al.
Published: (2022)
HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools
by: Sänger, Mario, et al.
Published: (2024)
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
by: Voronov, Anton, et al.
Published: (2024)
Model-Aware Tokenizer Transfer
by: Haltiuk, Mykola, et al.
Published: (2025)
Targum -- A Multilingual New Testament Translation Corpus
by: Rapacz, Maciej, et al.
Published: (2026)
WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing
by: Hu, Chenhui, et al.
Published: (2024)
DRIP-R: A Benchmark for Decision-Making and Reasoning Under Real-World Policy Ambiguity in the Retail Domain
by: Borkakoty, Hsuvas, et al.
Published: (2026)
Towards a Holistic Evaluation of LLMs on Factual Knowledge Recall
by: Yuan, Jiaqing, et al.
Published: (2024)
Towards Efficient LLMs Annealing with Principled Sample Selection
by: Xu, Yuanjian, et al.
Published: (2026)
In Good GRACEs: Principled Teacher Selection for Knowledge Distillation
by: Panigrahi, Abhishek, et al.
Published: (2025)
A Principled Framework for Evaluating on Typologically Diverse Languages
by: Ploeger, Esther, et al.
Published: (2024)
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
by: Ku, Max, et al.
Published: (2023)
Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities
by: Oh, Hongseok, et al.
Published: (2025)
Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution
by: Anikina, Tatiana, et al.
Published: (2025)
Do Language Models Encode Knowledge of Linguistic Constraint Violations?
by: Hardy, et al.
Published: (2026)
PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models
by: Haller, Patrick, et al.
Published: (2025)