| Main Authors: | Wiland, Jacek, Ploner, Max, Akbik, Alan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.04113 |
Similar Items
LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
by: Ploner, Max, et al.
Published: (2024)
Towards a Principled Evaluation of Knowledge Editors
by: Pohl, Sebastian, et al.
Published: (2025)
From Data to Knowledge: Evaluating How Efficiently Language Models Learn Facts
by: Christoph, Daniel, et al.
Published: (2025)
TransformerRanker: A Tool for Efficiently Finding the Best-Suited Language Models for Downstream Classification Tasks
by: Garbas, Lukas, et al.
Published: (2024)
Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data
by: Golde, Jonas, et al.
Published: (2024)
Self-Aware Knowledge Probing: Evaluating Language Models' Relational Knowledge through Confidence Calibration
by: Kissling, Christopher, et al.
Published: (2026)
Pre-Training Curriculum for Multi-Token Prediction in Language Models
by: Aynetdinov, Ansar, et al.
Published: (2025)
Evaluating Design Decisions for Dual Encoder-based Entity Disambiguation
by: Rücker, Susanna, et al.
Published: (2025)
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
by: Aynetdinov, Ansar, et al.
Published: (2024)
Beyond Marginal Distributions: A Framework to Evaluate the Representativeness of Demographic-Aligned LLMs
by: Williams, Tristan, et al.
Published: (2026)
BabyHGRN: Exploring RNNs for Sample-Efficient Training of Language Models
by: Haller, Patrick, et al.
Published: (2024)
What Matters in Linearizing Language Models? A Comparative Study of Architecture, Scale, and Task Adaptation
by: Haller, Patrick, et al.
Published: (2025)
Sample-Efficient Language Modeling with Linear Attention and Lightweight Enhancements
by: Haller, Patrick, et al.
Published: (2025)
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
by: Aynetdinov, Ansar, et al.
Published: (2026)
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions
by: Dallabetta, Max, et al.
Published: (2024)
Lemma Dilemma: On Lemma Generation Without Domain- or Language-Specific Training Data
by: Toporkov, Olia, et al.
Published: (2025)
What Matters When Building Universal Multilingual Named Entity Recognition Models?
by: Golde, Jonas, et al.
Published: (2026)
Large-Scale Label Interpretation Learning for Few-Shot Named Entity Recognition
by: Golde, Jonas, et al.
Published: (2024)
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition
by: Golde, Jonas, et al.
Published: (2025)
Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning
by: Schulte, David, et al.
Published: (2024)
MastermindEval: A Simple But Scalable Reasoning Benchmark
by: Golde, Jonas, et al.
Published: (2025)
NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition
by: Merdjanovska, Elena, et al.
Published: (2024)
Question Decomposition for Retrieval-Augmented Generation
by: Ammann, Paul J. L., et al.
Published: (2025)
AntLM: Bridging Causal and Masked Language Models
by: Yu, Xinru, et al.
Published: (2024)
Exploration of Masked and Causal Language Modelling for Text Generation
by: Micheletti, Nicolo, et al.
Published: (2024)
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
by: Golde, Jonas, et al.
Published: (2023)
WikiCausal: Corpus and Evaluation Framework for Causal Knowledge Graph Construction
by: Hassanzadeh, Oktie
Published: (2024)
OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models
by: Lee, Unggi, et al.
Published: (2026)
BEAR: Budgeted Evidence Allocation for Multi-Document Reasoning
by: Sun, Lin, et al.
Published: (2026)
Bridge: A Unified Framework to Knowledge Graph Completion via Language Models and Knowledge Representation
by: Qiao, Qiao, et al.
Published: (2024)
A Comprehensive Evaluation of Semantic Relation Knowledge of Pretrained Language Models and Humans
by: Cao, Zhihan, et al.
Published: (2024)
Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models
by: Liu, Yang
Published: (2024)
UniKnow: A Unified Framework for Reliable Language Model Behavior across Parametric and External Knowledge
by: Kim, Youna, et al.
Published: (2025)
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
by: Huang, Zhaoheng, et al.
Published: (2024)
Causal Reasoning in Large Language Models: A Knowledge Graph Approach
by: Kim, Yejin, et al.
Published: (2024)
Redefining Evaluation Standards: A Unified Framework for Evaluating the Korean Capabilities of Language Models
by: Lee, Hanwool, et al.
Published: (2025)
Knowledge Reasoning Language Model: Unifying Knowledge and Language for Inductive Knowledge Graph Reasoning
by: Zhuo, Xingrui, et al.
Published: (2025)
UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models
by: Li, Zhuoyang, et al.
Published: (2024)
Medical Coding with Biomedical Transformer Ensembles and Zero/Few-shot Learning
by: Ziletti, Angelo, et al.
Published: (2022)
Causal Evaluation of Language Models
by: Chen, Sirui, et al.
Published: (2024)