| Main Authors: | Smywiński-Pohl, Aleksander, Libal, Tomer, Kaczmarczyk, Adam, Król, Magdalena |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.13965 |
Similar Items
LLM-as-a-Judge is Bad, Based on AI Attempting the Exam Qualifying for the Member of the Polish National Board of Appeal
by: Karp, Michał, et al.
Published: (2025)
Model-Aware Tokenizer Transfer
by: Haltiuk, Mykola, et al.
Published: (2025)
Targum -- A Multilingual New Testament Translation Corpus
by: Rapacz, Maciej, et al.
Published: (2026)
eFontes. Part of Speech Tagging and Lemmatization of Medieval Latin Texts. A Cross-Genre Survey
by: Nowak, Krzysztof, et al.
Published: (2024)
Cognitive models can reveal interpretable value trade-offs in language models
by: Murthy, Sonia K., et al.
Published: (2025)
LLM_annotate: A Python package for annotating and analyzing fiction characters
by: Rosenbusch, Hannes
Published: (2025)
The Illusion-Illusion: Vision Language Models See Illusions Where There are None
by: Ullman, Tomer
Published: (2024)
LLMs for automatic annotation of Mandarin narrative transcripts
by: Zhao, Qingwen, et al.
Published: (2026)
Recovering document annotations for sentence-level bitext
by: Wicks, Rachel, et al.
Published: (2024)
When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection
by: Qazi, Alamgir Munir, et al.
Published: (2025)
Towards a Principled Evaluation of Knowledge Editors
by: Pohl, Sebastian, et al.
Published: (2025)
Are complicated loss functions necessary for teaching LLMs to reason?
by: Carrino, Gabriele, et al.
Published: (2026)
Annotation alignment: Comparing LLM and human annotations of conversational safety
by: Movva, Rajiv, et al.
Published: (2024)
Coconstructions in spoken data: UD annotation guidelines and first results
by: Pannitto, Ludovica, et al.
Published: (2026)
A framework for annotating and modelling intentions behind metaphor use
by: Michelli, Gianluca, et al.
Published: (2024)
Are generative AI text annotations systematically biased?
by: Stolwijk, Sjoerd B., et al.
Published: (2025)
Can sparse autoencoders be used to decompose and interpret steering vectors?
by: Mayne, Harry, et al.
Published: (2024)
Shades of Zero: Distinguishing Impossibility from Inconceivability
by: Hu, Jennifer, et al.
Published: (2025)
How much speech data is necessary for ASR in African languages? An evaluation of data scaling in Kinyarwanda and Kikuyu
by: Akera, Benjamin, et al.
Published: (2025)
LM-PUB-QUIZ: A Comprehensive Framework for Zero-Shot Evaluation of Relational Knowledge in Language Models
by: Ploner, Max, et al.
Published: (2024)
PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization
by: Tao, Meiling, et al.
Published: (2025)
A multitask learning framework for leveraging subjectivity of annotators to identify misogyny
by: Angel, Jason, et al.
Published: (2024)
Exploring transfer learning for Deep NLP systems on rarely annotated languages
by: Yadav, Dipendra, et al.
Published: (2024)
A review of annotation classification tools in the educational domain
by: Gayoso-Cabada, Joaquín, et al.
Published: (2025)
Scalable multilingual PII annotation for responsible AI in LLMs
by: Meena, Bharti, et al.
Published: (2025)
Large language models struggle with ethnographic text annotation
by: Goodall, Leonardo S., et al.
Published: (2026)
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
by: Murthy, Sonia K., et al.
Published: (2024)
ELITE: Embedding-Less retrieval with Iterative Text Exploration
by: Wang, Zhangyu, et al.
Published: (2025)
Transformer verbatim in-context retrieval across time and scale
by: Armeni, Kristijan, et al.
Published: (2024)
Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation
by: Vente, Blake
Published: (2024)
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System
by: Hussain, Zafar, et al.
Published: (2026)
EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection
by: Schuhmann, Christoph, et al.
Published: (2025)
ParaRev: Building a dataset for Scientific Paragraph Revision annotated with revision instruction
by: Jourdan, Léane, et al.
Published: (2025)
Constraining constructions with WordNet: pros and cons for the semantic annotation of fillers in the Italian Constructicon
by: Pisciotta, Flavio, et al.
Published: (2025)
Understanding the effects of word-level linguistic annotations in under-resourced neural machine translation
by: Sánchez-Cartagena, Víctor M., et al.
Published: (2024)
"You are an expert annotator": Automatic Best-Worst-Scaling Annotations for Emotion Intensity Modeling
by: Bagdon, Christopher, et al.
Published: (2024)
Large corpora and large language models: a replicable method for automating grammatical annotation
by: Morin, Cameron, et al.
Published: (2024)
Counting on Consensus: Selecting the Right Inter-annotator Agreement Metric for NLP Annotation and Evaluation
by: James, Joseph
Published: (2026)
Parser agreement and disagreement in L2 Korean UD: Implications for human-in-the-loop annotation
by: Sung, Hakyung, et al.
Published: (2026)
Researchers waste 80% of LLM annotation costs by classifying one text at a time
by: Pipal, Christian, et al.
Published: (2026)