:: Library Catalog

Bibliographic Details
Main Authors:	Oliva, Maria Paz; Correia, Adriana; Vankov, Ivan; Botev, Viktor
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2508.13816

Similar Items

ConSens: Assessing context grounding in open-book question answering
by: Vankov, Ivan, et al.
Published: (2025)

The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported
by: Shostack, Adam
Published: (2024)

Evaluating Embedding Frameworks for Scientific Domain
by: Ahmed, Nouman, et al.
Published: (2025)

A word association network methodology for evaluating implicit biases in LLMs compared to humans
by: Abramski, Katherine, et al.
Published: (2025)

Automated alignment is harder than you think
by: Bowkis, Aleksandr, et al.
Published: (2026)

"They parted illusions -- they parted disclaim marinade": Misalignment as structural fidelity in LLMs
by: Costa, Mariana Lins
Published: (2025)

Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains
by: Malin, Ben, et al.
Published: (2025)

LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
by: Humblot-Renaux, Galadrielle, et al.
Published: (2026)

How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives
by: Ichmoukhamedov, Timour, et al.
Published: (2024)

System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
by: de Winter, Joost, et al.
Published: (2024)

LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering
by: Zhang, Ran, et al.
Published: (2025)

Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
by: Pudasaini, Shushanta, et al.
Published: (2026)

Investigating the structure of emotions by analyzing similarity and association of emotion words
by: Iwaki, Fumitaka, et al.
Published: (2026)

LongTail-Swap: benchmarking language models' abilities on rare words
by: Algayres, Robin, et al.
Published: (2025)

Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)

Can large language models understand uncommon meanings of common words?
by: Wu, Jinyang, et al.
Published: (2024)

The Good, The Bad, and Why: Unveiling Emotions in Generative AI
by: Li, Cheng, et al.
Published: (2023)

Chain-of-Description: What I can understand, I can put into words
by: Guo, Jiaxin, et al.
Published: (2025)

Scaling few-shot spoken word classification with generative meta-continual learning
by: Beyers, Louise, et al.
Published: (2026)

How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis
by: Cilia, Nicole Dalia, et al.
Published: (2023)

Plain language adaptations of biomedical text using LLMs: Comparision of evaluation metrics
by: Kocbek, Primoz, et al.
Published: (2025)

You shall know a piece by the company it keeps. Chess plays as a data for word2vec models
by: Orekhov, Boris
Published: (2024)

Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
by: Bannò, Stefano, et al.
Published: (2025)

Why They Disagree: Decoding Differences in Opinions about AI Risk on the Lex Fridman Podcast
by: Truong, Nghi, et al.
Published: (2025)

Does language matter for spoken word classification? A multilingual generative meta-learning approach
by: Ziki, Batsirayi Mupamhi, et al.
Published: (2026)

MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch
by: Saeed, Nadia
Published: (2024)

Choices Speak Louder than Questions
by: Cho, Gyeongje, et al.
Published: (2025)

Why are LLMs' abilities emergent?
by: Havlík, Vladimír
Published: (2025)

Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation
by: Han, Lifeng, et al.
Published: (2020)

Evolutionary ecology of words
by: Suzuki, Reiji, et al.
Published: (2025)

From communities to interpretable network and word embedding: an unified approach
by: Prouteau, Thibault, et al.
Published: (2024)

Why Slop Matters
by: Kommers, Cody, et al.
Published: (2025)

Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning
by: Chadimová, Milena, et al.
Published: (2024)

Why Attend to Everything? Focus is the Key
by: Yao, Hengshuai, et al.
Published: (2026)

Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation
by: Tiwari, Abhisek, et al.
Published: (2024)

From melodic note sequences to pitches using word2vec
by: Defays, Daniel
Published: (2024)

Impact of enriched meaning representations for language generation in dialogue tasks: A comprehensive exploration of the relevance of tasks, corpora and metrics
by: Vázquez, Alain, et al.
Published: (2026)

LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
by: Cruz, Jan Christian Blaise, et al.
Published: (2026)

Re-evaluating Theory of Mind evaluation in large language models
by: Hu, Jennifer, et al.
Published: (2025)

Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service
by: Merx, Raphael, et al.
Published: (2024)