| Main Authors: | Oliva, Maria Paz, Correia, Adriana, Vankov, Ivan, Botev, Viktor |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2508.13816 |
Similar Items
ConSens: Assessing context grounding in open-book question answering
by: Vankov, Ivan, et al.
Published: (2025)
The Boy Who Survived: Removing Harry Potter from an LLM is harder than reported
by: Shostack, Adam
Published: (2024)
Evaluating Embedding Frameworks for Scientific Domain
by: Ahmed, Nouman, et al.
Published: (2025)
A word association network methodology for evaluating implicit biases in LLMs compared to humans
by: Abramski, Katherine, et al.
Published: (2025)
Automated alignment is harder than you think
by: Bowkis, Aleksandr, et al.
Published: (2026)
"They parted illusions -- they parted disclaim marinade": Misalignment as structural fidelity in LLMs
by: Costa, Mariana Lins
Published: (2025)
Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains
by: Malin, Ben, et al.
Published: (2025)
LLMs as annotators of credibility assessment in Danish asylum decisions: evaluating classification performance and errors beyond aggregated metrics
by: Humblot-Renaux, Galadrielle, et al.
Published: (2026)
How good is my story? Towards quantitative metrics for evaluating LLM-generated XAI narratives
by: Ichmoukhamedov, Timour, et al.
Published: (2024)
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
by: de Winter, Joost, et al.
Published: (2024)
LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering
by: Zhang, Ran, et al.
Published: (2025)
Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy
by: Pudasaini, Shushanta, et al.
Published: (2026)
Investigating the structure of emotions by analyzing similarity and association of emotion words
by: Iwaki, Fumitaka, et al.
Published: (2026)
LongTail-Swap: benchmarking language models' abilities on rare words
by: Algayres, Robin, et al.
Published: (2025)
Playing with words: Comparing the vocabulary and lexical diversity of ChatGPT and humans
by: Reviriego, Pedro, et al.
Published: (2023)
Can large language models understand uncommon meanings of common words?
by: Wu, Jinyang, et al.
Published: (2024)
The Good, The Bad, and Why: Unveiling Emotions in Generative AI
by: Li, Cheng, et al.
Published: (2023)
Chain-of-Description: What I can understand, I can put into words
by: Guo, Jiaxin, et al.
Published: (2025)
Scaling few-shot spoken word classification with generative meta-continual learning
by: Beyers, Louise, et al.
Published: (2026)
How word semantics and phonology affect handwriting of Alzheimer's patients: a machine learning based analysis
by: Cilia, Nicole Dalia, et al.
Published: (2023)
Plain language adaptations of biomedical text using LLMs: Comparision of evaluation metrics
by: Kocbek, Primoz, et al.
Published: (2025)
You shall know a piece by the company it keeps. Chess plays as a data for word2vec models
by: Orekhov, Boris
Published: (2024)
Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
by: Bannò, Stefano, et al.
Published: (2025)
Why They Disagree: Decoding Differences in Opinions about AI Risk on the Lex Fridman Podcast
by: Truong, Nghi, et al.
Published: (2025)
Does language matter for spoken word classification? A multilingual generative meta-learning approach
by: Ziki, Batsirayi Mupamhi, et al.
Published: (2026)
MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch
by: Saeed, Nadia
Published: (2024)
Choices Speak Louder than Questions
by: Cho, Gyeongje, et al.
Published: (2025)
Why are LLMs' abilities emergent?
by: Havlík, Vladimír
Published: (2025)
Towards a resource for multilingual lexicons: an MT assisted and human-in-the-loop multilingual parallel corpus with multi-word expression annotation
by: Han, Lifeng, et al.
Published: (2020)
Evolutionary ecology of words
by: Suzuki, Reiji, et al.
Published: (2025)
From communities to interpretable network and word embedding: an unified approach
by: Prouteau, Thibault, et al.
Published: (2024)
Why Slop Matters
by: Kommers, Cody, et al.
Published: (2025)
Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning
by: Chadimová, Milena, et al.
Published: (2024)
Why Attend to Everything? Focus is the Key
by: Yao, Hengshuai, et al.
Published: (2026)
Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary Generation
by: Tiwari, Abhisek, et al.
Published: (2024)
From melodic note sequences to pitches using word2vec
by: Defays, Daniel
Published: (2024)
Impact of enriched meaning representations for language generation in dialogue tasks: A comprehensive exploration of the relevance of tasks, corpora and metrics
by: Vázquez, Alain, et al.
Published: (2026)
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
by: Cruz, Jan Christian Blaise, et al.
Published: (2026)
Re-evaluating Theory of Mind evaluation in large language models
by: Hu, Jennifer, et al.
Published: (2025)
Low-resource Machine Translation: what for? who for? An observational study on a dedicated Tetun language translation service
by: Merx, Raphael, et al.
Published: (2024)