Library Catalog

Bibliographic Details
Main Author:	Coronado-Blázquez, Javier
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.19965

Similar Items

Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
by: Awad, Samer, et al.
Published: (2026)

Assessing the Performance of Human-Capable LLMs -- Are LLMs Coming for Your Job?
by: Mavi, John, et al.
Published: (2024)

Exploring the psychology of LLMs' Moral and Legal Reasoning
by: Almeida, Guilherme F. C. F., et al.
Published: (2023)

Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach
by: Coronado-Blázquez, Javier
Published: (2025)

From Rogue to Safe AI: The Role of Explicit Refusals in Aligning LLMs with International Humanitarian Law
by: Mavi, John, et al.
Published: (2025)

A NLP Approach to "Review Bombing" in Metacritic PC Videogames User Ratings
by: Coronado-Blázquez, Javier
Published: (2024)

Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation
by: Berberette, Elijah, et al.
Published: (2024)

PsyMem: Fine-grained psychological alignment and Explicit Memory Control for Advanced Role-Playing LLMs
by: Cheng, Xilong, et al.
Published: (2025)

ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs
by: Muhamad, Wicaksono Leksono, et al.
Published: (2026)

Are LLMs effective psychological assessors? Leveraging adaptive RAG for interpretable mental health screening through psychometric practice
by: Ravenda, Federico, et al.
Published: (2025)

A Geometric Taxonomy of Hallucinations in LLMs
by: Marín, Javier
Published: (2026)

Empirical Characterization of Temporal Constraint Processing in LLMs
by: Marín, Javier
Published: (2025)

Unifying Ontology Construction and Semantic Alignment for Deterministic Enterprise Reasoning at Scale
by: Zhu, Hongyin
Published: (2026)

<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs
by: Pletenev, Sergey, et al.
Published: (2025)

Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
by: Miralles-González, Pablo, et al.
Published: (2025)

An evaluation of LLMs for generating movie reviews: GPT-4o, Gemini-2.0 and DeepSeek-V3
by: Sands, Brendan, et al.
Published: (2025)

Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection
by: Piot, Paloma, et al.
Published: (2025)

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans
by: Conde, Javier, et al.
Published: (2025)

Automated test generation to evaluate tool-augmented LLMs as conversational AI agents
by: Arcadinho, Samuel, et al.
Published: (2024)

What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering
by: Maar, Jim, et al.
Published: (2026)

Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident, Especially When They are Wrong
by: Fu, Tairan, et al.
Published: (2025)

Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
by: Tiwari, Aman, et al.
Published: (2024)

Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content
by: Mushtaq, Abdullah, et al.
Published: (2025)

CogBench: a large language model walks into a psychology lab
by: Coda-Forno, Julian, et al.
Published: (2024)

Non-Determinism of "Deterministic" LLM Settings
by: Atil, Berk, et al.
Published: (2024)

Retrieval-augmented generation in multilingual settings
by: Chirkova, Nadezhda, et al.
Published: (2024)

Sentiment analysis and random forest to classify LLM versus human source applied to Scientific Texts
by: Sanchez-Medina, Javier J.
Published: (2024)

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
by: Tie, Guiyao, et al.
Published: (2025)

The Two Sides of the Coin: Hallucination Generation and Detection with LLMs as Evaluators for LLMs
by: Bui, Anh Thu Maria, et al.
Published: (2024)

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation
by: Alam, Firoj, et al.
Published: (2026)

A validity-guided workflow for robust large language model research in psychology
by: Lin, Zhicheng
Published: (2025)

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs
by: Papi, Sara, et al.
Published: (2025)

Ranking LLMs by compression
by: Guo, Peijia, et al.
Published: (2024)

Densing Law of LLMs
by: Xiao, Chaojun, et al.
Published: (2024)

The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth
by: Lissak, Shir, et al.
Published: (2024)

Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
by: Kwon, Deuksin, et al.
Published: (2024)

Benchmark of stylistic variation in LLM-generated texts
by: Milička, Jiří, et al.
Published: (2025)

Are generative AI text annotations systematically biased?
by: Stolwijk, Sjoerd B., et al.
Published: (2025)

Gender Bias in LLM-generated Interview Responses
by: Kong, Haein, et al.
Published: (2024)

Raply: A profanity-mitigated rap generator
by: Bendali, Omar Manil, et al.
Published: (2024)