Library Catalog

Bibliographic Details
Main Authors:	Mousavi, Seyed Mahed; Cecchinato, Edoardo; Hornikova, Lucia; Riccardi, Giuseppe
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2506.23864

Similar Items

What Does Loss Optimization Actually Teach, If Anything? Knowledge Dynamics in Continual Pre-training of LLMs
by: Mousavi, Seyed Mahed, et al.
Published: (2026)

LLMs as Repositories of Factual Knowledge: Limitations and Solutions
by: Mousavi, Seyed Mahed, et al.
Published: (2025)

DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
by: Mousavi, Seyed Mahed, et al.
Published: (2024)

[De|Re]constructing VLMs' Reasoning in Counting
by: Alghisi, Simone, et al.
Published: (2025)

Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
by: Alghisi, Simone, et al.
Published: (2024)

Are LLMs Robust for Spoken Dialogues?
by: Mousavi, Seyed Mahed, et al.
Published: (2024)

CIVET: Systematic Evaluation of Understanding in VLMs
by: Rizzoli, Massimo, et al.
Published: (2025)

Getting to the Point: Pointing Improves LVLMs at Counting
by: Alghisi, Simone, et al.
Published: (2026)

V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge in Vision Language Models
by: Mousavi, Seyed Mahed, et al.
Published: (2026)

MATEO: A Multimodal Benchmark for Temporal Reasoning and Planning in LVLMs
by: Roccabruna, Gabriel, et al.
Published: (2026)

What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)

The Reasoning Error About Reasoning: Why Different Types of Reasoning Require Different Representational Structures
by: Wu, Yiling
Published: (2026)

Virtual Garbage Collector (VGC): A Zone-Based Garbage Collection Architecture for Python's Parallel Runtime
by: M, Abdulla
Published: (2025)

What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization
by: Zhou, Weixiao, et al.
Published: (2025)

LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
by: Davoodi, Arash Gholami, et al.
Published: (2024)

What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)

Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification?
by: Roccabruna, Gabriel, et al.
Published: (2024)

What Do Self-Supervised Speech Models Know About Words?
by: Pasad, Ankita, et al.
Published: (2023)

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
by: Kang, Deokhyung, et al.
Published: (2025)

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)

What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
by: Noels, Sander, et al.
Published: (2025)

PARSE: An Open-Domain Reasoning Question Answering Benchmark for Persian
by: Mozafari, Jamshid, et al.
Published: (2026)

What Do AI Agents Talk About? Discourse and Architectural Constraints in the First AI-Only Social Network
by: Dube, Taksch, et al.
Published: (2026)

Thinking Out Loud: Do Reasoning Models Know When They're Right?
by: Zeng, Qingcheng, et al.
Published: (2025)

Classifying Unreliable Narrators with Large Language Models
by: Brei, Anneliese, et al.
Published: (2025)

Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning
by: Sok, Jaewon, et al.
Published: (2026)

What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets
by: Stranisci, Marco Antonio, et al.
Published: (2025)

Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model
by: Rahmanian, Mojdeh, et al.
Published: (2024)

HearSay Benchmark: Do Audio LLMs Leak What They Hear?
by: Wang, Jin, et al.
Published: (2026)

BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models
by: Li, Chuyuan, et al.
Published: (2025)

What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks
by: Chizhov, Pavel, et al.
Published: (2025)

A Primer in Post-Training Reasoning Data: What We Know About How It Works
by: Li, Yaoming, et al.
Published: (2026)

Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning
by: Damirchi, Hamed, et al.
Published: (2026)

Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth
by: Davoudi, Seyed Pouyan Mousavi, et al.
Published: (2025)

Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)

What Do LLMs Know About Alzheimer's Disease? Multi-loss Fine-Tuning and Probing for AD Detection
by: Jiang, Lei, et al.
Published: (2026)

One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
by: Afzali, Amirabbas, et al.
Published: (2025)

Reasoning Models Will Sometimes Lie About Their Reasoning
by: Walden, William, et al.
Published: (2026)

Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
by: Li, Zhuohang, et al.
Published: (2024)

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
by: Mundada, Gagan, et al.
Published: (2025)