| Main Authors: | Mousavi, Seyed Mahed, Cecchinato, Edoardo, Hornikova, Lucia, Riccardi, Giuseppe |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.23864 |
Similar Items
What Does Loss Optimization Actually Teach, If Anything? Knowledge Dynamics in Continual Pre-training of LLMs
by: Mousavi, Seyed Mahed, et al.
Published: (2026)
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
by: Mousavi, Seyed Mahed, et al.
Published: (2025)
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs
by: Mousavi, Seyed Mahed, et al.
Published: (2024)
[De|Re]constructing VLMs' Reasoning in Counting
by: Alghisi, Simone, et al.
Published: (2025)
Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
by: Alghisi, Simone, et al.
Published: (2024)
Are LLMs Robust for Spoken Dialogues?
by: Mousavi, Seyed Mahed, et al.
Published: (2024)
CIVET: Systematic Evaluation of Understanding in VLMs
by: Rizzoli, Massimo, et al.
Published: (2025)
Getting to the Point: Pointing Improves LVLMs at Counting
by: Alghisi, Simone, et al.
Published: (2026)
V-DyKnow: A Dynamic Benchmark for Time-Sensitive Knowledge in Vision Language Models
by: Mousavi, Seyed Mahed, et al.
Published: (2026)
MATEO: A Multimodal Benchmark for Temporal Reasoning and Planning in LVLMs
by: Roccabruna, Gabriel, et al.
Published: (2026)
What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)
The Reasoning Error About Reasoning: Why Different Types of Reasoning Require Different Representational Structures
by: Wu, Yiling
Published: (2026)
Virtual Garbage Collector (VGC): A Zone-Based Garbage Collection Architecture for Python's Parallel Runtime
by: M, Abdulla
Published: (2025)
What Are They Talking About? A Benchmark of Knowledge-Grounded Discussion Summarization
by: Zhou, Weixiao, et al.
Published: (2025)
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs
by: Davoodi, Arash Gholami, et al.
Published: (2024)
What Do Speech Foundation Models Not Learn About Speech?
by: Waheed, Abdul, et al.
Published: (2024)
Will LLMs Replace the Encoder-Only Models in Temporal Relation Classification?
by: Roccabruna, Gabriel, et al.
Published: (2024)
What Do Self-Supervised Speech Models Know About Words?
by: Pasad, Ankita, et al.
Published: (2023)
Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?
by: Kang, Deokhyung, et al.
Published: (2025)
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
by: Wu, Mingqi, et al.
Published: (2025)
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
by: Noels, Sander, et al.
Published: (2025)
PARSE: An Open-Domain Reasoning Question Answering Benchmark for Persian
by: Mozafari, Jamshid, et al.
Published: (2026)
What Do AI Agents Talk About? Discourse and Architectural Constraints in the First AI-Only Social Network
by: Dube, Taksch, et al.
Published: (2026)
Thinking Out Loud: Do Reasoning Models Know When They're Right?
by: Zeng, Qingcheng, et al.
Published: (2025)
Classifying Unreliable Narrators with Large Language Models
by: Brei, Anneliese, et al.
Published: (2025)
Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning
by: Sok, Jaewon, et al.
Published: (2026)
What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets
by: Stranisci, Marco Antonio, et al.
Published: (2025)
Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model
by: Rahmanian, Mojdeh, et al.
Published: (2024)
HearSay Benchmark: Do Audio LLMs Leak What They Hear?
by: Wang, Jin, et al.
Published: (2026)
BeDiscovER: The Benchmark of Discourse Understanding in the Era of Reasoning Language Models
by: Li, Chuyuan, et al.
Published: (2025)
What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks
by: Chizhov, Pavel, et al.
Published: (2025)
A Primer in Post-Training Reasoning Data: What We Know About How It Works
by: Li, Yaoming, et al.
Published: (2026)
Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning
by: Damirchi, Hamed, et al.
Published: (2026)
Collective Reasoning Among LLMs: A Framework for Answer Validation Without Ground Truth
by: Davoudi, Seyed Pouyan Mousavi, et al.
Published: (2025)
Do Reasoning LLMs Refuse What They Infer in Long Contexts?
by: Fu, Yu, et al.
Published: (2026)
What Do LLMs Know About Alzheimer's Disease? Multi-loss Fine-Tuning and Probing for AD Detection
by: Jiang, Lei, et al.
Published: (2026)
One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
by: Afzali, Amirabbas, et al.
Published: (2025)
Reasoning Models Will Sometimes Lie About Their Reasoning
by: Walden, William, et al.
Published: (2026)
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
by: Li, Zhuohang, et al.
Published: (2024)
WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
by: Mundada, Gagan, et al.
Published: (2025)