| Main Authors: | Hakimov, Sherzod, Bernard, Roland, Leiber, Tim, Osswald, Karl, Richert, Kristina, Yang, Ruilin, Bernardi, Raffaella, Schlangen, David |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.08098 |
Similar Items
Multi-Turn Multi-Agent Dialogue for Collaborative Reconstruction Improves VLM Performance on Spatial Reasoning, But Only Barely
by: Kranti, Chalamalasetti, et al.
Published: (2026)
From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning
by: Kranti, Chalamalasetti, et al.
Published: (2025)
Ad-hoc Concept Forming in the Game Codenames as a Means for Evaluating Large Language Models
by: Hakimov, Sherzod, et al.
Published: (2025)
Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies
by: Sadler, Philipp, et al.
Published: (2024)
Towards No-Code Programming of Cobots: Experiments with Code Synthesis by Large Code Models for Conversational Programming
by: Kranti, Chalamalasetti, et al.
Published: (2024)
Plant in Cupboard, Orange on Rably, Inat Aphone. Benchmarking Incremental Learning of Situation and Language Model using a Text-Simulated Situated Environment
by: Jordan, Jonathan, et al.
Published: (2025)
Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft
by: Kranti, Chalamalasetti, et al.
Published: (2024)
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
by: Kranti, Chalamalasetti, et al.
Published: (2025)
Learning Communication Policies for Different Follower Behaviors in a Collaborative Reference Game
by: Sadler, Philipp, et al.
Published: (2024)
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
by: Bhavsar, Nidhir, et al.
Published: (2024)
TurkicNLP: An NLP Toolkit for Turkic Languages
by: Hakimov, Sherzod
Published: (2026)
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
by: Beyer, Anne, et al.
Published: (2024)
The Image Reconstruction Game: Drawing Common Ground Through Iterative Multimodal Dialogue
by: Hakimov, Sherzod, et al.
Published: (2026)
Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
by: Hakimov, Sherzod, et al.
Published: (2023)
M2SA: Multimodal and Multilingual Model for Sentiment Analysis of Tweets
by: Thakkar, Gaurish, et al.
Published: (2024)
A Third Paradigm for LLM Evaluation: Dialogue Game-Based Evaluation using clembench
by: Schlangen, David, et al.
Published: (2025)
Using Game Play to Investigate Multimodal and Conversational Grounding in Large Multimodal Models
by: Hakimov, Sherzod, et al.
Published: (2024)
A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences
by: Bertolazzi, Leonardo, et al.
Published: (2024)
Free-text Rationale Generation under Readability Level Control
by: Hsu, Yi-Sheng, et al.
Published: (2024)
Playpen: An Environment for Exploring Learning Through Conversational Interaction
by: Horst, Nicola, et al.
Published: (2025)
What Are We Measuring in NLG? A Meta-Analysis of Evaluation Trends 2020-2025
by: Yang, Jing, et al.
Published: (2026)
How Language Models Conflate Logical Validity with Plausibility: A Representational Analysis of Content Effects
by: Bertolazzi, Leonardo, et al.
Published: (2025)
Strategic Responses to Personalized Pricing and Demand for Privacy: An Experiment
by: Bó, Inácio, et al.
Published: (2023)
Representations of Fact, Fiction and Forecast in Large Language Models: Epistemics and Attitudes
by: Li, Meng, et al.
Published: (2025)
The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
by: Bertolazzi, Leonardo, et al.
Published: (2025)
Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
by: Momentè, Filippo, et al.
Published: (2025)
Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests
by: Madureira, Brielen, et al.
Published: (2024)
LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation
by: Schlangen, David
Published: (2024)
SoT: Structured-of-Thought Prompting Guides Multilingual Reasoning in Large Language Models
by: Qi, Rui, et al.
Published: (2025)
The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models
by: Fan, Siqi, et al.
Published: (2025)
Diastereoselective Synthesis of Pyridone ribo‐C‐Nucleosides via Heck Reaction and Oxidation
by: Tim Gniech, et al.
Published: (2024)
Front Cover: Diastereoselective Synthesis of Pyridone ribo‐C‐Nucleosides via Heck Reaction and Oxidation (Eur. J. Org. Chem. 28/2024)
by: Tim Gniech, et al.
Published: (2024)
Medicina e arte uma ressonância
by: Walter Osswald
Published: (2002)
Aspectos de autoridad y poder en las ceremonias de canonización de Ignacio de Loyola y Francisco Javier en Portugal
by: Cristina Osswald
Published: (2013)
On Christian Martyrdom in Japan (1597-1658)
by: Cristina Osswald
Published: (2021)
EL OLVIDO EN LA FENOMENOLOGÍA DE HUSSERL. DOS FENÓMENOS LÍMITE
by: Andrés Osswald
Published: (2017)
A perceção dos jesuítas no mundo português: entre o trato de e o gosto por orientalia (sécs. XVI-XVII)
by: Cristina Osswald
Published: (2018)
Sete notas sobre a cura pelo nada
by: Walter Osswald
Published: (2003)
Las narrativas de "las pioneras". Cuestiones de género y moralidades en el desarrollo de la danza moderna en la Argentina (1940-1960)
by: Denise Osswald
Published: (2010)
SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks
by: Wan, Wentao, et al.
Published: (2025)