:: Library Catalog

Bibliographic Details
Main Authors:	Khraishi, Raad; Zafar, Iman; Myles, Katie; Cowan, Greig A
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2603.03111

Similar Items

Evaluating the Sensitivity of LLMs to Prior Context
by: Hankache, Robert, et al.
Published: (2025)

Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks
by: Pelosio, Giulio, et al.
Published: (2025)

Drift No More? Context Equilibria in Multi-Turn LLM Interactions
by: Dongre, Vardhan, et al.
Published: (2025)

Helping Customers in Distress: An LLM-powered Agent that Converses, Probes, and Routes
by: Atreya, Alankar, et al.
Published: (2026)

The Slow Drift of Support: Boundary Failures in Multi-Turn Mental Health LLM Dialogues
by: Cheng, Youyou, et al.
Published: (2026)

Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple LLM Judges
by: Tang, Yuqi, et al.
Published: (2025)

How Personality Traits Shape LLM Risk-Taking Behaviour
by: Hartley, John, et al.
Published: (2025)

Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
by: Kawada, Sebastien
Published: (2026)

Evaluating Temporal Consistency in Multi-Turn Language Models
by: Atri, Yash Kumar, et al.
Published: (2026)

Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction
by: Bao, Han, et al.
Published: (2026)

TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
by: Hu, Yuanzhe, et al.
Published: (2025)

Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey
by: Guan, Shengyue, et al.
Published: (2025)

PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
by: Farhansyah, Mohammad Rifqi, et al.
Published: (2026)

Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence
by: Harshavardhan
Published: (2026)

Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs
by: Sinha, Aditya, et al.
Published: (2026)

A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
by: Pan, Ruihao, et al.
Published: (2026)

Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
by: Ozolcer, Melik, et al.
Published: (2025)

Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
by: Rajput, Prateek, et al.
Published: (2026)

Adaptive Stopping for Multi-Turn LLM Reasoning
by: Zhou, Xiaofan, et al.
Published: (2026)

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
by: Kruthof, Garvin
Published: (2026)

Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
by: Zhang, Yiran, et al.
Published: (2025)

Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
by: Ramnath, Sahana, et al.
Published: (2025)

MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
by: Kwan, Wai-Chung, et al.
Published: (2024)

Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
by: Mohamed, Amr, et al.
Published: (2025)

LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
by: Li, Haoyang, et al.
Published: (2025)

MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
by: Katsis, Yannis, et al.
Published: (2025)

MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings
by: Zhang, Yiqun, et al.
Published: (2026)

Know You Before You Speak: User-State Modeling for LLM Personalization in Multi-Turn Conversation
by: Luo, Jiani, et al.
Published: (2026)

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
by: Juneja, Prerna, et al.
Published: (2026)

Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
by: Badola, Kartikeya, et al.
Published: (2025)

MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
by: Li, Xiaoyuan, et al.
Published: (2025)

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction
by: Gosai, Advait, et al.
Published: (2025)

Token Statistics Reveal Conversational Drift in Multi-turn LLM Interaction
by: Hafez, Wael, et al.
Published: (2026)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
by: Li, Jiaqian, et al.
Published: (2026)

Examining Identity Drift in Conversations of LLM Agents
by: Choi, Junhyuk, et al.
Published: (2024)

RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator
by: Tang, Zhenwei, et al.
Published: (2026)

CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
by: Yu, Erxin, et al.
Published: (2024)

Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production
by: Liu, Junhua, et al.
Published: (2024)