| Main Authors: | Khraishi, Raad, Zafar, Iman, Myles, Katie, Cowan, Greig A |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.03111 |
Similar Items
Evaluating the Sensitivity of LLMs to Prior Context
by: Hankache, Robert, et al.
Published: (2025)
Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks
by: Pelosio, Giulio, et al.
Published: (2025)
Drift No More? Context Equilibria in Multi-Turn LLM Interactions
by: Dongre, Vardhan, et al.
Published: (2025)
Helping Customers in Distress: An LLM-powered Agent that Converses, Probes, and Routes
by: Atreya, Alankar, et al.
Published: (2026)
The Slow Drift of Support: Boundary Failures in Multi-Turn Mental Health LLM Dialogues
by: Cheng, Youyou, et al.
Published: (2026)
Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple LLM Judges
by: Tang, Yuqi, et al.
Published: (2025)
How Personality Traits Shape LLM Risk-Taking Behaviour
by: Hartley, John, et al.
Published: (2025)
Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
by: Kawada, Sebastien
Published: (2026)
Evaluating Temporal Consistency in Multi-Turn Language Models
by: Atri, Yash Kumar, et al.
Published: (2026)
Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction
by: Bao, Han, et al.
Published: (2026)
TurnBench-MS: A Benchmark for Evaluating Multi-Turn, Multi-Step Reasoning in Large Language Models
by: Zhang, Yiran, et al.
Published: (2025)
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
by: Hu, Yuanzhe, et al.
Published: (2025)
Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey
by: Guan, Shengyue, et al.
Published: (2025)
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues
by: Farhansyah, Mohammad Rifqi, et al.
Published: (2026)
Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence
by: Harshavardhan
Published: (2026)
Beyond Continuity: Challenges of Context Switching in Multi-Turn Dialogue with LLMs
by: Sinha, Aditya, et al.
Published: (2026)
A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction
by: Pan, Ruihao, et al.
Published: (2026)
Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
by: Ozolcer, Melik, et al.
Published: (2025)
Evaluation Drift in LLM Personality Induction: Are We Moving the Goalpost?
by: Rajput, Prateek, et al.
Published: (2026)
Adaptive Stopping for Multi-Turn LLM Reasoning
by: Zhou, Xiaofan, et al.
Published: (2026)
Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation
by: Kruthof, Garvin
Published: (2026)
Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA
by: Zhang, Yiran, et al.
Published: (2025)
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
by: Ramnath, Sahana, et al.
Published: (2025)
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
by: Kwan, Wai-Chung, et al.
Published: (2024)
Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
by: Mohamed, Amr, et al.
Published: (2025)
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
by: Li, Haoyang, et al.
Published: (2025)
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
by: Katsis, Yannis, et al.
Published: (2025)
MTRouter: Cost-Aware Multi-Turn LLM Routing with History-Model Joint Embeddings
by: Zhang, Yiqun, et al.
Published: (2026)
Know You Before You Speak: User-State Modeling for LLM Personalization in Multi-Turn Conversation
by: Luo, Jiani, et al.
Published: (2026)
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
by: Juneja, Prerna, et al.
Published: (2026)
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
by: Badola, Kartikeya, et al.
Published: (2025)
MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation
by: Li, Xiaoyuan, et al.
Published: (2025)
Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction
by: Gosai, Advait, et al.
Published: (2025)
Token Statistics Reveal Conversational Drift in Multi-turn LLM Interaction
by: Hafez, Wael, et al.
Published: (2026)
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)
TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
by: Li, Jiaqian, et al.
Published: (2026)
Examining Identity Drift in Conversations of LLM Agents
by: Choi, Junhyuk, et al.
Published: (2024)
RankJudge: A Multi-Turn LLM-as-a-Judge Synthetic Benchmark Generator
by: Tang, Zhenwei, et al.
Published: (2026)
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
by: Yu, Erxin, et al.
Published: (2024)
Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production
by: Liu, Junhua, et al.
Published: (2024)