Bibliographic Details
Main Authors:	Mayilvaghanan, Kawin; Gupta, Siddhant; Kumar, Ayush
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.14970

Similar Items

Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
by: Mayilvaghanan, Kawin, et al.
Published: (2025)

Tool-Aware Planning in Contact Center AI: Evaluating LLMs through Lineage-Guided Query Decomposition
by: Nathan, Varun, et al.
Published: (2026)

Integration of LLM Quality Assurance into an NLG System
by: Chen, Ching-Yi, et al.
Published: (2025)

Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
by: Devanathan, Rishikesh, et al.
Published: (2025)

Counterfactual Graph for Multi-Agent LLM Calibration
by: Huang, Jiatan, et al.
Published: (2026)

Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems
by: Liu, Lijia, et al.
Published: (2025)

LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
by: Embar, Varsha, et al.
Published: (2025)

Multi-Facet Counterfactual Learning for Content Quality Evaluation
by: Zheng, Jiasheng, et al.
Published: (2024)

Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness
by: Amiri-Margavi, Alireza, et al.
Published: (2026)

Uncertainty and Fairness Awareness in LLM-Based Recommendation Systems
by: Sah, Chandan Kumar, et al.
Published: (2026)

MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
by: Ghosh, Rajarshi, et al.
Published: (2025)

Aligning (Medical) LLMs for (Counterfactual) Fairness
by: Poulain, Raphael, et al.
Published: (2024)

Causal-Counterfactual RAG: The Integration of Causal-Counterfactual Reasoning into RAG
by: Khadilkar, Harshad, et al.
Published: (2025)

Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies
by: Kulkarni, Siddhant, et al.
Published: (2026)

Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
by: Wu, Wanxing, et al.
Published: (2026)

Agent-as-a-Graph: Knowledge Graph-Based Tool and Agent Retrieval for LLM Multi-Agent Systems
by: Nizar, Faheem, et al.
Published: (2025)

Question Answering on Patient Medical Records with Private Fine-Tuned LLMs
by: Kothari, Sara, et al.
Published: (2025)

IITR-CIOL@NLU of Devanagari Script Languages 2025: Multilingual Hate Speech Detection and Target Identification in Devanagari-Scripted Languages
by: Gupta, Siddhant, et al.
Published: (2024)

ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects
by: Maheshwari, Ayush, et al.
Published: (2025)

Evaluating the Retrieval Component in LLM-Based Question Answering Systems
by: Alinejad, Ashkan, et al.
Published: (2024)

Anchor Points: Benchmarking Models with Much Fewer Examples
by: Vivek, Rajan, et al.
Published: (2023)

Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
by: Ouyang, Tinghui, et al.
Published: (2024)

Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
by: Mo, Kaijie, et al.
Published: (2026)

Introducing Super RAGs in Mistral 8x7B-v1
by: Thakur, Ayush, et al.
Published: (2024)

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
by: Ethayarajh, Kawin, et al.
Published: (2021)

SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
by: Baumgärtner, Tim, et al.
Published: (2026)

FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
by: Nawale, Janki Atul, et al.
Published: (2025)

Text-Based Detection of On-Hold Scripts in Contact Center Calls
by: Galimzianov, Dmitrii, et al.
Published: (2024)

LLM-Based Support for Diabetes Diagnosis: Opportunities, Scenarios, and Challenges with GPT-5
by: Gupta, Gaurav Kumar, et al.
Published: (2025)

Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
by: Khatchadourian, Raffi
Published: (2026)

LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
by: Krishnamoorthy, Saranya, et al.
Published: (2024)

Tool-to-Agent Retrieval: Bridging Tools and Agents for Scalable LLM Multi-Agent Systems
by: Lumer, Elias, et al.
Published: (2025)

Data Checklist: On Unit-Testing Datasets with Usable Information
by: Zhang, Heidi C., et al.
Published: (2024)

Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
by: Bhambri, Siddhant, et al.
Published: (2025)

Substance over Style: Evaluating Proactive Conversational Coaching Agents
by: Srinivas, Vidya, et al.
Published: (2025)

FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP
by: Tokpo, Ewoenam Kwaku, et al.
Published: (2024)

Are VLMs Really Blind
by: Singh, Ayush, et al.
Published: (2024)

Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback
by: Arora, Siddhant, et al.
Published: (2026)

Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
by: Chen, Chaoran, et al.
Published: (2025)

Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance
by: Hillebrand, Lars, et al.
Published: (2025)