| Main Authors: | Mayilvaghanan, Kawin; Gupta, Siddhant; Kumar, Ayush |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.14970 |
Similar Items
Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
by: Mayilvaghanan, Kawin, et al.
Published: (2025)
Tool-Aware Planning in Contact Center AI: Evaluating LLMs through Lineage-Guided Query Decomposition
by: Nathan, Varun, et al.
Published: (2026)
Integration of LLM Quality Assurance into an NLG System
by: Chen, Ching-Yi, et al.
Published: (2025)
Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation
by: Devanathan, Rishikesh, et al.
Published: (2025)
Counterfactual Graph for Multi-Agent LLM Calibration
by: Huang, Jiatan, et al.
Published: (2026)
Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems
by: Liu, Lijia, et al.
Published: (2025)
LLM-Based Insight Extraction for Contact Center Analytics and Cost-Efficient Deployment
by: Embar, Varsha, et al.
Published: (2025)
Multi-Facet Counterfactual Learning for Content Quality Evaluation
by: Zheng, Jiasheng, et al.
Published: (2024)
Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness
by: Amiri-Margavi, Alireza, et al.
Published: (2026)
Uncertainty and Fairness Awareness in LLM-Based Recommendation Systems
by: Sah, Chandan Kumar, et al.
Published: (2026)
MEDEQUALQA: Evaluating Biases in LLMs with Counterfactual Reasoning
by: Ghosh, Rajarshi, et al.
Published: (2025)
Aligning (Medical) LLMs for (Counterfactual) Fairness
by: Poulain, Raphael, et al.
Published: (2024)
Causal-Counterfactual RAG: The Integration of Causal-Counterfactual Reasoning into RAG
by: Khadilkar, Harshad, et al.
Published: (2025)
Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies
by: Kulkarni, Siddhant, et al.
Published: (2026)
Towards Fair and Comprehensive Evaluation of Routers in Collaborative LLM Systems
by: Wu, Wanxing, et al.
Published: (2026)
Agent-as-a-Graph: Knowledge Graph-Based Tool and Agent Retrieval for LLM Multi-Agent Systems
by: Nizar, Faheem, et al.
Published: (2025)
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs
by: Kothari, Sara, et al.
Published: (2025)
IITR-CIOL@NLU of Devanagari Script Languages 2025: Multilingual Hate Speech Detection and Target Identification in Devanagari-Scripted Languages
by: Gupta, Siddhant, et al.
Published: (2024)
ParamBench: A Graduate-Level Benchmark for Evaluating LLM Understanding on Indic Subjects
by: Maheshwari, Ayush, et al.
Published: (2025)
Evaluating the Retrieval Component in LLM-Based Question Answering Systems
by: Alinejad, Ashkan, et al.
Published: (2024)
Anchor Points: Benchmarking Models with Much Fewer Examples
by: Vivek, Rajan, et al.
Published: (2023)
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
by: Ouyang, Tinghui, et al.
Published: (2024)
Faithfulness vs. Safety: Evaluating LLM Behavior Under Counterfactual Medical Evidence
by: Mo, Kaijie, et al.
Published: (2026)
Introducing Super RAGs in Mistral 8x7B-v1
by: Thakur, Ayush, et al.
Published: (2024)
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
by: Ethayarajh, Kawin, et al.
Published: (2021)
SciCoQA: Quality Assurance for Scientific Paper--Code Alignment
by: Baumgärtner, Tim, et al.
Published: (2026)
FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
by: Nawale, Janki Atul, et al.
Published: (2025)
Text-Based Detection of On-Hold Scripts in Contact Center Calls
by: Galimzianov, Dmitrii, et al.
Published: (2024)
LLM-Based Support for Diabetes Diagnosis: Opportunities, Scenarios, and Challenges with GPT-5
by: Gupta, Gaurav Kumar, et al.
Published: (2025)
Replayable Financial Agents: A Determinism-Faithfulness Assurance Harness for Tool-Using LLM Agents
by: Khatchadourian, Raffi
Published: (2026)
LLM-Based Section Identifiers Excel on Open Source but Stumble in Real World Applications
by: Krishnamoorthy, Saranya, et al.
Published: (2024)
Tool-to-Agent Retrieval: Bridging Tools and Agents for Scalable LLM Multi-Agent Systems
by: Lumer, Elias, et al.
Published: (2025)
Data Checklist: On Unit-Testing Datasets with Usable Information
by: Zhang, Heidi C., et al.
Published: (2024)
Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
by: Bhambri, Siddhant, et al.
Published: (2025)
Substance over Style: Evaluating Proactive Conversational Coaching Agents
by: Srinivas, Vidya, et al.
Published: (2025)
FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP
by: Tokpo, Ewoenam Kwaku, et al.
Published: (2024)
Are VLMs Really Blind
by: Singh, Ayush, et al.
Published: (2024)
Optimizing Conversational Quality in Spoken Dialogue Systems with Reinforcement Learning from AI Feedback
by: Arora, Siddhant, et al.
Published: (2026)
Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
by: Chen, Chaoran, et al.
Published: (2025)
Advancing Risk and Quality Assurance: A RAG Chatbot for Improved Regulatory Compliance
by: Hillebrand, Lars, et al.
Published: (2025)