:: Library Catalog

Bibliographic Details
Main Authors:	Balaji, Sumanth; Mishra, Piyush; Sachdeva, Aashraya; Agrawal, Suraj
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.00596

Similar Items

RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery
by: Kumar, Abhishek, et al.
Published: (2026)

Beyond IVR Touch-Tones: Customer Intent Routing using LLMs
by: Rojas-Galeano, Sergio
Published: (2025)

Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models
by: Manduru, Sumanth, et al.
Published: (2025)

BeliefShift: Benchmarking Temporal Belief Consistency and Opinion Drift in LLM Agents
by: Myakala, Praveen Kumar, et al.
Published: (2026)

Beyond Sentiment: A Multi-Agent Pipeline for Actionable Business Advice from Reviews
by: Bhandari, Kartikey Singh, et al.
Published: (2026)

Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health
by: Dey, Sumon Kanti, et al.
Published: (2025)

MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents
by: Gong, Ming, et al.
Published: (2025)

PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
by: Prabhu, Sumanth
Published: (2024)

ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
by: Wang, Haoxin, et al.
Published: (2025)

GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
by: Kang, Haoqiang, et al.
Published: (2025)

Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications
by: Agrawal, Vishakha, et al.
Published: (2025)

Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
by: Narayanasamy, Priyadarshan, et al.
Published: (2026)

A Benchmark Dataset and Evaluation Framework for Vietnamese Large Language Models in Customer Support
by: Nguyen, Long S. T., et al.
Published: (2025)

Improving LLM Safety and Helpfulness using SFT and DPO: A Study on OPT-350M
by: Pant, Piyush
Published: (2025)

Accelerating Direct Preference Optimization with Prefix Sharing
by: Wang, Franklin, et al.
Published: (2024)

Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
by: Wei, Tianxin, et al.
Published: (2025)

JudgeAgent: Beyond Static Benchmarks for Knowledge-Driven and Dynamic LLM Evaluation
by: Shi, Zhichao, et al.
Published: (2025)

Evaluating, Synthesizing, and Enhancing for Customer Support Conversation
by: Zhu, Jie, et al.
Published: (2025)

LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?
by: Sun, Lu, et al.
Published: (2025)

Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows
by: Borovkov, Nikita, et al.
Published: (2026)

Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain
by: Li, Yue, et al.
Published: (2025)

Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
by: Varshney, Prasoon, et al.
Published: (2025)

Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
by: Wu, Yisha, et al.
Published: (2025)

Sustainable Digitalization of Business with Multi-Agent RAG and LLM
by: Arslan, Muhammad, et al.
Published: (2025)

GUIDEQ: Framework for Guided Questioning for progressive informational collection and classification
by: Mishra, Priya, et al.
Published: (2024)

LLMRank: Understanding LLM Strengths for Model Routing
by: Agrawal, Shubham, et al.
Published: (2025)

AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents
by: Liu, Xuannan, et al.
Published: (2026)

When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)

Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care
by: Kumar, Saurabh, et al.
Published: (2025)

ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents
by: Fu, Xing, et al.
Published: (2026)

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
by: Cao, Yixin, et al.
Published: (2025)

Benchmarking and Learning Real-World Customer Service Dialogue
by: Gao, Tianhong, et al.
Published: (2025)

Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs
by: Yang, Chen, et al.
Published: (2025)

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
by: Xu, Frank F., et al.
Published: (2024)

When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
by: Qian, Lingfei, et al.
Published: (2025)

Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping
by: Wang, Ziyi, et al.
Published: (2025)

Distribution-Aware Companding Quantization of Large Language Models
by: Radhakrishnan, Athul, et al.
Published: (2026)

MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered
by: Mirza, Imran, et al.
Published: (2025)

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
by: Sachdeva, Rachneet, et al.
Published: (2025)

CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration
by: Sachdeva, Rachneet, et al.
Published: (2023)