| Main Authors: | Balaji, Sumanth, Mishra, Piyush, Sachdeva, Aashraya, Agrawal, Suraj |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.00596 |
Similar Items
RECOVER: Robust Entity Correction via agentic Orchestration of hypothesis Variants for Evidence-based Recovery
by: Kumar, Abhishek, et al.
Published: (2026)
Beyond IVR Touch-Tones: Customer Intent Routing using LLMs
by: Rojas-Galeano, Sergio
Published: (2025)
Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models
by: Manduru, Sumanth, et al.
Published: (2025)
BeliefShift: Benchmarking Temporal Belief Consistency and Opinion Drift in LLM Agents
by: Myakala, Praveen Kumar, et al.
Published: (2026)
Beyond Sentiment: A Multi-Agent Pipeline for Actionable Business Advice from Reviews
by: Bhandari, Kartikey Singh, et al.
Published: (2026)
Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health
by: Dey, Sumon Kanti, et al.
Published: (2025)
MindFlow: Revolutionizing E-commerce Customer Support with Multimodal LLM Agents
by: Gong, Ming, et al.
Published: (2025)
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
by: Prabhu, Sumanth
Published: (2024)
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
by: Wang, Haoxin, et al.
Published: (2025)
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
by: Kang, Haoqiang, et al.
Published: (2025)
Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications
by: Agrawal, Vishakha, et al.
Published: (2025)
Beyond Perplexity: Character Distribution Signatures and the MDTA Benchmark for AI Text Detection
by: Narayanasamy, Priyadarshan, et al.
Published: (2026)
A Benchmark Dataset and Evaluation Framework for Vietnamese Large Language Models in Customer Support
by: Nguyen, Long S. T., et al.
Published: (2025)
Improving LLM Safety and Helpfulness using SFT and DPO: A Study on OPT-350M
by: Pant, Piyush
Published: (2025)
Accelerating Direct Preference Optimization with Prefix Sharing
by: Wang, Franklin, et al.
Published: (2024)
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
by: Wei, Tianxin, et al.
Published: (2025)
JudgeAgent: Beyond Static Benchmarks for Knowledge-Driven and Dynamic LLM Evaluation
by: Shi, Zhichao, et al.
Published: (2025)
Evaluating, Synthesizing, and Enhancing for Customer Support Conversation
by: Zhu, Jie, et al.
Published: (2025)
LLM Agent Meets Agentic AI: Can LLM Agents Simulate Customers to Evaluate Agentic-AI-based Shopping Assistants?
by: Sun, Lu, et al.
Published: (2025)
Learning Selective LLM Autonomy from Copilot Feedback in Enterprise Customer Support Workflows
by: Borovkov, Nikita, et al.
Published: (2026)
Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain
by: Li, Yue, et al.
Published: (2025)
Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
by: Varshney, Prasoon, et al.
Published: (2025)
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
by: Wu, Yisha, et al.
Published: (2025)
Sustainable Digitalization of Business with Multi-Agent RAG and LLM
by: Arslan, Muhammad, et al.
Published: (2025)
GUIDEQ: Framework for Guided Questioning for progressive informational collection and classification
by: Mishra, Priya, et al.
Published: (2024)
LLMRank: Understanding LLM Strengths for Model Routing
by: Agrawal, Shubham, et al.
Published: (2025)
AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents
by: Liu, Xuannan, et al.
Published: (2026)
When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
by: Penamakuri, Abhirama Subramanyam, et al.
Published: (2025)
Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care
by: Kumar, Saurabh, et al.
Published: (2025)
ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents
by: Fu, Xing, et al.
Published: (2026)
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
by: Cao, Yixin, et al.
Published: (2025)
Benchmarking and Learning Real-World Customer Service Dialogue
by: Gao, Tianhong, et al.
Published: (2025)
Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs
by: Yang, Chen, et al.
Published: (2025)
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
by: Xu, Frank F., et al.
Published: (2024)
When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
by: Qian, Lingfei, et al.
Published: (2025)
Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping
by: Wang, Ziyi, et al.
Published: (2025)
Distribution-Aware Companding Quantization of Large Language Models
by: Radhakrishnan, Athul, et al.
Published: (2026)
MALIBU Benchmark: Multi-Agent LLM Implicit Bias Uncovered
by: Mirza, Imran, et al.
Published: (2025)
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
by: Sachdeva, Rachneet, et al.
Published: (2025)
CATfOOD: Counterfactual Augmented Training for Improving Out-of-Domain Performance and Calibration
by: Sachdeva, Rachneet, et al.
Published: (2023)