Salvato in:
| Autori principali: | , , , , , |
|---|---|
| Natura: | Preprint |
| Pubblicazione: |
2026
|
| Soggetti: | |
| Accesso online: | https://arxiv.org/abs/2602.18914 |
| Tags: |
Aggiungi Tag
Nessun Tag, puoi essere il primo ad aggiungerne!!
|
| _version_ | 1866915811080273920 |
|---|---|
| author | Wang, Peiran Li, Ying Sun, Yuqiang Liu, Chengwei Liu, Yang Tian, Yuan |
| author_facet | Wang, Peiran Li, Ying Sun, Yuqiang Liu, Chengwei Liu, Yang Tian, Yuan |
| contents | The Model Context Protocol (MCP) has rapidly become a de facto standard for connecting LLM-based agents with external tools via reusable MCP servers. In practice, however, server selection and onboarding rely heavily on free-text tool descriptions that are intentionally loosely constrained. Although this flexibility largely ensures the scalability of MCP servers, it also creates a reliability gap that descriptions often misrepresent or omit key semantics, increasing trial-and-error integration, degrading agent behavior, and potentially introducing security risks. To this end, we present the first systematic study of description smells in MCP tool descriptions and their impact on usability. Specifically, we synthesize software/API documentation practices and agentic tool-use requirements into a four-dimensional quality standard: accuracy, functionality, information completeness, and conciseness, covering 18 specific smell categories. Using this standard, we conducted a large-scale empirical study on a well-constructed dataset of 10,831 MCP servers. We find that description smells are pervasive (e.g., 73% repeated tool names, thousands with incorrect parameter semantics or missing return descriptions), reflecting a "code-first, description-last" pattern. Through a controlled mutation-based study, we show these smells significantly affect LLM tool selection, with functionality and accuracy having the largest effects (+11.6% and +8.8%, p < 0.001). In competitive settings with functionally equivalent servers, standard-compliant descriptions reach 72% selection probability (260% over a 20% baseline), demonstrating that smell-guided remediation yields substantial practical benefits. We release our labeled dataset and standards to support future work on reliable and secure MCP ecosystems. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2602_18914 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | From Docs to Descriptions: Smell-Aware Evaluation of MCP Server Descriptions Wang, Peiran Li, Ying Sun, Yuqiang Liu, Chengwei Liu, Yang Tian, Yuan Software Engineering The Model Context Protocol (MCP) has rapidly become a de facto standard for connecting LLM-based agents with external tools via reusable MCP servers. In practice, however, server selection and onboarding rely heavily on free-text tool descriptions that are intentionally loosely constrained. Although this flexibility largely ensures the scalability of MCP servers, it also creates a reliability gap that descriptions often misrepresent or omit key semantics, increasing trial-and-error integration, degrading agent behavior, and potentially introducing security risks. To this end, we present the first systematic study of description smells in MCP tool descriptions and their impact on usability. Specifically, we synthesize software/API documentation practices and agentic tool-use requirements into a four-dimensional quality standard: accuracy, functionality, information completeness, and conciseness, covering 18 specific smell categories. Using this standard, we conducted a large-scale empirical study on a well-constructed dataset of 10,831 MCP servers. We find that description smells are pervasive (e.g., 73% repeated tool names, thousands with incorrect parameter semantics or missing return descriptions), reflecting a "code-first, description-last" pattern. Through a controlled mutation-based study, we show these smells significantly affect LLM tool selection, with functionality and accuracy having the largest effects (+11.6% and +8.8%, p < 0.001). In competitive settings with functionally equivalent servers, standard-compliant descriptions reach 72% selection probability (260% over a 20% baseline), demonstrating that smell-guided remediation yields substantial practical benefits. We release our labeled dataset and standards to support future work on reliable and secure MCP ecosystems. |
| title | From Docs to Descriptions: Smell-Aware Evaluation of MCP Server Descriptions |
| topic | Software Engineering |
| url | https://arxiv.org/abs/2602.18914 |