Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Kou, Bonan, Zhou, Zijie, Chen, Muhao, Zhang, Tianyi
Format:	Preprint
Published:	2026
Subjects:	Software Engineering
Online Access:	https://arxiv.org/abs/2601.08036
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866911371314069504
author	Kou, Bonan Zhou, Zijie Chen, Muhao Zhang, Tianyi
author_facet	Kou, Bonan Zhou, Zijie Chen, Muhao Zhang, Tianyi
contents	API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge uncovered by the official documents. We also measured the sensitivity of AutoDoc to the choice of different LLMs. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found API documents generated by AutoDoc to be more comprehensive, concise, and helpful than the comparison baselines. This highlights the feasibility of utilizing LLMs for API documentation with careful design to counter LLM hallucination and information redundancy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_08036
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Automating API Documentation from Crowdsourced Knowledge Kou, Bonan Zhou, Zijie Chen, Muhao Zhang, Tianyi Software Engineering API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge uncovered by the official documents. We also measured the sensitivity of AutoDoc to the choice of different LLMs. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found API documents generated by AutoDoc to be more comprehensive, concise, and helpful than the comparison baselines. This highlights the feasibility of utilizing LLMs for API documentation with careful design to counter LLM hallucination and information redundancy.
title	Automating API Documentation from Crowdsourced Knowledge
topic	Software Engineering
url	https://arxiv.org/abs/2601.08036

Similar Items