Staff View: Inferring Pluggable Types with Machine Learning :: Library Catalog

Bibliographic Details
Main Authors:	Siddiqui, Kazi Amanul Islam; Kellogg, Martin
Format:	Preprint
Published:	2024
Subjects:	Software Engineering; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2406.15676

_version_	1866908573928259584
author	Siddiqui, Kazi Amanul Islam; Kellogg, Martin
author_facet	Siddiqui, Kazi Amanul Islam; Kellogg, Martin
contents	Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including Graph Transformer Network (GTN), Graph Convolutional Network, and Large Language Model. We further validate these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We find that GTN shows the best performance, with a recall of 0.89 and a precision of 0.6. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. For our feasibility study, performance improved around 16k classes, and deteriorated due to overfitting around 22k classes.
format	Preprint
id	arxiv_https___arxiv_org_abs_2406_15676
institution	arXiv
publishDate	2024
record_format	arxiv
title	Inferring Pluggable Types with Machine Learning
topic	Software Engineering; Artificial Intelligence
url	https://arxiv.org/abs/2406.15676

Similar Items