Bibliographic Details
Main Authors: Effendi, Sedick David Baker; Pinho, Xavier; Dreyer, Andrei Michael; Yamaguchi, Fabian
Format: Preprint
Published: 2025
Subjects: Software Engineering; D.2.4
Online Access: https://arxiv.org/abs/2506.06247
author Effendi, Sedick David Baker
Pinho, Xavier
Dreyer, Andrei Michael
Yamaguchi, Fabian
contents Taint analysis using explicit whole-program data-dependence graphs is powerful for vulnerability discovery but faces two major challenges. First, accurately modeling taint propagation through calls to external library procedures requires extensive manual annotations, which becomes impractical for large ecosystems. Second, the sheer size of whole-program graph representations leads to serious scalability and performance issues, particularly when quick analysis is needed in continuous development pipelines. This paper presents the design and implementation of a system for a language-agnostic data-dependence representation. The system accommodates missing annotations describing the behavior of library procedures by over-approximating data flows, allowing annotations to be added later without recalculation. We contribute this data-flow analysis system to the open-source code analysis platform Joern, making it available to the community.
format Preprint
id arxiv_https___arxiv_org_abs_2506_06247
institution arXiv
publishDate 2025
record_format arxiv
title Scalable Language Agnostic Taint Tracking using Explicit Data Dependencies
topic Software Engineering
D.2.4
url https://arxiv.org/abs/2506.06247
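The abstract's central mechanism (treating calls to unannotated library procedures as over-approximated data flows, then refining them with annotations later) can be illustrated with a small sketch. It is a hypothetical, flow-insensitive Python model, not Joern's implementation: the names Program, Call, call_flows, tainted and RET, and the (source, destination) annotation format, are all assumptions made for illustration. The program representation is built once, and call-site edges are resolved against whichever annotations are supplied at query time, so adding an annotation changes the results without rebuilding the representation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

RET = -1  # hypothetical position index denoting a call's return value

# Hypothetical annotation format: procedure name -> set of (src, dst) flows,
# where src is an argument index and dst is an argument index or RET.
Semantics = Dict[str, Set[Tuple[int, int]]]


@dataclass
class Call:
    name: str
    args: List[str]             # variables passed as arguments
    ret: Optional[str] = None   # variable receiving the return value


@dataclass
class Program:
    # Built once; never rebuilt when the annotations change.
    assigns: List[Tuple[str, str]] = field(default_factory=list)  # (dst, src)
    calls: List[Call] = field(default_factory=list)


def call_flows(call: Call, semantics: Semantics) -> List[Tuple[str, str]]:
    """Resolve one call site into (src_var, dst_var) edges."""
    positions = list(range(len(call.args)))
    flows = semantics.get(call.name)
    if flows is None:
        # Unannotated procedure: over-approximate by letting every argument
        # flow into the return value and into every other argument.
        flows = {(s, t) for s in positions for t in positions + [RET] if s != t}
    edges = []
    for s, t in flows:
        if t == RET:
            if call.ret is not None:
                edges.append((call.args[s], call.ret))
        else:
            edges.append((call.args[s], call.args[t]))
    return edges


def tainted(program: Program, sources: Set[str], semantics: Semantics) -> Set[str]:
    """Flow-insensitive reachability from the taint sources."""
    succ: Dict[str, Set[str]] = {}
    for dst, src in program.assigns:
        succ.setdefault(src, set()).add(dst)
    # Call-site edges come from the current annotations at query time.
    for call in program.calls:
        for src, dst in call_flows(call, semantics):
            succ.setdefault(src, set()).add(dst)
    seen = set(sources)
    work = deque(sources)
    while work:
        v = work.popleft()
        for w in succ.get(v, ()):
            if w not in seen:
                seen.add(w)
                work.append(w)
    return seen


# Usage: x = user_input; y = concat(x, sep); n = length(y)
prog = Program(
    assigns=[("x", "user_input")],
    calls=[Call("concat", ["x", "sep"], "y"), Call("length", ["y"], "n")],
)
annotated: Semantics = {"concat": {(0, RET), (1, RET)}, "length": set()}

print(sorted(tainted(prog, {"user_input"}, {})))
# prints ['n', 'sep', 'user_input', 'x', 'y']
print(sorted(tainted(prog, {"user_input"}, annotated)))
# prints ['user_input', 'x', 'y']
```

Without annotations the sketch reports extra tainted variables (sep and n), which is the intended over-approximation: it may produce false positives but does not silently drop a flow through an unmodeled procedure. Supplying annotations for concat and length then removes those spurious flows while reusing the same prog object.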