Staff View: Data Flows in You: Benchmarking and Improving Static Data-flow Analysis on Binary Executables :: Library Catalog

Bibliographic Details
Main Authors:	Weideman, Nicolaas; Arasteh, Sima; Raghothaman, Mukund; Mirkovic, Jelena; Hauser, Christophe
Format:	Preprint
Published:	2025
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2506.00313

_version_	1866908389761613824
author	Weideman, Nicolaas; Arasteh, Sima; Raghothaman, Mukund; Mirkovic, Jelena; Hauser, Christophe
author_facet	Weideman, Nicolaas; Arasteh, Sima; Raghothaman, Mukund; Mirkovic, Jelena; Hauser, Christophe
contents	Data-flow analysis is a critical component of security research. Theoretically, accurate data-flow analysis in binary executables is an undecidable problem, due to complexities of binary code. Practically, many binary analysis engines offer some data-flow analysis capability, but we lack understanding of the accuracy of these analyses, and their limitations. We address this problem by introducing a labeled benchmark data set, including 215,072 microbenchmark test cases, mapping to 277,072 binary executables, created specifically to evaluate data-flow analysis implementations. Additionally, we augment our benchmark set with dynamically-discovered data flows from 6 real-world executables. Using our benchmark data set, we evaluate three state-of-the-art data-flow analysis implementations, in angr, Ghidra and Miasm, and discuss their very low accuracy and the reasons behind it. We further propose three model extensions to static data-flow analysis that significantly improve accuracy, achieving almost perfect recall (0.99) and increasing precision from 0.13 to 0.32. Finally, we show that leveraging these model extensions in a vulnerability-discovery context leads to a tangible improvement in vulnerable instruction identification.
format	Preprint
id	arxiv_https___arxiv_org_abs_2506_00313
institution	arXiv
publishDate	2025
record_format	arxiv
title	Data Flows in You: Benchmarking and Improving Static Data-flow Analysis on Binary Executables
topic	Cryptography and Security
url	https://arxiv.org/abs/2506.00313

Similar Items