Bibliographic Details
Main Authors: Zhang, Xiaoling, Xu, Zhengzi, Yang, Shouguo, Li, Zhi, Shi, Zhiqiang, Sun, Limin
Format: Preprint
Published: 2024
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2405.09112
_version_ 1866909203599196160
author Zhang, Xiaoling
Xu, Zhengzi
Yang, Shouguo
Li, Zhi
Shi, Zhiqiang
Sun, Limin
contents Reverse engineers gain valuable insights from descriptive function names, which are absent from publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches have difficulty capturing function semantics across binaries compiled at different optimization levels and fail to preserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored to binaries compiled at different optimization levels. Epitome learns comprehensive function semantics with a pre-trained assembly language model and a graph neural network, and incorporates a function semantics similarity prediction task to maximize the similarity of function semantics across compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate Epitome on 2,597,346 functions extracted from binaries compiled at 5 optimization levels (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 44.34%, 64.16%, and 54.44% in precision, recall, and F1 score, respectively, while also exhibiting superior generalizability.
format Preprint
id arxiv_https___arxiv_org_abs_2405_09112
institution arXiv
publishDate 2024
record_format arxiv
title Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning
topic Software Engineering
url https://arxiv.org/abs/2405.09112