Description: :: Library Catalog


Bibliographic details
Main author:	Daniel Arp
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online access:	https://doi.org/10.5281/zenodo.18782865

Description:

The AndMal2025 dataset provides a comprehensive behavioral representation of Android applications by integrating static code attributes and runtime execution indicators. Each record corresponds to a single APK instance, where the feature set captures permission usage, system interactions, service binding activities, resource consumption patterns, and network behavior. The dataset is designed to support robust supervised learning for Android malware analysis, including both primary binary detection and auxiliary multi-label family characterization. The feature space is organized into six major groups to reflect different operational layers of Android applications. <h3>1. Application Metadata Features</h3> These attributes summarize high-level structural properties of the APK and provide contextual signals regarding application complexity. <ul> <li> App_ID — Unique identifier assigned to each application instance. </li> <li> APK_Size_KB — Size of the packaged APK file in kilobytes, reflecting application footprint. </li> <li> Dex_Method_Count — Total number of methods extracted from the DEX bytecode, indicating codebase complexity. </li> </ul> These metadata features help distinguish lightweight benign utilities from feature-dense malicious packages. <h3>2. Permission-Based Static Features</h3> Permission indicators capture declared capabilities within the Android manifest and are widely recognized as strong malware predictors. <ul> <li> perm_SEND_SMS — Indicates whether the application requests SMS transmission permission. </li> <li> perm_ACCESS_NETWORK_STATE — Reflects access to network connectivity state information. </li> <li> perm_WRITE_SETTINGS — Indicates permission to modify system settings. </li> <li> perm_INTERNET — Specifies whether the application can access network resources. </li> <li> perm_count_total — Total number of permissions requested by the application. </li> <li> perm_count_dangerous — Number of high-risk permissions requested. 
</li> <li> danger_perm_ratio — Ratio of dangerous permissions to total permissions. </li> </ul> These features characterize the privilege profile of each application and expose over-permission patterns commonly associated with malicious behavior. <h3>3. Intent and Broadcast Action Features</h3> Broadcast receivers and intent filters reveal persistence mechanisms and background execution strategies. <ul> <li> intent_BOOT_COMPLETED — Indicates registration for device boot completion events. </li> <li> intent_SCREEN_ON — Indicates monitoring of screen activation events. </li> </ul> Such signals are frequently linked to stealth persistence and opportunistic background activity. <h3>4. Class and API Usage Indicators</h3> These features capture the presence of sensitive framework classes that often appear in suspicious workflows. <ul> <li> class_java_lang_Class — Reflects dynamic class loading or reflection usage. </li> <li> class_android_telephony_SmsManager — Indicates access to SMS management APIs. </li> </ul> API-level evidence complements permission analysis by exposing how declared privileges are operationalized. <h3>5. Runtime Event and Service Interaction Features</h3> Dynamic execution monitoring provides insight into inter-process communication and service orchestration behaviors. <ul> <li> evt_Transact — Count of Binder transaction events. </li> <li> evt_onServiceConnected — Number of successful service connection callbacks. </li> <li> evt_bindService — Frequency of service binding requests. </li> <li> evt_attachInterface — Interface attachment operations observed at runtime. </li> <li> evt_ClassLoader — Dynamic class loading events. </li> <li> evt_total — Aggregate count of monitored runtime events. </li> <li> evt_entropy — Distribution entropy of runtime event types. </li> </ul> These features expose behavioral patterns that may remain hidden in purely static inspection. <h3>6. 
System Activity, Resource, and Network Features</h3> This group captures operational footprints generated during application execution. System and Resource Metrics <ul> <li> syscall_count_total — Total number of observed system calls. </li> <li> cpu_mean — Mean CPU utilization during execution. </li> <li> mem_mean_mb — Average memory consumption in megabytes. </li> <li> file_write_count — Number of file write operations. </li> <li> service_start_count — Count of service start invocations. </li> </ul> Network Behavior Metrics <ul> <li> net_conn_count — Number of outbound network connections. </li> <li> dns_query_count — DNS query frequency. </li> <li> net_tx_kb — Volume of transmitted network data (KB). </li> <li> net_rx_kb — Volume of received network data (KB). </li> </ul> Code Structure Indicators <ul> <li> opcode_ngram_entropy — Entropy of opcode n-gram distribution. </li> <li> dex_string_entropy — Entropy of embedded string constants. </li> </ul> These features jointly model communication intensity, resource usage patterns, and code-level irregularities associated with malicious workflows. <h2>Target Labels</h2> The dataset supports both primary detection and fine-grained family characterization. <h3>Primary Prediction Label</h3> <ul> <li> Malware_Binary — Binary ground truth where <ul> <li> <code>0</code> denotes benign applications </li> <li> <code>1</code> denotes malicious applications </li> </ul> </li> </ul> This label is intended for the main supervised malware detection task. 
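The derived features described in the groups above (danger_perm_ratio, evt_total, evt_entropy) can be illustrated with a minimal sketch. It is not the dataset's documented procedure. It assumes that the entropy is Shannon entropy in bits, that evt_total and evt_entropy are taken over the five listed event counters only, and that a record with no permissions gets a ratio of 0.0. The record values are hypothetical.

```python
import math

# Assumption: these five counters are the event types behind evt_total / evt_entropy.
EVENT_COLUMNS = [
    "evt_Transact",
    "evt_onServiceConnected",
    "evt_bindService",
    "evt_attachInterface",
    "evt_ClassLoader",
]


def danger_perm_ratio(perm_count_dangerous, perm_count_total):
    """Ratio of dangerous to total permissions (0.0 if none are requested; assumed convention)."""
    if perm_count_total == 0:
        return 0.0
    return perm_count_dangerous / perm_count_total


def event_entropy(event_counts):
    """Shannon entropy in bits of the event-type distribution (log base 2 is an assumption)."""
    total = sum(event_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in event_counts.values():
        if count > 0:  # zero-count event types contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical record; the values are illustrative, not taken from the dataset.
record = {
    "perm_count_total": 8,
    "perm_count_dangerous": 2,
    "evt_Transact": 4,
    "evt_onServiceConnected": 4,
    "evt_bindService": 4,
    "evt_attachInterface": 4,
    "evt_ClassLoader": 0,
}

events = {name: record[name] for name in EVENT_COLUMNS}
ratio = danger_perm_ratio(record["perm_count_dangerous"], record["perm_count_total"])
evt_total = sum(events.values())
entropy = event_entropy(events)

print(f"danger_perm_ratio={ratio:.2f} evt_total={evt_total} evt_entropy={entropy:.2f}")
```

Under these assumptions, low entropy means runtime activity is concentrated in one or two event types, while higher values mean events are spread more evenly.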
<h3>Auxiliary Multi-Label Family Annotations</h3> To enable detailed behavioral analysis, the dataset provides non-exclusive malware family indicators: <ul> <li> y_scareware — Scareware activity indicator </li> <li> y_ransomware — Ransomware activity indicator </li> <li> y_adware — Adware activity indicator </li> <li> y_sms_malware — SMS-based malicious activity indicator </li> </ul> Multiple family labels may be active simultaneously for a single application, enabling multi-label learning and cross-family behavioral studies.
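The two kinds of target can be separated from the features as in the following minimal sketch. The column names come from the description above. The record layout (one dictionary per APK), the App_ID format, and all values are hypothetical. Dropping App_ID from the feature set is a modelling choice made here, not something the dataset prescribes.

```python
from collections import Counter

BINARY_LABEL = "Malware_Binary"
FAMILY_LABELS = ["y_scareware", "y_ransomware", "y_adware", "y_sms_malware"]
ID_COLUMN = "App_ID"


def split_record(record):
    """Split one record into (features, binary_label, family_vector)."""
    excluded = {BINARY_LABEL, ID_COLUMN, *FAMILY_LABELS}
    features = {k: v for k, v in record.items() if k not in excluded}
    binary = int(record[BINARY_LABEL])
    families = [int(record[name]) for name in FAMILY_LABELS]
    return features, binary, families


def active_families(record):
    """Names of the family indicators set to 1 (several may be active at once)."""
    return tuple(name for name in FAMILY_LABELS if int(record[name]) == 1)


# Hypothetical records with only two feature columns each, for illustration.
records = [
    {"App_ID": "app_0001", "perm_SEND_SMS": 0, "net_conn_count": 3,
     "Malware_Binary": 0, "y_scareware": 0, "y_ransomware": 0,
     "y_adware": 0, "y_sms_malware": 0},
    {"App_ID": "app_0002", "perm_SEND_SMS": 1, "net_conn_count": 12,
     "Malware_Binary": 1, "y_scareware": 0, "y_ransomware": 1,
     "y_adware": 0, "y_sms_malware": 1},
    {"App_ID": "app_0003", "perm_SEND_SMS": 0, "net_conn_count": 40,
     "Malware_Binary": 1, "y_scareware": 0, "y_ransomware": 0,
     "y_adware": 1, "y_sms_malware": 0},
]

# Count how often each combination of family labels occurs.
combo_counts = Counter(active_families(r) for r in records)

for combo, count in combo_counts.items():
    print(combo or ("<no family label>",), count)
```

Keeping the family indicators as a vector rather than a single class index preserves the non-exclusive nature of the annotations, which matters for multi-label learners.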
