Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Cheng, Yutong; Li, Changze; Basuki, Raihan Sultan Pasha; Cui, Qian; Ding, Wei; Gao, Peng
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2605.25836

_version_	1866917531646689280
author	Cheng, Yutong; Li, Changze; Basuki, Raihan Sultan Pasha; Cui, Qian; Ding, Wei; Gao, Peng
contents	Extracting MITRE ATT&CK techniques from cyber threat intelligence (CTI) reports is an open-set, multi-label problem requiring both high recall (not missing techniques) and high precision (not hallucinating unsupported ones). Existing methods (rule-based, supervised, and LLM-based) struggle to achieve both: rule-based and supervised approaches lack generalizability across diverse attack descriptions, while LLM-based approaches that couple candidate generation and validation within a single inference step suffer from limited recall and precision simultaneously. We propose TTPrint, which addresses this challenge through a diverge-then-converge design inspired by how human analysts work: first extracting broadly, then verifying rigorously. In the divergent phase, reports are decomposed into atomic behaviors and candidate techniques are proposed broadly. A deterministic span localization stage then anchors each candidate to a specific evidence window in the source text. A convergent verification stage retains only candidates supported by both the localized evidence and the authoritative MITRE definition. We contribute two evaluation resources, a cleaned TRAM benchmark (TRAM-Clean) and a new annotated dataset (TTPrint-Bench), to address known annotation noise in existing benchmarks and elevate the task to document-level TTP extraction. On TRAM-Clean and TTPrint-Bench, TTPrint achieves 76.48% and 87.39% macro-F1, respectively, outperforming the leading baseline by 63.5% and 29.4%. A multi-backbone analysis across six LLMs and a threshold sensitivity study further demonstrate generalizability across model choices and provide practical guidance for parameter selection.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_25836
institution	arXiv
publishDate	2026
record_format	arxiv
title	TTPrint: Evidence-Grounded TTP Extraction via Diverge-then-Converge Verification
topic	Cryptography and Security; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2605.25836
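The contents field describes a divergent phase that proposes candidate techniques broadly, followed by deterministic span localization and convergent verification. The Python sketch below illustrates only the control flow of the last two stages, under assumptions that are not from the record. The candidates are hard-coded stand-ins for LLM proposals. Localization is a hypothetical token-overlap search over sentence windows. Verification is a crude lexical-overlap check against placeholder paraphrases of ATT&CK definitions, not official MITRE text. The record states only that the verifier keeps candidates supported by both the localized evidence and the MITRE definition; it does not specify how this is implemented. All names (`Candidate`, `Evidence`, `localize`, `verify`, `converge`, `window`, `min_overlap`) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical stopword list so that function words do not count as evidence.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "or", "with", "for",
             "on", "in", "by", "from", "that", "which"}


@dataclass(frozen=True)
class Candidate:
    technique_id: str  # ATT&CK technique ID proposed in the divergent phase
    behavior: str      # atomic behavior the candidate was proposed from


@dataclass(frozen=True)
class Evidence:
    technique_id: str
    start: int  # index of the first sentence in the evidence window
    end: int    # exclusive end index of the window
    text: str


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def split_sentences(report: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def localize(c: Candidate, sentences: List[str], window: int = 1) -> Optional[Evidence]:
    """Deterministic stand-in for span localization: pick the sentence window with
    the highest token overlap with the behavior (earliest window wins ties).
    Returns None when no window shares any token with the behavior."""
    behavior = tokens(c.behavior)
    best: Optional[Evidence] = None
    best_score = 0
    for i in range(len(sentences)):
        end = min(i + window, len(sentences))
        text = " ".join(sentences[i:end])
        score = len(behavior & tokens(text))
        if score > best_score:
            best, best_score = Evidence(c.technique_id, i, end, text), score
    return best


def verify(ev: Evidence, definitions: Dict[str, str], min_overlap: int = 2) -> bool:
    """Hypothetical convergent check: the evidence window must share at least
    `min_overlap` tokens with the technique's definition text."""
    definition = definitions.get(ev.technique_id)
    return definition is not None and len(tokens(ev.text) & tokens(definition)) >= min_overlap


def converge(report: str, candidates: List[Candidate], definitions: Dict[str, str],
             window: int = 1, min_overlap: int = 2) -> Dict[str, Evidence]:
    """Keep only candidates that can be localized AND pass verification."""
    sentences = split_sentences(report)
    kept: Dict[str, Evidence] = {}
    for c in candidates:
        ev = localize(c, sentences, window)
        if ev is not None and verify(ev, definitions, min_overlap):
            kept.setdefault(c.technique_id, ev)  # first supporting window wins
    return kept


report = ("The actor sent spearphishing emails with malicious attachments. "
          "Opening an attachment ran PowerShell commands to fetch a payload.")
# Placeholder paraphrases, NOT the official MITRE ATT&CK definitions.
definitions = {
    "T1566.001": "Adversaries send spearphishing emails with a malicious attachment.",
    "T1059.001": "Adversaries abuse PowerShell commands and scripts for execution.",
    "T1486": "Adversaries encrypt data on target systems to interrupt availability.",
    "T1003": "Adversaries dump credentials from operating system memory.",
}
# Stand-ins for a broad (divergent) LLM proposal step.
candidates = [
    Candidate("T1566.001", "sent spearphishing emails with malicious attachments"),
    Candidate("T1059.001", "attachment ran PowerShell commands"),
    Candidate("T1486", "emails with malicious attachments"),  # wrong label for real text
    Candidate("T1003", "dumped LSASS credentials"),          # no supporting text
]
result = converge(report, candidates, definitions)
print(sorted(result))  # ['T1059.001', 'T1566.001']
```

In this toy run, T1003 is dropped because no sentence shares a token with its behavior, so no evidence window can be localized. T1486 is localized to the first sentence but rejected because that evidence does not match its definition. Only T1566.001 and T1059.001 survive. Here `min_overlap` is a hypothetical verification threshold; the record does not say which threshold the authors' sensitivity study varies.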
