| Main Authors: | Xu, Yongzhe; Li, Weitong; Umrani, Eeshan; Chung, Taejoong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Networking and Internet Architecture |
| Online Access: | https://arxiv.org/abs/2508.02571 |
| _version_ | 1866908646689996800 |
|---|---|
| author | Xu, Yongzhe; Li, Weitong; Umrani, Eeshan; Chung, Taejoong |
| author_facet | Xu, Yongzhe; Li, Weitong; Umrani, Eeshan; Chung, Taejoong |
| contents | Accurate AS-to-organization mapping underpins Internet measurement and security, yet registries are fragmented, PeeringDB is narrow, and routing views reflect connectivity rather than ownership. We take a pragmatic step: ASINT integrates curated web evidence with retrieval-guided LLM techniques and strict, evidence-cited validation to infer two relations (aliases and directed parent-child) and then revalidates them conservatively. To keep the dataset sustainable, we operate a public dashboard and API where operators can inspect per-ASN evidence and submit feedback that seeds refreshes.
At scale, ASINT maps 112,172 ASNs into 82,840 organization families and, on overlapping AS sets, yields fewer, larger families with 21-24% more multi-AS groups than prior datasets (namely CAIDA AS2Org [11], AS2ORG+ [4], AS-Sibling [10], and Borges [28]). Quality is high in practice: ASINT achieves a precision of 0.9608, a recall of 0.9915, and an accuracy of 0.9752 under manual validation. Public deployment further drew operator-submitted reports for 595 ASNs across 106 organizations, submitted by network operators from all RIR regions, with only 6 errors (99.0% observed clustering accuracy).
Better organization context improves downstream analyses: +27.5% intra-organization RPKI misconfiguration detections, -9.4% benign hijack alerts, and -5.9% corrections to cases mislabeled as IP leasing.
We release code, datasets, and the operator platform with APIs. Given persistent ambiguity in organizational names and the continual evolution of corporate structures, an operator-in-the-loop process is essential: the platform records per-ASN feedback with provenance and incorporates it into periodic refreshes and retraining. The methodology is model-agnostic and stands to improve further as base LLMs advance. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2508_02571 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | ASINT: Learning AS-to-Organization Mapping from Internet Metadata |
| topic | Networking and Internet Architecture |
| url | https://arxiv.org/abs/2508.02571 |
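
The abstract describes inferring two relations between organizations (aliases and directed parent-child) and grouping ASNs into organization families. The record does not say how those relations are combined, so the following is only a minimal sketch under the assumption that a family is a connected component over both relation types, computed with a union-find structure. The function name `build_families`, its parameters, and the sample organization names are hypothetical and are not taken from the ASINT code; the ASNs come from the range reserved for documentation.

```python
from collections import defaultdict


def build_families(asn_to_org, aliases, parent_child):
    """Group ASNs into organization families (illustrative sketch only).

    asn_to_org:   dict mapping an ASN (int) to an organization name (str)
    aliases:      iterable of (org_a, org_b) pairs naming the same organization
    parent_child: iterable of (parent_org, child_org) directed pairs
    """
    parent = {}  # union-find forest over organization names

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Assumption: both relation types place organizations in the same family.
    # The parent-child direction is not needed for grouping, so it is dropped.
    for org_a, org_b in aliases:
        union(org_a, org_b)
    for parent_org, child_org in parent_child:
        union(parent_org, child_org)

    families = defaultdict(set)
    for asn, org in asn_to_org.items():
        families[find(org)].add(asn)
    return sorted(sorted(members) for members in families.values())


# Hypothetical input: three ASNs linked by one alias and one parent-child edge.
asn_to_org = {
    64496: "Example Net",
    64497: "Example Networks Inc",
    64498: "Example Cloud",
    64499: "Other Org",
}
aliases = [("Example Net", "Example Networks Inc")]
parent_child = [("Example Networks Inc", "Example Cloud")]

families = build_families(asn_to_org, aliases, parent_child)
print(families)
```

Treating families as connected components keeps the grouping independent of the order in which relations are processed. A pipeline that needs the hierarchy itself would have to store the directed edges separately.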