Bibliographic Details
Main Authors: Xu, Yongzhe; Li, Weitong; Umrani, Eeshan; Chung, Taejoong
Format: Preprint
Published: 2025
Subjects: Networking and Internet Architecture
Online Access: https://arxiv.org/abs/2508.02571
author Xu, Yongzhe
Li, Weitong
Umrani, Eeshan
Chung, Taejoong
contents Accurate AS-to-organization mapping underpins Internet measurement and security, yet registries are fragmented, PeeringDB is narrow, and routing views reflect connectivity rather than ownership. We take a pragmatic step: ASINT integrates curated web evidence with retrieval-guided LLM techniques and strict, evidence-cited validation to infer two relations (aliases and directed parent-child) and then revalidates them conservatively. To keep the dataset sustainable, we operate a public dashboard and API where operators can inspect per-ASN evidence and submit feedback that seeds refreshes. At scale, ASINT maps 112,172 ASNs into 82,840 organization families and, on overlapping AS sets, yields fewer, larger families with 21-24% more multi-AS groups than prior datasets (CAIDA AS2Org [11], AS2ORG+ [4], AS-Sibling [10], and Borges [28]). Quality is high in practice: ASINT achieves a precision of 0.9608, a recall of 0.9915, and an accuracy of 0.9752 under manual validation. Public deployment further drew operator-submitted reports for 595 ASNs across 106 organizations, with only 6 errors (99.0% observed clustering accuracy); the feedback came from network operators across all RIR regions. Better organization context improves downstream analyses: +27.5% intra-organization RPKI misconfiguration detections, -9.4% benign hijack alerts, and -5.9% corrections to cases mislabeled as IP leasing. We release code, datasets, and the operator platform with APIs. Given persistent ambiguity in organizational names and the continual evolution of corporate structures, an operator-in-the-loop process is essential: the platform records per-ASN feedback with provenance and incorporates it into periodic refreshes and retraining. The methodology is model-agnostic and stands to improve further as base LLMs advance.
format Preprint
id arxiv_https___arxiv_org_abs_2508_02571
institution arXiv
publishDate 2025
record_format arxiv
title ASINT: Learning AS-to-Organization Mapping from Internet Metadata
topic Networking and Internet Architecture
url https://arxiv.org/abs/2508.02571