Bibliographic Details
Main Authors: Bakermans, Ingmar, De Pascale, Daniel, Marcelino, Gonçalo, Cascavilla, Giuseppe, Geradts, Zeno
Format: Preprint
Published: 2025
Subjects: Computation and Language, Artificial Intelligence, Computers and Society, Information Retrieval
Online Access: https://arxiv.org/abs/2504.02872
author Bakermans, Ingmar
De Pascale, Daniel
Marcelino, Gonçalo
Cascavilla, Giuseppe
Geradts, Zeno
contents Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical for enabling law enforcement agencies to combat crime effectively, yet extracting this data manually is time-consuming and error-prone. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well at information extraction from DNMs, achieving 91% precision, 96% recall, and an F1 score of 94%. In addition, fine-tuning improves model performance, with UniversalNER performing best.
format Preprint
id arxiv_https___arxiv_org_abs_2504_02872
institution arXiv
publishDate 2025
record_format arxiv
title Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence
topic Computation and Language
Artificial Intelligence
Computers and Society
Information Retrieval
url https://arxiv.org/abs/2504.02872