Staff View: Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence :: Library Catalog

Bibliographic Details
Main Authors:	Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Computers and Society; Information Retrieval
Online Access:	https://arxiv.org/abs/2504.02872

_version_	1866910902910976000
author	Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno
author_facet	Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno
contents	Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical to ensuring law enforcement agencies can effectively combat crime. Manually extracting data from DNMs is an error-prone and time-consuming task. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well in information extraction from DNMs, achieving 91% Precision, 96% Recall, and an F1 score of 94%. In addition, fine-tuning enhances model performance, with UniversalNER achieving the best performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_02872
institution	arXiv
publishDate	2025
record_format	arxiv
title	Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence
topic	Computation and Language; Artificial Intelligence; Computers and Society; Information Retrieval
url	https://arxiv.org/abs/2504.02872