| Main Authors: | Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language; Artificial Intelligence; Computers and Society; Information Retrieval |
| Online Access: | https://arxiv.org/abs/2504.02872 |
| _version_ | 1866910902910976000 |
|---|---|
| author | Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno |
| author_facet | Bakermans, Ingmar; De Pascale, Daniel; Marcelino, Gonçalo; Cascavilla, Giuseppe; Geradts, Zeno |
| contents | Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical to ensuring law enforcement agencies can effectively combat crime. Manually extracting data from DNMs is an error-prone and time-consuming task. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well in information extraction from DNMs, achieving 91% Precision, 96% Recall, and an F1 score of 94%. In addition, fine-tuning enhances model performance, with UniversalNER achieving the best performance. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_02872 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Scraping the Shadows: Deep Learning Breakthroughs in Dark Web Intelligence |
| topic | Computation and Language; Artificial Intelligence; Computers and Society; Information Retrieval |
| url | https://arxiv.org/abs/2504.02872 |