Bibliographic Details
Main authors: Eberhardinger, Manuel; Takenaka, Patrick; Grießhaber, Daniel; Maucher, Johannes
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence; Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2501.07334
author Eberhardinger, Manuel
Takenaka, Patrick
Grießhaber, Daniel
Maucher, Johannes
contents The steadily increasing use of data-driven methods in areas that handle sensitive personal information, such as law enforcement, requires these institutions to invest ever more effort in complying with data protection guidelines. In this work, we present a system for automatically anonymizing images of scanned documents that reduces manual effort while ensuring data protection compliance. To keep documents usable for further forensic processing after anonymization, our method minimizes the automatically redacted areas by combining automatic detection of sensitive regions with knowledge from a manually anonymized reference document. Because a self-supervised image model is used for instance retrieval of the reference document, our approach requires only one anonymized example to redact all documents of the same type, significantly reducing processing time. On a hand-crafted dataset of ground-truth redactions, we show that our approach outperforms both a purely automatic redaction system and a naive scheme that copies and pastes the reference anonymization onto other documents.
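The abstract names two steps without specifying how they are implemented: retrieving the matching manually anonymized reference document by instance retrieval, and combining that reference's redactions with automatically detected sensitive regions. The following is only a hypothetical sketch of one way such a pipeline could be wired, not the authors' method. It assumes that every document has already been turned into an embedding vector by some self-supervised image model (not shown), that redactions are axis-aligned boxes `(x0, y0, x1, y1)` in a shared coordinate frame, and that the merge rule keeps an automatic detection only when it overlaps a reference redaction (falling back to the reference box otherwise). All names (`retrieve_reference`, `merge_redactions`, `iou_threshold`) are illustrative inventions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_reference(query: Sequence[float], references: Dict[str, Sequence[float]]) -> str:
    """Instance retrieval: id of the reference embedding most similar to the query."""
    return max(references, key=lambda ref_id: cosine_similarity(query, references[ref_id]))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_redactions(detected: List[Box], reference: List[Box],
                     iou_threshold: float = 0.3) -> List[Box]:
    """Hypothetical merge rule (an assumption, not from the source):
    - a detection is kept only if it overlaps some reference redaction,
      so detections outside reference regions are not redacted;
    - a reference box with no matching detection is redacted as-is."""
    merged: List[Box] = []
    for ref_box in reference:
        matches = [d for d in detected if iou(d, ref_box) >= iou_threshold]
        if not matches:
            matches = [ref_box]
        for box in matches:
            if box not in merged:  # avoid duplicates when a detection matches several references
                merged.append(box)
    return merged


if __name__ == "__main__":
    refs = {"form_a": [1.0, 0.0, 0.0], "form_b": [0.0, 1.0, 0.0]}
    print(retrieve_reference([0.9, 0.1, 0.0], refs))
    detected = [(10, 10, 50, 20), (200, 200, 260, 220)]
    reference = [(12, 10, 52, 20), (100, 100, 150, 110)]
    print(merge_redactions(detected, reference))
```

In the demo, the query embedding is closest to `form_a`; the first detection is kept because it overlaps a reference redaction, the detection at `(200, 200, 260, 220)` is dropped because no reference redaction covers it, and the unmatched reference box is redacted as-is. Whether the actual system trades off detections and reference regions this way is not stated in the record.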
institution arXiv
title Anonymization of Documents for Law Enforcement with Machine Learning
topic Artificial Intelligence
Computer Vision and Pattern Recognition