Bibliographic Details
Main Authors: Skøien, Jon Olav, Lampach, Nicolas, Ramos, Helena, Seljak, Rudolf, Koeble, Renate, See, Linda, van der Velde, Marijn
Format: Preprint
Published: 2024
Subjects: Methodology, Applications
Online Access: https://arxiv.org/abs/2410.17601
contents We develop a flexible approach that combines the Quadtree-based method with suppression to maximize the utility of gridded data while reducing the risk of disclosing private information about individual units. To protect data confidentiality, we produce a high-resolution grid from geo-referenced data, with a minimum cell size of 1 km, nested in grids of progressively larger cells, based on statistical disclosure control methods (i.e., the threshold rule and the concentration rule). Our implementation overcomes certain weaknesses of the Quadtree-based method by accounting for irregularly distributed and relatively isolated marginal units, and it also allows the joint aggregation of several variables. We illustrate the method using synthetic data from the 2020 Danish agricultural census for a set of key agricultural indicators, such as the number of agricultural holdings, the utilized agricultural area, and the number of organic farms. Using a sub-sample of the synthetic data, we demonstrate the need to assess the reliability of indicators, and we then apply the same approach to generate a ratio (i.e., the share of organic farming). The methodology is provided as the open-source R package MRG, which can be adapted for use with other geo-referenced survey data subject to confidentiality or other privacy restrictions.
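The core idea in the abstract is that a grid cell is only published if it passes a threshold rule and a concentration rule, and that a quadtree refines coarse cells toward 1 km cells wherever that remains safe. The Python sketch below is a simplified, hypothetical illustration of that idea, not the MRG package's algorithm or API: the function names, the 8 km top-level cell, and the rule parameters (`min_count=10`, `top_k=2`, `max_share=0.85`) are assumptions chosen for demonstration. A cell is split into its four quadrants only if every non-empty quadrant passes both rules; top-level cells that fail are suppressed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cell:
    x0: float     # lower-left x coordinate of the cell (m)
    y0: float     # lower-left y coordinate of the cell (m)
    size: float   # cell edge length (m)
    count: int    # number of units (e.g. holdings) in the cell
    total: float  # summed variable (e.g. utilized agricultural area)


def is_safe(values, min_count=10, top_k=2, max_share=0.85):
    """Hypothetical rule check: threshold rule plus a top-k dominance rule."""
    if len(values) < min_count:  # threshold rule: too few contributors
        return False
    total = sum(values)
    if total <= 0:  # dominance share undefined; only the threshold rule applies
        return True
    top = sum(sorted(values, reverse=True)[:top_k])
    return top / total <= max_share  # concentration rule


def _quadrants(units, x0, y0, size):
    """Group (x, y, value) units into the non-empty quadrants of a cell."""
    half = size / 2
    quads = defaultdict(list)
    for x, y, v in units:
        qx = 1 if x >= x0 + half else 0
        qy = 1 if y >= y0 + half else 0
        quads[(qx, qy)].append((x, y, v))
    return {(x0 + qx * half, y0 + qy * half): u for (qx, qy), u in quads.items()}


def _refine(units, x0, y0, size, min_size, rules):
    """Split a safe cell recursively while every non-empty child stays safe."""
    half = size / 2
    if half >= min_size:
        children = _quadrants(units, x0, y0, size)
        if all(is_safe([v for _, _, v in u], **rules) for u in children.values()):
            cells = []
            for (cx, cy), u in children.items():
                cells.extend(_refine(u, cx, cy, half, min_size, rules))
            return cells
    values = [v for _, _, v in units]
    return [Cell(x0, y0, size, len(values), sum(values))]


def quadtree_grid(units, top_size=8000.0, min_size=1000.0, **rules):
    """Return (published cells, number of units in suppressed top-level cells)."""
    top = defaultdict(list)
    for x, y, v in units:
        top[(x // top_size, y // top_size)].append((x, y, v))
    published, suppressed = [], 0
    for (ix, iy), u in top.items():
        if is_safe([v for _, _, v in u], **rules):
            published.extend(
                _refine(u, ix * top_size, iy * top_size, top_size, min_size, rules)
            )
        else:
            suppressed += len(u)  # suppress the whole top-level cell
    return published, suppressed
```

For example, 20 units in one 1 km cell and 5 units in the adjacent 1 km cell are published together as a single 2 km cell, because the smaller neighbour fails the threshold rule on its own, while a single isolated unit far away is suppressed. The approach described in the abstract goes further than this sketch by handling isolated marginal units and joint aggregation of several variables.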
id arxiv_https___arxiv_org_abs_2410_17601
institution arXiv
record_format arxiv
title Flexible Approach for Statistical Disclosure Control in Geospatial Data
topic Methodology
Applications