Bibliographic Details
Main Authors: Dasgupta, Subhadra; Dette, Holger
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2306.16821
author Dasgupta, Subhadra
Dette, Holger
contents We propose a novel two-stage subsampling algorithm based on optimal design principles. In the first stage, we use a density-based clustering algorithm to identify an approximating design space for the predictors from an initial subsample. Next, we determine an optimal approximate design on this design space. Finally, we use matrix distances such as the Procrustes, Frobenius, and square-root distances to define the remaining subsample, such that its points are "closest" to the support points of the optimal design. Our approach reflects the specific nature of the information matrix as a weighted sum of non-negative definite Fisher information matrices evaluated at the design points, and it applies to a large class of regression models, including models where the Fisher information is of rank larger than $1$.
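The abstract names three matrix distances but does not define them. The sketch below is a minimal Python/NumPy illustration that assumes the definitions commonly used for non-negative definite matrices: Frobenius ||A - B||_F, square-root ||A^{1/2} - B^{1/2}||_F, and Procrustes min over orthogonal R of ||A^{1/2} - B^{1/2} R||_F. The logistic model, the support points -1.5 and 1.5, and the greedy rule `select_closest` are hypothetical choices made only for illustration. They are not taken from the record and are not the authors' algorithm.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a non-negative definite matrix (via eigh)."""
    vals, vecs = np.linalg.eigh((a + a.T) / 2.0)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frobenius_dist(a, b):
    """Frobenius distance ||A - B||_F."""
    return float(np.linalg.norm(a - b, "fro"))


def sqrt_dist(a, b):
    """Square-root distance ||A^{1/2} - B^{1/2}||_F."""
    return float(np.linalg.norm(psd_sqrt(a) - psd_sqrt(b), "fro"))


def procrustes_dist(a, b):
    """Procrustes distance: min over orthogonal R of ||X - Y R||_F,
    with X = A^{1/2}, Y = B^{1/2}. The minimum has the closed form
    ||X||_F^2 + ||Y||_F^2 - 2 * (sum of singular values of Y^T X)."""
    x, y = psd_sqrt(a), psd_sqrt(b)
    nuclear = np.linalg.svd(y.T @ x, compute_uv=False).sum()
    d2 = np.sum(x * x) + np.sum(y * y) - 2.0 * nuclear
    return float(np.sqrt(max(d2, 0.0)))  # clip round-off below zero


def logistic_info(x, beta):
    """Hypothetical example model: Fisher information (rank 1) of a
    logistic regression with regressors f(x) = (1, x)."""
    f = np.array([1.0, x])
    p = 1.0 / (1.0 + np.exp(-(f @ beta)))
    return p * (1.0 - p) * np.outer(f, f)


def select_closest(cand_infos, support_infos, counts, dist=procrustes_dist):
    """Hypothetical greedy rule: for each support point j, keep the counts[j]
    still-unused candidates whose information matrix is closest to it."""
    used, chosen = set(), []
    for target, k in zip(support_infos, counts):
        d = np.array([dist(c, target) for c in cand_infos])
        picked = [int(i) for i in np.argsort(d) if int(i) not in used][:k]
        used.update(picked)
        chosen.extend(picked)
    return chosen


# Illustrative data only: candidate predictors and assumed support points.
xs = np.array([-2.0, -1.5, -1.0, 0.0, 1.0, 1.5, 2.0])
beta = np.array([0.0, 1.0])
cands = [logistic_info(x, beta) for x in xs]
support = [logistic_info(s, beta) for s in (-1.5, 1.5)]
print(select_closest(cands, support, counts=[1, 1]))  # [1, 5]
```

The closed form for the Procrustes distance avoids an explicit search over orthogonal matrices. Because R = I is always feasible, the Procrustes distance never exceeds the square-root distance. Since all three functions take full matrices, they apply unchanged when the Fisher information has rank larger than 1.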
format Preprint
id arxiv_https___arxiv_org_abs_2306_16821
institution arXiv
publishDate 2023
record_format arxiv
title Efficient subsampling for exponential family models
topic Methodology
url https://arxiv.org/abs/2306.16821