| Main Authors: | Dasgupta, Subhadra; Dette, Holger |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | Methodology |
| Online Access: | https://arxiv.org/abs/2306.16821 |
| _version_ | 1866914718882463744 |
|---|---|
| author | Dasgupta, Subhadra; Dette, Holger |
| author_facet | Dasgupta, Subhadra; Dette, Holger |
| contents | We propose a novel two-stage subsampling algorithm based on optimal design principles. In the first stage, we use a density-based clustering algorithm to identify an approximating design space for the predictors from an initial subsample. Next, we determine an optimal approximate design on this design space. Finally, we use matrix distances such as the Procrustes, Frobenius, and square-root distance to define the remaining subsample, such that its points are "closest" to the support points of the optimal design. Our approach reflects the specific nature of the information matrix as a weighted sum of non-negative definite Fisher information matrices evaluated at the design points and applies to a large class of regression models including models where the Fisher information is of rank larger than $1$. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2306_16821 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| title | Efficient subsampling for exponential family models |
| topic | Methodology |
| url | https://arxiv.org/abs/2306.16821 |
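
The abstract describes the information matrix as a weighted sum of Fisher information matrices and a final step that picks subsample points "closest" to the support points of a design in a matrix distance. The sketch below illustrates only that idea with the Frobenius distance. It is not the authors' algorithm: the logistic model, the parameter guess `beta`, the support points and weights, and all function names are assumptions made for illustration, and the clustering and optimal-design steps are skipped.

```python
import numpy as np

def fisher_info(x, beta):
    """Fisher information of one observation at x (logistic model, assumed)."""
    f = np.concatenate(([1.0], np.atleast_1d(x)))  # regression vector (1, x)
    p = 1.0 / (1.0 + np.exp(-(f @ beta)))          # success probability
    return p * (1.0 - p) * np.outer(f, f)

def information_matrix(points, weights, beta):
    """Weighted sum of the pointwise Fisher information matrices."""
    return sum(w * fisher_info(x, beta) for x, w in zip(points, weights))

def closest_points(candidates, support, weights, beta, n_sub):
    """For each support point, take round(weight * n_sub) unused candidates
    whose Fisher information is nearest in Frobenius distance."""
    cand_info = np.array([fisher_info(x, beta) for x in candidates])
    available = np.ones(len(candidates), dtype=bool)
    chosen = []
    for x_s, w in zip(support, weights):
        k = int(round(w * n_sub))
        diff = cand_info - fisher_info(x_s, beta)
        dist = np.linalg.norm(diff, axis=(1, 2))   # Frobenius norm per candidate
        dist[~available] = np.inf                  # never pick a point twice
        idx = np.argsort(dist)[:k]
        available[idx] = False
        chosen.extend(idx.tolist())
    return np.array(chosen)

# Hypothetical inputs: simulated predictors, a guessed parameter, and a
# two-point design with equal weights (not taken from the preprint).
rng = np.random.default_rng(0)
candidates = rng.uniform(-3.0, 3.0, size=1000)
beta = np.array([0.0, 1.0])
support, weights = [-1.5, 1.5], [0.5, 0.5]

M = information_matrix(support, weights, beta)
chosen = closest_points(candidates, support, weights, beta, n_sub=20)
```

Here `chosen` holds the indices of the selected candidates, which cluster around the two assumed support points. The Procrustes and square-root distances named in the abstract would replace the `np.linalg.norm` line and are not shown.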