Bibliographic Details
Main Authors: Sun, Mimi, Kamath, Chaitanya, Agarwal, Mohit, Muslim, Arbaaz, Yee, Hector, Schottlander, David, Bavadekar, Shailesh, Efron, Niv, Shetty, Shravya, Prasad, Gautam
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2410.22721
author Sun, Mimi
Kamath, Chaitanya
Agarwal, Mohit
Muslim, Arbaaz
Yee, Hector
Schottlander, David
Bavadekar, Shailesh
Efron, Niv
Shetty, Shravya
Prasad, Gautam
contents Aggregated relative search frequencies offer a unique composite signal reflecting people's habits, concerns, interests, intents, and general information needs, which are not found in other readily available datasets. Temporal search trends have been successfully used in time series modeling across a variety of domains such as infectious diseases, unemployment rates, and retail sales. However, most existing applications require curating specialized datasets of individual keywords, queries, or query clusters, and the search data need to be temporally aligned with the outcome variable of interest. We propose a novel approach for generating an aggregated and anonymized representation of search interest as foundation features at the community level for geospatial modeling. We benchmark these features using spatial datasets across multiple domains. In zip codes with a population greater than 3,000, which together cover over 95% of the contiguous US population, our models for predicting missing values in a 20% set of holdout counties achieve an average $R^2$ score of 0.74 across 21 health variables and 0.80 across 6 demographic and environmental variables. Our results demonstrate that these search features can be used for spatial predictions without strict temporal alignment, and that the resulting models outperform spatial interpolation and state-of-the-art methods using satellite imagery features.
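The evaluation described above (zip-code-level predictions scored on a 20% set of held-out counties, with $R^2$ averaged across outcome variables) can be illustrated with a minimal sketch. It assumes that each zip code is mapped to one county and that all zip codes in a held-out county are excluded from training together. The abstract does not name the model, so ordinary least squares on synthetic data stands in for it here. The function names `county_holdout_split`, `r2_score`, and `mean_r2` are hypothetical and do not come from the paper.

```python
import numpy as np


def county_holdout_split(county_ids, holdout_frac=0.2, seed=0):
    """Hypothetical helper: hold out a fraction of counties (not zip codes).

    Every zip code whose county is held out goes to the test mask, so no
    county contributes rows to both train and test.
    """
    rng = np.random.default_rng(seed)
    counties = np.unique(county_ids)
    n_holdout = int(round(holdout_frac * len(counties)))
    held = rng.choice(counties, size=n_holdout, replace=False)
    test = np.isin(county_ids, held)
    return ~test, test


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_r2(Y_true, Y_pred):
    """Average R^2 over columns, one column per outcome variable."""
    return float(np.mean([r2_score(Y_true[:, j], Y_pred[:, j])
                          for j in range(Y_true.shape[1])]))


# Synthetic stand-in data (assumption): 500 zip codes in up to 50 counties,
# 8 features per zip code, and 3 outcome variables.
rng = np.random.default_rng(1)
n_zips, n_feats, n_vars = 500, 8, 3
county_ids = rng.integers(0, 50, size=n_zips)
X = rng.normal(size=(n_zips, n_feats))
W = rng.normal(size=(n_feats, n_vars))
Y = X @ W + 0.1 * rng.normal(size=(n_zips, n_vars))

train, test = county_holdout_split(county_ids)

# Fit one least-squares model per outcome (lstsq handles all columns at once),
# with an appended intercept column.
X_train = np.column_stack([X[train], np.ones(train.sum())])
coef, *_ = np.linalg.lstsq(X_train, Y[train], rcond=None)
X_test = np.column_stack([X[test], np.ones(test.sum())])
score = mean_r2(Y[test], X_test @ coef)
print(f"mean R^2 on held-out counties: {score:.3f}")
```

Grouping the split by county rather than sampling zip codes independently keeps neighboring zip codes from the same county out of the training set, which makes the score reflect prediction into unseen areas rather than interpolation between adjacent rows.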
format Preprint
id arxiv_https___arxiv_org_abs_2410_22721
institution arXiv
publishDate 2024
record_format arxiv
title Community search signatures as foundation features for human-centered geospatial modeling
topic Machine Learning
url https://arxiv.org/abs/2410.22721