Bibliographic Details
Main Authors: Gümmer, Paul; Rosenberger, Julian; Kraus, Mathias; Zschech, Patrick; Hambauer, Nico
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2508.03156
author Gümmer, Paul
Rosenberger, Julian
Kraus, Mathias
Zschech, Patrick
Hambauer, Nico
contents House price valuation remains challenging due to localized market variations. Existing approaches often rely on black-box machine learning models, which lack interpretability, or simplistic methods like linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach that applies two-stage clustering, first grouping properties based on minimal location-based features before incorporating additional features. Each cluster is then modeled using either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Constructing and evaluating our models on 43,309 German house property listings from 2023, we achieve a 36% improvement for the GAM and 58% for LR in mean absolute error compared to models without clustering. Additionally, graphical analyses unveil pattern shifts between clusters. These findings emphasize the importance of cluster-specific insights, enhancing interpretability and offering practical value for buyers, sellers, and real estate analysts seeking more reliable property valuations.
format Preprint
id arxiv_https___arxiv_org_abs_2508_03156
institution arXiv
publishDate 2025
record_format arxiv
title Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions
topic Machine Learning
url https://arxiv.org/abs/2508.03156
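
The two-stage clustering described in the contents field can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state which clustering algorithm, which features, or which evaluation protocol the paper uses. The sketch assumes plain k-means with a deterministic farthest-point initialization, assumes the second stage refines each location cluster using one additional feature (living area), and fits only a one-variable linear regression per cluster (the GAM variant is omitted). The function names, the field names `lat`, `lon`, `area`, and `price`, and the toy listings are all hypothetical, and the error values it computes are in-sample figures on synthetic data, not results from the paper.

```python
from statistics import mean


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-point initialization (an assumption,
    chosen so the sketch is deterministic). Returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


def two_stage_labels(listings, k_loc=2, k_feat=2):
    """Stage 1 clusters on location only; stage 2 re-clusters each location
    cluster on an additional feature. Returns (stage1, stage2) label pairs."""
    stage1 = kmeans([(r["lat"], r["lon"]) for r in listings], k_loc)
    labels = [None] * len(listings)
    for c in range(k_loc):
        idx = [i for i, l in enumerate(stage1) if l == c]
        if not idx:
            continue
        stage2 = kmeans([(listings[i]["area"],) for i in idx], k_feat)
        for i, s in zip(idx, stage2):
            labels[i] = (c, s)
    return labels


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def predict_per_group(xs, ys, groups):
    """Fit one line per group and return in-sample predictions."""
    preds = [0.0] * len(xs)
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in idx:
            preds[i] = slope * xs[i] + intercept
    return preds


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Synthetic toy data (hypothetical): two distant cities with different
# price-per-square-metre relationships and no noise.
listings = []
for i in range(6):
    area = 60 + 20 * i
    listings.append({"lat": 48.10 + 0.01 * i, "lon": 11.5, "area": area, "price": 8000 * area})
    listings.append({"lat": 53.50 + 0.01 * i, "lon": 10.0, "area": area, "price": 3000 * area + 50000})

areas = [r["area"] for r in listings]
prices = [r["price"] for r in listings]

# Baseline: a single regression over all listings (one shared group).
global_mae = mae(prices, predict_per_group(areas, prices, [0] * len(listings)))

# Two-stage clustering followed by one regression per cluster.
cluster_labels = two_stage_labels(listings)
clustered_mae = mae(prices, predict_per_group(areas, prices, cluster_labels))

print(f"MAE without clustering: {global_mae:.2f}")
print(f"MAE with two-stage clustering: {clustered_mae:.2f}")
```

On this toy data the per-cluster fit has the lower error because each city follows its own price line, which a single global regression has to average over. A realistic setup would scale features before clustering and measure error on held-out listings.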