Bibliographic Details
Main Authors: Farronato, Nicola, Avogaro, Niccolo, Frick, Thomas, Rigotti, Mattia, Khan, Rizwan Ullah, Magno, Michele, Schindler, Konrad, Malossi, Cristiano, Scheidegger, Florian
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2605.18413
author Farronato, Nicola
Avogaro, Niccolo
Frick, Thomas
Rigotti, Mattia
Khan, Rizwan Ullah
Magno, Michele
Schindler, Konrad
Malossi, Cristiano
Scheidegger, Florian
contents Automated structural health monitoring is essential to prevent catastrophic infrastructure failures. Precise, pixel-level defect segmentation is needed to accurately assess structural integrity, but progress in defect segmentation for civil infrastructure has been held back by an extreme scarcity of data, because such data requires costly expert annotation. The need for data is accentuated by algorithmic hurdles intrinsic to the problem, including center bias and the need to rely more on shape when inspecting nearly textureless building materials. To remove the bottleneck, we introduce Cracks in the Foundation (CiF), the largest and most detailed civil infrastructure (instance) segmentation dataset to date, comprising $\approx$150,000 high-resolution images meticulously curated over five years in collaboration with civil engineering experts. With the help of this unprecedented data source, we expose a blind spot of current visual AI: despite the advent of promptable Foundation Models (FMs) and Vision Language Models (VLMs), and despite the impressive abilities of today's specialised segmentation models, it turns out that dense image understanding in the built environment is nowhere near solved. Our evaluations indicate that even the most recent zero-shot FMs face significant challenges when deployed on real-world infrastructure, and that even the performance of specialised models with domain-specific supervision plateaus at $\approx$25% mAP. CiF establishes inspection of civil infrastructure, an elementary and seemingly easy perceptual task, as an open challenge that reveals fundamental weaknesses of present-day models trained predominantly on internet images, literally and figuratively highlighting cracks in the current foundation model paradigm.
format Preprint
id arxiv_https___arxiv_org_abs_2605_18413
institution arXiv
publishDate 2026
record_format arxiv
title Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.18413