Staff View: Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

Bibliographic Details
Main Authors:	Farronato, Nicola; Avogaro, Niccolo; Frick, Thomas; Rigotti, Mattia; Khan, Rizwan Ullah; Magno, Michele; Schindler, Konrad; Malossi, Cristiano; Scheidegger, Florian
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.18413

_version_	1866911698218123264
author	Farronato, Nicola; Avogaro, Niccolo; Frick, Thomas; Rigotti, Mattia; Khan, Rizwan Ullah; Magno, Michele; Schindler, Konrad; Malossi, Cristiano; Scheidegger, Florian
author_facet	Farronato, Nicola; Avogaro, Niccolo; Frick, Thomas; Rigotti, Mattia; Khan, Rizwan Ullah; Magno, Michele; Schindler, Konrad; Malossi, Cristiano; Scheidegger, Florian
contents	Automated structural health monitoring is essential to prevent catastrophic infrastructure failures. Precise, pixel-level defect segmentation is needed to accurately assess structural integrity, but progress in defect segmentation for civil infrastructures has been held back by an extreme scarcity of data, which requires costly expert annotation. The need for data is accentuated by algorithmic hurdles intrinsic to the problem, including center-bias and the need to rely more on shape when inspecting nearly textureless building materials. To remove the bottleneck, we introduce Cracks in the Foundation (CiF), the largest and most detailed civil infrastructure (instance) segmentation dataset to date, comprising $\approx$150,000 high-resolution images meticulously curated over five years in collaboration with civil engineering experts. With the help of this unprecedented data source, we expose a blind spot of current visual AI: despite the advent of promptable Foundation Models (FMs) and Vision Language Models (VLMs), and despite the impressive abilities of today's specialised segmentation models, it turns out that dense image understanding in the built environment is nowhere near solved. Our evaluations indicate that even the most recent zero-shot FMs face significant challenges when deployed on real-world infrastructure and even the performance of specialised models with domain-specific supervision plateaus at $\approx$25% mAP. CiF establishes inspection of civil infrastructure, an elementary and seemingly easy perceptual task, as an open challenge that reveals fundamental weaknesses of present-day models trained predominantly on internet images, literally and figuratively highlighting cracks in the current foundation model paradigm.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_18413
institution	arXiv
publishDate	2026
record_format	arxiv
title	Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2605.18413
