Bibliographic Details
Main Authors: Zhang, Xinyao, Wang, Rui, Cui, Jinhao, Huang, Haotian, Xue, Wei, Hu, Wenhua, Xiang, Jianwen, Hao, Rui
Format: Preprint
Published: 2026
Subjects: Software Engineering
Online Access: https://arxiv.org/abs/2604.19081
author Zhang, Xinyao
Wang, Rui
Cui, Jinhao
Huang, Haotian
Xue, Wei
Hu, Wenhua
Xiang, Jianwen
Hao, Rui
contents Multi-window mobile scenarios, such as split-screen and foldable modes, make GUI display defects more likely by forcing applications to adapt to changing window sizes and dynamic layout reflow. Existing detection techniques are limited in two ways: they are largely passive, analyzing screenshots only after problematic states have been reached, and they are mainly designed for conventional full-screen interfaces, making them less effective in multi-window settings. We propose an end-to-end framework for GUI display defect detection in multi-window mobile scenarios. The framework proactively triggers split-screen, foldable, and window-transition states during app exploration, uses Set-of-Mark (SoM) to align screenshots with widget-level interface elements, and leverages multimodal large language models with chain-of-thought prompting to detect, localize, and explain display defects. We also construct a benchmark of GUI display defects using 50 real-world Android applications. Experimental results show that multi-window settings substantially increase the exposure of layout-related defects, with text truncation increasing by 184% compared with conventional full-screen settings. At the application level, our method detects 40 defect-prone apps with a false positive rate of 10.00% and a false negative rate of 11.11%, outperforming OwlEye and YOLO-based baselines. At the fine-grained level, it achieves the best F1 score of 87.2% for widget occlusion detection.
format Preprint
id arxiv_https___arxiv_org_abs_2604_19081
institution arXiv
publishDate 2026
record_format arxiv
title Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning
topic Software Engineering
url https://arxiv.org/abs/2604.19081