Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Lee, Jeongah; Sarvghad, Ali
Format:	Preprint
Published:	2025
Subjects:	Human-Computer Interaction
Online Access:	https://arxiv.org/abs/2511.03478

_version_	1866915600095248384
author	Lee, Jeongah; Sarvghad, Ali
author_facet	Lee, Jeongah; Sarvghad, Ali
contents	Large multimodal models (LMMs) are increasingly capable of interpreting visualizations, yet they continue to struggle with spatial reasoning. One proposed strategy is decomposition, which breaks down complex visualizations into structured components. In this work, we examine the efficacy of scalable vector graphics (SVGs) as a decomposition strategy for improving LMMs' performance on floor plans comprehension. Floor plans serve as a valuable testbed because they combine geometry, topology, and semantics, and their reliable comprehension has real-world applications, such as accessibility for blind and low-vision individuals. We conducted an exploratory study with three LMMs (GPT-4o, Claude 3.7 Sonnet, and Llama 3.2 11B Vision Instruct) across 75 floor plans. Results show that combining SVG with raster input (SVG+PNG) improves performance on spatial understanding tasks but often hinders spatial reasoning, particularly in pathfinding. These findings highlight both the promise and limitations of decomposition as a strategy for advancing spatial visualization comprehension.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_03478
institution	arXiv
publishDate	2025
record_format	arxiv
title	SVG Decomposition for Enhancing Large Multimodal Models Visualization Comprehension: A Study with Floor Plans
topic	Human-Computer Interaction
url	https://arxiv.org/abs/2511.03478

Similar Items