Staff View: TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Zhicong; Xu, Hongbin; Xu, Gang; Nie, Ping; Yan, Zhixin; Zheng, Jinkai; Qu, Liangqiong; Li, Ming; Nie, Liqiang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.09588

_version_	1866908496591585280
author	Wu, Zhicong; Xu, Hongbin; Xu, Gang; Nie, Ping; Yan, Zhixin; Zheng, Jinkai; Qu, Liangqiong; Li, Ming; Nie, Liqiang
author_facet	Wu, Zhicong; Xu, Hongbin; Xu, Gang; Nie, Ping; Yan, Zhixin; Zheng, Jinkai; Qu, Liangqiong; Li, Ming; Nie, Liqiang
contents	Recent advancements in Generalizable Gaussian Splatting have enabled robust 3D reconstruction from sparse input views by utilizing feed-forward Gaussian Splatting models, achieving superior cross-scene generalization. However, while many methods focus on geometric consistency, they often neglect the potential of text-driven guidance to enhance semantic understanding, which is crucial for accurately reconstructing fine-grained details in complex scenes. To address this limitation, we propose TextSplat, the first text-driven Generalizable Gaussian Splatting framework. By employing a text-guided fusion of diverse semantic cues, our framework learns robust cross-modal feature representations that improve the alignment of geometric and semantic information, producing high-fidelity 3D reconstructions. Specifically, our framework employs three parallel modules to obtain complementary representations: the Diffusion Prior Depth Estimator for accurate depth information, the Semantic Aware Segmentation Network for detailed semantic information, and the Multi-View Interaction Network for refined cross-view features. Then, in the Text-Guided Semantic Fusion Module, these representations are integrated via a text-guided, attention-based feature aggregation mechanism, resulting in enhanced 3D Gaussian parameters enriched with detailed semantic cues. Experimental results on various benchmark datasets demonstrate improved performance compared to existing methods across multiple evaluation metrics, validating the effectiveness of our framework. The code will be publicly available.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_09588
institution	arXiv
publishDate	2025
record_format	arxiv
title	TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
topic	Computer Vision and Pattern Recognition; Artificial Intelligence
url	https://arxiv.org/abs/2504.09588

Similar Items