Bibliographic Details
Main Authors: Wu, Zhicong, Xu, Hongbin, Xu, Gang, Nie, Ping, Yan, Zhixin, Zheng, Jinkai, Qu, Liangqiong, Li, Ming, Nie, Liqiang
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition, Artificial Intelligence
Online Access: https://arxiv.org/abs/2504.09588
author Wu, Zhicong
Xu, Hongbin
Xu, Gang
Nie, Ping
Yan, Zhixin
Zheng, Jinkai
Qu, Liangqiong
Li, Ming
Nie, Liqiang
contents Recent advancements in Generalizable Gaussian Splatting have enabled robust 3D reconstruction from sparse input views by utilizing feed-forward Gaussian Splatting models, achieving superior cross-scene generalization. However, while many methods focus on geometric consistency, they often neglect the potential of text-driven guidance to enhance semantic understanding, which is crucial for accurately reconstructing fine-grained details in complex scenes. To address this limitation, we propose TextSplat, the first text-driven Generalizable Gaussian Splatting framework. By employing a text-guided fusion of diverse semantic cues, our framework learns robust cross-modal feature representations that improve the alignment of geometric and semantic information, producing high-fidelity 3D reconstructions. Specifically, our framework employs three parallel modules to obtain complementary representations: the Diffusion Prior Depth Estimator for accurate depth information, the Semantic Aware Segmentation Network for detailed semantic information, and the Multi-View Interaction Network for refined cross-view features. Then, in the Text-Guided Semantic Fusion Module, these representations are integrated via a text-guided, attention-based feature aggregation mechanism, resulting in enhanced 3D Gaussian parameters enriched with detailed semantic cues. Experimental results on various benchmark datasets demonstrate improved performance compared to existing methods across multiple evaluation metrics, validating the effectiveness of our framework. The code will be publicly available.
format Preprint
id arxiv_https___arxiv_org_abs_2504_09588
institution arXiv
publishDate 2025
record_format arxiv
title TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2504.09588