Library Catalog

Bibliographic Details
Main Authors:	Bieri, Valentin; Zamboni, Marco; Blumer, Nicolas S.; Chen, Qingxuan; Engelmann, Francis
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.16776

Similar Items

HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild
by: Bieri, Valentin, et al.
Published: (2025)

OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation
by: Yilmaz, Gonca, et al.
Published: (2024)

What does CLIP know about peeling a banana?
by: Cuttano, Claudia, et al.
Published: (2024)

CityCube: Benchmarking Cross-view Spatial Reasoning on Vision-Language Models in Urban Environments
by: Xu, Haotian, et al.
Published: (2026)

Search3D: Hierarchical Open-Vocabulary 3D Segmentation
by: Takmaz, Ayca, et al.
Published: (2024)

WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
by: Zheng, Jianhao, et al.
Published: (2025)

UrbanWorld: An Urban World Model for 3D City Generation
by: Shang, Yu, et al.
Published: (2024)

OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
by: Engelmann, Francis, et al.
Published: (2024)

Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
by: Ramos, Ryan, et al.
Published: (2025)

Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
by: Zhang, Chenyangguang, et al.
Published: (2025)

Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces
by: Hu, Xinggang, et al.
Published: (2026)

What Matters for Grocery Product Retrieval with Open Source Vision Language Models
by: Maminta, Emmanuel G., et al.
Published: (2026)

Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
by: Etchegaray, Djamahl, et al.
Published: (2024)

OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
by: Li, Zhonghang, et al.
Published: (2024)

OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents
by: Yan, Yuwei, et al.
Published: (2024)

VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
by: Chang, Fuhao, et al.
Published: (2025)

StyleCity: Large-Scale 3D Urban Scenes Stylization
by: Chen, Yingshu, et al.
Published: (2024)

Improving 2D Feature Representations by 3D-Aware Fine-Tuning
by: Yue, Yuanwen, et al.
Published: (2024)

SuperDec: 3D Scene Decomposition with Superquadric Primitives
by: Fedele, Elisabetta, et al.
Published: (2025)

CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
by: Xu, Jialei, et al.
Published: (2025)

Video Perception Models for 3D Scene Synthesis
by: Huang, Rui, et al.
Published: (2025)

VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
by: Zong, Junyi, et al.
Published: (2026)

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
by: Fedele, Elisabetta, et al.
Published: (2025)

A Review of 3D Object Detection with Vision-Language Models
by: Sapkota, Ranjan, et al.
Published: (2025)

SLAG: Scalable Language-Augmented Gaussian Splatting
by: Szilagyi, Laszlo, et al.
Published: (2025)

OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds
by: Wang, Chongyu, et al.
Published: (2025)

3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
by: Xiao, Zihao, et al.
Published: (2024)

Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
by: Guruprasad, Pranav, et al.
Published: (2025)

Vision-Based Localization in Dense Urban Environments: A Case Study of an Urban Village in China
by: Wu, Menglin, et al.
Published: (2026)

CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning
by: Liu, Tianhui, et al.
Published: (2025)

De-rendering, Reasoning, and Repairing Charts with Vision-Language Models
by: Bonas, Valentin, et al.
Published: (2026)

UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)

A Training-Free Guess What Vision Language Model from Snippets to Open-Vocabulary Object Detection
by: Zhu, Guiying, et al.
Published: (2026)

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
by: Qiao, Yanyuan, et al.
Published: (2024)

Risk Assessment for Autonomous Landing in Urban Environments using Semantic Segmentation
by: Loera-Ponce, Jesús Alejandro, et al.
Published: (2024)

SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments
by: Jiang, Wen, et al.
Published: (2026)

ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
by: Ji, Guangda, et al.
Published: (2024)

3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)

DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments
by: Mohamad, Mahmud A., et al.
Published: (2024)

CityDreamer: Compositional Generative Model of Unbounded 3D Cities
by: Xie, Haozhe, et al.
Published: (2023)