| Main authors: | Bieri, Valentin; Zamboni, Marco; Blumer, Nicolas S.; Chen, Qingxuan; Engelmann, Francis |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online access: | https://arxiv.org/abs/2503.16776 |
Similar items
HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild
by: Bieri, Valentin, et al.
Published: (2025)
OpenDAS: Open-Vocabulary Domain Adaptation for 2D and 3D Segmentation
by: Yilmaz, Gonca, et al.
Published: (2024)
What does CLIP know about peeling a banana?
by: Cuttano, Claudia, et al.
Published: (2024)
CityCube: Benchmarking Cross-view Spatial Reasoning on Vision-Language Models in Urban Environments
by: Xu, Haotian, et al.
Published: (2026)
Search3D: Hierarchical Open-Vocabulary 3D Segmentation
by: Takmaz, Ayca, et al.
Published: (2024)
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
by: Zheng, Jianhao, et al.
Published: (2025)
UrbanWorld: An Urban World Model for 3D City Generation
by: Shang, Yu, et al.
Published: (2024)
OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
by: Engelmann, Francis, et al.
Published: (2024)
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
by: Ramos, Ryan, et al.
Published: (2025)
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
by: Zhang, Chenyangguang, et al.
Published: (2025)
Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces
by: Hu, Xinggang, et al.
Published: (2026)
What Matters for Grocery Product Retrieval with Open Source Vision Language Models
by: Maminta, Emmanuel G., et al.
Published: (2026)
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
by: Etchegaray, Djamahl, et al.
Published: (2024)
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
by: Li, Zhonghang, et al.
Published: (2024)
OpenCity: A Scalable Platform to Simulate Urban Activities with Massive LLM Agents
by: Yan, Yuwei, et al.
Published: (2024)
VLM-3D: End-to-End Vision-Language Models for Open-World 3D Perception
by: Chang, Fuhao, et al.
Published: (2025)
StyleCity: Large-Scale 3D Urban Scenes Stylization
by: Chen, Yingshu, et al.
Published: (2024)
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
by: Yue, Yuanwen, et al.
Published: (2024)
SuperDec: 3D Scene Decomposition with Superquadric Primitives
by: Fedele, Elisabetta, et al.
Published: (2025)
CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios
by: Xu, Jialei, et al.
Published: (2025)
Video Perception Models for 3D Scene Synthesis
by: Huang, Rui, et al.
Published: (2025)
VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
by: Zong, Junyi, et al.
Published: (2026)
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
by: Fedele, Elisabetta, et al.
Published: (2025)
A Review of 3D Object Detection with Vision-Language Models
by: Sapkota, Ranjan, et al.
Published: (2025)
SLAG: Scalable Language-Augmented Gaussian Splatting
by: Szilagyi, Laszlo, et al.
Published: (2025)
OpenUrban3D: Annotation-Free Open-Vocabulary Semantic Segmentation of Large-Scale Urban Point Clouds
by: Wang, Chongyu, et al.
Published: (2025)
3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
by: Xiao, Zihao, et al.
Published: (2024)
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
by: Guruprasad, Pranav, et al.
Published: (2025)
Vision-Based Localization in Dense Urban Environments: A Case Study of an Urban Village in China
by: Wu, Menglin, et al.
Published: (2026)
CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning
by: Liu, Tianhui, et al.
Published: (2025)
De-rendering, Reasoning, and Repairing Charts with Vision-Language Models
by: Bonas, Valentin, et al.
Published: (2026)
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
by: Li, Anqi, et al.
Published: (2025)
A Training-Free Guess What Vision Language Model from Snippets to Open-Vocabulary Object Detection
by: Zhu, Guiying, et al.
Published: (2026)
Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs
by: Qiao, Yanyuan, et al.
Published: (2024)
Risk Assessment for Autonomous Landing in Urban Environments using Semantic Segmentation
by: Loera-Ponce, Jesús Alejandro, et al.
Published: (2024)
SpatialFly: Geometry-Guided Representation Alignment for UAV Vision-and-Language Navigation in Urban Environments
by: Jiang, Wen, et al.
Published: (2026)
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
by: Ji, Guangda, et al.
Published: (2024)
3D Gaussian Map with Open-Set Semantic Grouping for Vision-Language Navigation
by: Gao, Jianzhe, et al.
Published: (2026)
DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments
by: Mohamad, Mahmud A., et al.
Published: (2024)
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
by: Xie, Haozhe, et al.
Published: (2023)