:: Library Catalog

Bibliographic Details
Main Authors:	Chharia, Aviral; Ren, Tianyu; Furuhata, Tomotake; Shimada, Kenji
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2504.10880

Similar Items

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
by: Chharia, Aviral, et al.
Published: (2025)

Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation
by: Chharia, Aviral, et al.
Published: (2026)

Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
by: Dong, Haoye, et al.
Published: (2024)

Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models
by: Liu, Zhibin, et al.
Published: (2024)

Mono-Hydra++: Real-Time Monocular Scene Graph Construction with Multi-Task Learning for 3D Indoor Mapping
by: Udugama, U. V. B. L., et al.
Published: (2026)

A Computer Vision Approach for Autonomous Cars to Drive Safe at Construction Zone
by: Ahammed, Abu Shad, et al.
Published: (2024)

DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
by: Xie, Qinghongbing, et al.
Published: (2025)

DVPE: Divided View Position Embedding for Multi-View 3D Object Detection
by: Wang, Jiasen, et al.
Published: (2024)

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images
by: Yu, Junqiu, et al.
Published: (2024)

UniScale: Unified Scale-Aware 3D Reconstruction for Multi-View Understanding via Prior Injection for Robotic Perception
by: Mahdavian, Mohammad, et al.
Published: (2026)

MM-Nav: Multi-View VLA Model for Robust Visual Navigation via Multi-Expert Learning
by: Xu, Tianyu, et al.
Published: (2025)

LS-HAR: Language Supervised Human Action Recognition with Salient Fusion, Construction Sites as a Use-Case
by: Mahdavian, Mohammad, et al.
Published: (2024)

MAG-VLAQ: Multi-modal Aerial-Ground Query Aggregation for Cross-View Place Recognition
by: Xu, Zhengyi, et al.
Published: (2026)

Segmentation Dataset for Reinforced Concrete Construction
by: Schmidt, Patrick, et al.
Published: (2024)

SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection
by: Papais, Sandro, et al.
Published: (2026)

LiDAR-EVS: Enhance Extrapolated View Synthesis for 3D Gaussian Splatting with Pseudo-LiDAR Supervision
by: Huang, Yiming, et al.
Published: (2026)

Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition
by: Shang, Tianyi, et al.
Published: (2025)

Robotic Arm Platform for Multi-View Image Acquisition and 3D Reconstruction in Minimally Invasive Surgery
by: Saikia, Alexander, et al.
Published: (2024)

Systematic Evaluation of Novel View Synthesis for Video Place Recognition
by: Mahmud, Muhammad Zawad, et al.
Published: (2026)

FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View
by: Hou, Jiawei, et al.
Published: (2024)

SLAM for Indoor Mapping of Wide Area Construction Environments
by: Ress, Vincent, et al.
Published: (2024)

Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model
by: Li, Peiyan, et al.
Published: (2026)

Efficient Multi-Task Scene Analysis with RGB-D Transformers
by: Fischedick, Söhnke Benedikt, et al.
Published: (2023)

LM-MCVT: A Lightweight Multi-modal Multi-view Convolutional-Vision Transformer Approach for 3D Object Recognition
by: Xiong, Songsong, et al.
Published: (2025)

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching
by: Yao, Gongxin, et al.
Published: (2024)

InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
by: Liu, Zeyi, et al.
Published: (2026)

MrGS: Multi-modal Radiance Fields with 3D Gaussian Splatting for RGB-Thermal Novel View Synthesis
by: Kweon, Minseong, et al.
Published: (2025)

DreamGrasp: Zero-Shot 3D Multi-Object Reconstruction from Partial-View Images for Robotic Manipulation
by: Kim, Young Hun, et al.
Published: (2025)

LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving
by: Mohapatra, Sambit, et al.
Published: (2023)

Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
by: Huang, Tianyu, et al.
Published: (2024)

DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection
by: Huang, Zhe, et al.
Published: (2024)

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
by: Ma, Xianzheng, et al.
Published: (2024)

Are Open-Vocabulary Models Ready for Detection of MEP Elements on Construction Sites
by: Abdalwhab, Abdalwhab, et al.
Published: (2025)

GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats
by: Adebola, Simeon, et al.
Published: (2025)

BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring
by: Bikandi-Noya, Asier, et al.
Published: (2025)

Direct Robot Configuration Space Construction using Convolutional Encoder-Decoders
by: Benka, Christopher, et al.
Published: (2023)

Impact of Localization Errors on Label Quality for Online HD Map Construction
by: Blumberg, Alexander, et al.
Published: (2026)

HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
by: Banerjee, Prithviraj, et al.
Published: (2024)

Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation
by: Bai, Yongjie, et al.
Published: (2025)

Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames
by: Yang, Jun, et al.
Published: (2025)