| Main authors: | Shi, Shuyao; Shin, Kang G. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2603.17980 |
Similar items
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
by: Zheng, Duo, et al.
Published: (2024)
EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent
by: Li, Jiaao, et al.
Published: (2025)
3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
by: Huang, Xiaohu, et al.
Published: (2025)
Fast SceneScript: Fast and Accurate Language-Based 3D Scene Understanding via Multi-Token Prediction
by: Yin, Ruihong, et al.
Published: (2025)
Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding
by: Wang, Yunsong, et al.
Published: (2024)
Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion
by: Xiong, Tianyi, et al.
Published: (2024)
SurgCUT3R: Surgical Scene-Aware Continuous Understanding of Temporal 3D Representation
by: Xu, Kaiyuan, et al.
Published: (2026)
CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation
by: Huang, Kaiyi, et al.
Published: (2026)
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
by: Bar, Amir, et al.
Published: (2024)
Sketch3DVE: Sketch-based 3D-Aware Scene Video Editing
by: Liu, Feng-Lin, et al.
Published: (2025)
Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes
by: Shi, Yuang, et al.
Published: (2025)
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning
by: Gao, Zhe, et al.
Published: (2026)
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models
by: Qi, Zhangyang, et al.
Published: (2025)
Seeing the Scene Matters: Revealing Forgetting in Video Understanding Models with a Scene-Aware Long-Video Benchmark
by: Chen, Seng Nam, et al.
Published: (2026)
Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers
by: Zhao, Ji, et al.
Published: (2025)
A Unified Framework for 3D Scene Understanding
by: Xu, Wei, et al.
Published: (2024)
Motion Segmentation and Egomotion Estimation from Event-Based Normal Flow
by: Hua, Zhiyuan, et al.
Published: (2025)
HexPlane Representation for 3D Semantic Scene Understanding
by: Chen, Zeren, et al.
Published: (2025)
UrbanFeel: A Comprehensive Benchmark for Temporal and Perceptual Understanding of City Scenes through Human Perspective
by: He, Jun, et al.
Published: (2025)
iMOVE: Instance-Motion-Aware Video Understanding
by: Li, Jiaze, et al.
Published: (2025)
VideoMamba: State Space Model for Efficient Video Understanding
by: Li, Kunchang, et al.
Published: (2024)
Drag4D: Align Your Motion with Text-Driven 3D Scene Generation
by: Kang, Minjun, et al.
Published: (2025)
Timeliness-Fidelity Tradeoff in 3D Scene Representations
by: Xu, Xiangmin, et al.
Published: (2024)
HMR3D: Hierarchical Multimodal Representation for 3D Scene Understanding with Large Vision-Language Model
by: Li, Chen, et al.
Published: (2025)
CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting
by: Sun, Wei, et al.
Published: (2025)
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
by: Li, Jinlong, et al.
Published: (2025)
Learning Monocular Depth from Events via Egomotion Compensation
by: Meng, Haitao, et al.
Published: (2024)
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
by: Yu, Hanxun, et al.
Published: (2025)
GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting
by: Hao, Junlin, et al.
Published: (2025)
Efficient and Accurate Scene Text Recognition with Cascaded-Transformers
by: Ozkan, Savas, et al.
Published: (2025)
Modality-Aware Shot Relating and Comparing for Video Scene Detection
by: Tan, Jiawei, et al.
Published: (2024)
SceneRAG: Scene-level Retrieval-Augmented Generation for Video Understanding
by: Zeng, Nianbo, et al.
Published: (2025)
SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields
by: Li, Qijing, et al.
Published: (2025)
3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding
by: Wang, Xiaoye, et al.
Published: (2025)
DC-Scene: Data-Centric Learning for 3D Scene Understanding
by: Huang, Ting, et al.
Published: (2025)
SceneGPT: A Language Model for 3D Scene Understanding
by: Chandhok, Shivam
Published: (2024)
EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting
by: Zhang, Daiwei, et al.
Published: (2024)
E-MoFlow: Learning Egomotion and Optical Flow from Event Data via Implicit Regularization
by: Li, Wenpu, et al.
Published: (2025)
Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
by: Yarram, Sudhir, et al.
Published: (2024)
Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding
by: Wu, Xianjin, et al.
Published: (2026)