| Main Authors: | Dong, Mingyu, Xia, Chong, Jia, Mingyuan, Lyu, Weichen, Xu, Long, Zhu, Zheng, Duan, Yueqi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.10789 |
Similar Items
SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
by: Xia, Chong, et al.
Published: (2026)
ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
by: Xia, Chong, et al.
Published: (2025)
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
by: Liu, Fangfu, et al.
Published: (2024)
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
by: Wang, Hanyang, et al.
Published: (2025)
GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval
by: Jeon, Mingyu, et al.
Published: (2026)
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
by: Sun, Wenqiang, et al.
Published: (2024)
Memory-based Adapters for Online 3D Scene Perception
by: Xu, Xiuwei, et al.
Published: (2024)
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
by: Wu, Diankun, et al.
Published: (2025)
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
by: Zhang, Shengjun, et al.
Published: (2025)
Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification
by: Xu, Wenjia, et al.
Published: (2024)
SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis
by: Chen, Weiliang, et al.
Published: (2025)
Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
by: Song, Yeji, et al.
Published: (2024)
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
by: Ahn, Sunghyun, et al.
Published: (2025)
Zero-Shot Scene Change Detection
by: Cho, Kyusik, et al.
Published: (2024)
Semantic Flow: Learning Semantic Field of Dynamic Scenes from Monocular Videos
by: Tian, Fengrui, et al.
Published: (2024)
Revisiting 3D Reconstruction Kernels as Low-Pass Filters
by: Zhang, Shengjun, et al.
Published: (2026)
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
by: Zhang, Xiao, et al.
Published: (2025)
Zero-Shot Hashing Based on Reconstruction With Part Alignment
by: Jiang, Yan, et al.
Published: (2025)
OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution
by: Xia, Chong, et al.
Published: (2026)
Zero-Shot Fake Video Detection by Audio-Visual Consistency
by: Li, Xiaolou, et al.
Published: (2024)
SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation
by: Zhang, Jiwen, et al.
Published: (2026)
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
by: Tang, Yijie, et al.
Published: (2025)
Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning
by: Zhang, Haozhe, et al.
Published: (2025)
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
by: Liu, Fangfu, et al.
Published: (2025)
SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)
Zero-Shot Personalization of Objects via Textual Inversion
by: Roy, Aniket, et al.
Published: (2026)
SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
by: Li, Joshua, et al.
Published: (2025)
Anything in Any Scene: Photorealistic Video Object Insertion
by: Bai, Chen, et al.
Published: (2024)
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
by: Gkanatsios, Nikolaos, et al.
Published: (2023)
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)
Zero-Shot Temporal Interaction Localization for Egocentric Videos
by: Zhang, Erhang, et al.
Published: (2025)
ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation
by: Li, Hongjie, et al.
Published: (2024)
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
by: Guo, Yuliang, et al.
Published: (2025)
Learning Visual Proxy for Compositional Zero-Shot Learning
by: Zhang, Shiyu, et al.
Published: (2025)
Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)
RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
by: Zheng, Chengyu, et al.
Published: (2025)
EmboAlign: Aligning Video Generation with Compositional Constraints for Zero-Shot Manipulation
by: Zhang, Gehao, et al.
Published: (2026)
Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning
by: Duan, Yihang, et al.
Published: (2026)
Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)
Coherent Zero-Shot Visual Instruction Generation
by: Phung, Quynh, et al.
Published: (2024)