:: Library Catalog

Bibliographic Details
Main Authors:	Dong, Mingyu; Xia, Chong; Jia, Mingyuan; Lyu, Weichen; Xu, Long; Zhu, Zheng; Duan, Yueqi
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.10789

Similar Items

SimRecon: SimReady Compositional Scene Reconstruction from Real Videos
by: Xia, Chong, et al.
Published: (2026)

ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
by: Xia, Chong, et al.
Published: (2025)

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
by: Liu, Fangfu, et al.
Published: (2024)

VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
by: Wang, Hanyang, et al.
Published: (2025)

GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval
by: Jeon, Mingyu, et al.
Published: (2026)

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
by: Sun, Wenqiang, et al.
Published: (2024)

Memory-based Adapters for Online 3D Scene Perception
by: Xu, Xiuwei, et al.
Published: (2024)

Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
by: Wu, Diankun, et al.
Published: (2025)

Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
by: Zhang, Shengjun, et al.
Published: (2025)

Deep Semantic-Visual Alignment for Zero-Shot Remote Sensing Image Scene Classification
by: Xu, Wenjia, et al.
Published: (2024)

SceneCompleter: Dense 3D Scene Completion for Generative Novel View Synthesis
by: Chen, Weiliang, et al.
Published: (2025)

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
by: Song, Yeji, et al.
Published: (2024)

AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
by: Ahn, Sunghyun, et al.
Published: (2025)

Zero-Shot Scene Change Detection
by: Cho, Kyusik, et al.
Published: (2024)

Semantic Flow: Learning Semantic Field of Dynamic Scenes from Monocular Videos
by: Tian, Fengrui, et al.
Published: (2024)

Revisiting 3D Reconstruction Kernels as Low-Pass Filters
by: Zhang, Shengjun, et al.
Published: (2026)

EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning
by: Zhang, Xiao, et al.
Published: (2025)

Zero-Shot Hashing Based on Reconstruction With Part Alignment
by: Jiang, Yan, et al.
Published: (2025)

OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution
by: Xia, Chong, et al.
Published: (2026)

Zero-Shot Fake Video Detection by Audio-Visual Consistency
by: Li, Xiaolou, et al.
Published: (2024)

SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation
by: Zhang, Jiwen, et al.
Published: (2026)

OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
by: Tang, Yijie, et al.
Published: (2025)

Learning by Imagining: Debiased Feature Augmentation for Compositional Zero-Shot Learning
by: Zhang, Haozhe, et al.
Published: (2025)

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
by: Liu, Fangfu, et al.
Published: (2025)

SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation
by: Zhang, Jiwen, et al.
Published: (2026)

Zero-Shot Personalization of Objects via Textual Inversion
by: Roy, Aniket, et al.
Published: (2026)

SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
by: Li, Joshua, et al.
Published: (2025)

Anything in Any Scene: Photorealistic Video Object Insertion
by: Bai, Chen, et al.
Published: (2024)

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
by: Gkanatsios, Nikolaos, et al.
Published: (2023)

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
by: Zhao, Chengyang, et al.
Published: (2023)

Zero-Shot Temporal Interaction Localization for Egocentric Videos
by: Zhang, Erhang, et al.
Published: (2025)

ZeroHSI: Zero-Shot 4D Human-Scene Interaction by Video Generation
by: Li, Hongjie, et al.
Published: (2024)

Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
by: Guo, Yuliang, et al.
Published: (2025)

Learning Visual Proxy for Compositional Zero-Shot Learning
by: Zhang, Shiyu, et al.
Published: (2025)

Visual Adaptive Prompting for Compositional Zero-Shot Learning
by: Stein, Kyle, et al.
Published: (2025)

RARE: Refine Any Registration of Pairwise Point Clouds via Zero-Shot Learning
by: Zheng, Chengyu, et al.
Published: (2025)

EmboAlign: Aligning Video Generation with Compositional Constraints for Zero-Shot Manipulation
by: Zhang, Gehao, et al.
Published: (2026)

Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning
by: Duan, Yihang, et al.
Published: (2026)

Zero-Shot Temporal Action Localization Through Textual Guidance
by: Liberatori, Benedetta, et al.
Published: (2026)

Coherent Zero-Shot Visual Instruction Generation
by: Phung, Quynh, et al.
Published: (2024)