| Main Authors: | Tu, Yuanpeng, Luo, Hao, Chen, Xi, Bai, Xiang, Wang, Fan, Zhao, Hengshuang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.09995 |
Similar Items
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
by: Tu, Yuanpeng, et al.
Published: (2025)
LayerFlow: A Unified Model for Layer-aware Video Generation
by: Ji, Sihui, et al.
Published: (2025)
DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data
by: Tu, Yuanpeng, et al.
Published: (2025)
Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery
by: Tu, Yuanpeng, et al.
Published: (2024)
FocalClick-XL: Towards Unified and High-quality Interactive Segmentation
by: Chen, Xi, et al.
Published: (2025)
Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID
by: Tu, Yuanpeng
Published: (2022)
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
by: Zhou, Xin, et al.
Published: (2025)
FashionComposer: Compositional Fashion Image Generation
by: Ji, Sihui, et al.
Published: (2024)
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
by: Zhou, Xin, et al.
Published: (2026)
One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
by: Wang, Zhenyu, et al.
Published: (2024)
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
by: Chen, Xi, et al.
Published: (2025)
EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)
UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs
by: Liu, Zhe, et al.
Published: (2025)
DiffCamera: Arbitrary Refocusing on Images
by: Wang, Yiyang, et al.
Published: (2025)
LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
by: Zhu, Mingkang, et al.
Published: (2024)
LION: Linear Group RNN for 3D Object Detection in Point Clouds
by: Liu, Zhe, et al.
Published: (2024)
GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation
by: Yang, Zhenya, et al.
Published: (2025)
EgoForge: Goal-Directed Egocentric World Simulator
by: Shen, Yifan, et al.
Published: (2026)
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
by: Ji, Sihui, et al.
Published: (2025)
A Lightweight Clustering Framework for Unsupervised Semantic Segmentation
by: Cheung, Yau Shing Jonathan, et al.
Published: (2023)
Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
by: Gan, Chaofan, et al.
Published: (2025)
Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
by: Zheng, Rongkun, et al.
Published: (2025)
Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations
by: Gan, Chaofan, et al.
Published: (2025)
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation
by: Zhu, Mingkang, et al.
Published: (2025)
Utonia: Toward One Encoder for All Point Clouds
by: Zhang, Yujia, et al.
Published: (2026)
AnyDoor: Zero-shot Object-level Image Customization
by: Chen, Xi, et al.
Published: (2023)
GDRO: Group-level Reward Post-training Suitable for Diffusion Models
by: Wang, Yiyang, et al.
Published: (2026)
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
by: Cao, Zidong, et al.
Published: (2024)
Liquid: Language Models are Scalable and Unified Multi-modal Generators
by: Wu, Junfeng, et al.
Published: (2024)
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
by: Yuan, Yuqian, et al.
Published: (2025)
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
by: Zheng, Rongkun, et al.
Published: (2023)
SyncVIS: Synchronized Video Instance Segmentation
by: Zheng, Rongkun, et al.
Published: (2024)
ViLLa: Video Reasoning Segmentation with Large Language Model
by: Zheng, Rongkun, et al.
Published: (2024)
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
by: Ji, Sihui, et al.
Published: (2025)
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
by: Huang, Zhening, et al.
Published: (2023)
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction
by: Gan, Chaofan, et al.
Published: (2024)
UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
by: Yang, Lihe, et al.
Published: (2024)
Animate-X++: Universal Character Image Animation with Dynamic Backgrounds
by: Tan, Shuai, et al.
Published: (2025)
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
by: Wang, Yiyang, et al.
Published: (2025)
Being-H0.7: A Latent World-Action Model from Egocentric Videos
by: Luo, Hao, et al.
Published: (2026)