:: Library Catalog

Bibliographic Details
Main Authors:	Tu, Yuanpeng; Luo, Hao; Chen, Xi; Bai, Xiang; Wang, Fan; Zhao, Hengshuang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.09995

Similar Items

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
by: Tu, Yuanpeng, et al.
Published: (2025)

LayerFlow: A Unified Model for Layer-aware Video Generation
by: Ji, Sihui, et al.
Published: (2025)

DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data
by: Tu, Yuanpeng, et al.
Published: (2025)

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery
by: Tu, Yuanpeng, et al.
Published: (2024)

FocalClick-XL: Towards Unified and High-quality Interactive Segmentation
by: Chen, Xi, et al.
Published: (2025)

Domain Camera Adaptation and Collaborative Multiple Feature Clustering for Unsupervised Person Re-ID
by: Tu, Yuanpeng
Published: (2022)

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
by: Zhou, Xin, et al.
Published: (2025)

FashionComposer: Compositional Fashion Image Generation
by: Ji, Sihui, et al.
Published: (2024)

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
by: Zhou, Xin, et al.
Published: (2026)

One for All: Multi-Domain Joint Training for Point Cloud Based 3D Object Detection
by: Wang, Zhenyu, et al.
Published: (2024)

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
by: Chen, Xi, et al.
Published: (2025)

EgoSim: Egocentric World Simulator for Embodied Interaction Generation
by: Hao, Jinkun, et al.
Published: (2026)

UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs
by: Liu, Zhe, et al.
Published: (2025)

DiffCamera: Arbitrary Refocusing on Images
by: Wang, Yiyang, et al.
Published: (2025)

LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
by: Zhu, Mingkang, et al.
Published: (2024)

LION: Linear Group RNN for 3D Object Detection in Point Clouds
by: Liu, Zhe, et al.
Published: (2024)

GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation
by: Yang, Zhenya, et al.
Published: (2025)

EgoForge: Goal-Directed Egocentric World Simulator
by: Shen, Yifan, et al.
Published: (2026)

PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
by: Ji, Sihui, et al.
Published: (2025)

A Lightweight Clustering Framework for Unsupervised Semantic Segmentation
by: Cheung, Yau Shing Jonathan, et al.
Published: (2023)

Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
by: Gan, Chaofan, et al.
Published: (2025)

Seg-VAR: Image Segmentation with Visual Autoregressive Modeling
by: Zheng, Rongkun, et al.
Published: (2025)

Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations
by: Gan, Chaofan, et al.
Published: (2025)

Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation
by: Zhu, Mingkang, et al.
Published: (2025)

Utonia: Toward One Encoder for All Point Clouds
by: Zhang, Yujia, et al.
Published: (2026)

AnyDoor: Zero-shot Object-level Image Customization
by: Chen, Xi, et al.
Published: (2023)

GDRO: Group-level Reward Post-training Suitable for Diffusion Models
by: Wang, Yiyang, et al.
Published: (2026)

PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
by: Cao, Zidong, et al.
Published: (2024)

Liquid: Language Models are Scalable and Unified Multi-modal Generators
by: Wu, Junfeng, et al.
Published: (2024)

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
by: Yuan, Yuqian, et al.
Published: (2025)

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
by: Zheng, Rongkun, et al.
Published: (2023)

SyncVIS: Synchronized Video Instance Segmentation
by: Zheng, Rongkun, et al.
Published: (2024)

ViLLa: Video Reasoning Segmentation with Large Language Model
by: Zheng, Rongkun, et al.
Published: (2024)

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
by: Ji, Sihui, et al.
Published: (2025)

OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
by: Huang, Zhening, et al.
Published: (2023)

DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction
by: Gan, Chaofan, et al.
Published: (2024)

UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation
by: Yang, Lihe, et al.
Published: (2024)

Animate-X++: Universal Character Image Animation with Dynamic Backgrounds
by: Tan, Shuai, et al.
Published: (2025)

DiffDoctor: Diagnosing Image Diffusion Models Before Treating
by: Wang, Yiyang, et al.
Published: (2025)

Being-H0.7: A Latent World-Action Model from Egocentric Videos
by: Luo, Hao, et al.
Published: (2026)