:: Library Catalog

Bibliographic Details
Main Authors:	Feng, Haoran; Huang, Zehuan; Li, Lin; Lv, Hairong; Sheng, Lu
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.12590

Similar Items

VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
by: Li, Lin, et al.
Published: (2025)

AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models
by: Huang, Zehuan, et al.
Published: (2025)

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion
by: Wen, Hao, et al.
Published: (2024)

Repurposing 3D Generative Model for Autoregressive Layout Generation
by: Feng, Haoran, et al.
Published: (2026)

SegviGen: Repurposing 3D Generative Model for Part Segmentation
by: Li, Lin, et al.
Published: (2026)

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
by: Huang, Zehuan, et al.
Published: (2024)

MV-Adapter: Multi-view Consistent Image Generation Made Easy
by: Huang, Zehuan, et al.
Published: (2024)

EAM: Enhancing Anything with Diffusion Transformers for Blind Super-Resolution
by: Xie, Haizhen, et al.
Published: (2025)

MFP-VTON: Enhancing Mask-Free Person-to-Person Virtual Try-On via Diffusion Transformer
by: Shen, Le, et al.
Published: (2025)

Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects
by: Li, Wei, et al.
Published: (2025)

SkyReels-A2: Compose Anything in Video Diffusion Transformers
by: Fei, Zhengcong, et al.
Published: (2025)

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
by: Huang, Zehuan, et al.
Published: (2024)

InterMoE: Individual-Specific 3D Human Interaction Generation via Dynamic Temporal-Selective MoE
by: Wang, Lipeng, et al.
Published: (2025)

ASAM: Boosting Segment Anything Model with Adversarial Tuning
by: Li, Bo, et al.
Published: (2024)

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
by: Huang, Zehuan, et al.
Published: (2023)

GA-Drive: Geometry-Appearance Decoupled Modeling for Free-viewpoint Driving Scene Generation
by: Zhang, Hao, et al.
Published: (2026)

Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance
by: Fan, Hongxing, et al.
Published: (2025)

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
by: Chen, Xi, et al.
Published: (2024)

MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation
by: Song, Yiren, et al.
Published: (2025)

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation
by: Shi, Hairong, et al.
Published: (2024)

InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
by: Tao, Jiale, et al.
Published: (2025)

Detect Anything 3D in the Wild
by: Zhang, Hanxue, et al.
Published: (2025)

Learning to Prompt Segment Anything Models
by: Huang, Jiaxing, et al.
Published: (2024)

SAM3-I: Segment Anything with Instructions
by: Li, Jingjing, et al.
Published: (2025)

Segment and Caption Anything
by: Huang, Xiaoke, et al.
Published: (2023)

SayAnything: Audio-Driven Lip Synchronization with Conditional Video Diffusion
by: Ma, Junxian, et al.
Published: (2025)

Move Anything with Layered Scene Diffusion
by: Ren, Jiawei, et al.
Published: (2024)

Depth Anything V2
by: Yang, Lihe, et al.
Published: (2024)

Fossil Image Identification using Deep Learning Ensembles of Data Augmented Multiviews
by: Hou, Chengbin, et al.
Published: (2023)

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation
by: Gan, Qijun, et al.
Published: (2025)

PolarAnything: Diffusion-based Polarimetric Image Synthesis
by: Zhang, Kailong, et al.
Published: (2025)

Matching Anything by Segmenting Anything
by: Li, Siyuan, et al.
Published: (2024)

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers
by: Gao, Daiheng, et al.
Published: (2024)

SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
by: Wang, Yuji, et al.
Published: (2025)

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
by: Lin, Weifeng, et al.
Published: (2025)

Register Anything: Estimating "Corresponding Prompts" for Segment Anything Model
by: Huang, Shiqi, et al.
Published: (2025)

FastDrag: Manipulate Anything in One Step
by: Zhao, Xuanjia, et al.
Published: (2024)

pFedSAM: Personalized Federated Learning of Segment Anything Model for Medical Image Segmentation
by: Wang, Tong, et al.
Published: (2025)

Unsegment Anything by Simulating Deformation
by: Lu, Jiahao, et al.
Published: (2024)

GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
by: Chen, Li-Heng, et al.
Published: (2025)