:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Teng; Zhang, Jiangning; Huang, Hongrui; Yi, Ran; Su, Zihan; Weng, Jieyu; Xue, Zhucun; Ma, Lizhuang; Yang, Ming-Hsuan; Tao, Dacheng
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.06339

Similar Items

InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)

Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
by: Hu, Teng, et al.
Published: (2025)

MotionMaster: Training-free Camera Motion Transfer For Video Generation
by: Hu, Teng, et al.
Published: (2024)

IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction
by: Yi, Ran, et al.
Published: (2025)

SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
by: Hu, Teng, et al.
Published: (2024)

UltraGen: High-Resolution Video Generation with Hierarchical Attention
by: Hu, Teng, et al.
Published: (2025)

UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy
by: Xu, Yicheng, et al.
Published: (2026)

PoseAnything: Universal Pose-guided Video Generation with Part-aware Temporal Coherence
by: Wang, Ruiyan, et al.
Published: (2025)

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
by: Xue, Zhucun, et al.
Published: (2025)

UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
by: Xue, Zhucun, et al.
Published: (2025)

Semantic Frame Interpolation
by: Hong, Yijia, et al.
Published: (2025)

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

EMOv2: Pushing 5M Vision Model Frontier
by: Zhang, Jiangning, et al.
Published: (2024)

Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory
by: Liu, Jinzhuo, et al.
Published: (2026)

Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark
by: Zhang, Jiangning, et al.
Published: (2024)

Image Inversion: A Survey from GANs to Diffusion and Beyond
by: Chen, Yinan, et al.
Published: (2025)

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
by: Jiang, Wencan, et al.
Published: (2026)

Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10×
by: Zhang, Jiangning, et al.
Published: (2025)

IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
by: Chen, Yinan, et al.
Published: (2025)

Collaborative Face Experts Fusion in Video Generation: Boosting Identity Consistency Across Large Face Poses
by: Wang, Yuji, et al.
Published: (2025)

Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations
by: Wang, Yuji, et al.
Published: (2025)

ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy
by: Dong, Yuhang, et al.
Published: (2025)

Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration
by: Du, Yuzhen, et al.
Published: (2024)

Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
by: Zhang, Tong, et al.
Published: (2026)

Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval
by: Xue, Zhucun, et al.
Published: (2026)

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)

OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
by: He, Haoyang, et al.
Published: (2025)

TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
by: Wang, Yabiao, et al.
Published: (2024)

AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius
by: Wang, Xinzhe, et al.
Published: (2024)

Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation
by: Fan, Ke, et al.
Published: (2024)

Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection
by: Zhang, Jiangning, et al.
Published: (2023)

M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising
by: Wang, Chengjie, et al.
Published: (2024)

Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
by: Hu, Teng, et al.
Published: (2025)

ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image
by: Hao, Jinkun, et al.
Published: (2024)

UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation
by: Fan, Linxuan, et al.
Published: (2025)

Generative Classifier for Domain Generalization
by: Long, Shaocong, et al.
Published: (2025)

HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)

BEAR: A Video Dataset For Fine-grained Behaviors Recognition Oriented with Action and Environment Factors
by: Hu, Chengyang, et al.
Published: (2025)

RWKV-UNet: Improving UNet with Long-Range Cooperation for Effective Medical Image Segmentation
by: Jiang, Juntao, et al.
Published: (2025)