| Main Authors: | Hu, Teng, Zhang, Jiangning, Huang, Hongrui, Yi, Ran, Su, Zihan, Weng, Jieyu, Xue, Zhucun, Ma, Lizhuang, Yang, Ming-Hsuan, Tao, Dacheng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2604.06339 |
Similar Items
InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
by: Hu, Teng, et al.
Published: (2025)
MotionMaster: Training-free Camera Motion Transfer For Video Generation
by: Hu, Teng, et al.
Published: (2024)
IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction
by: Yi, Ran, et al.
Published: (2025)
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation
by: Hu, Teng, et al.
Published: (2024)
UltraGen: High-Resolution Video Generation with Hierarchical Attention
by: Hu, Teng, et al.
Published: (2025)
UniICL: Systematizing Unified Multimodal In-context Learning through a Capability-Oriented Taxonomy
by: Xu, Yicheng, et al.
Published: (2026)
PoseAnything: Universal Pose-guided Video Generation with Part-aware Temporal Coherence
by: Wang, Ruiyan, et al.
Published: (2025)
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
by: Xue, Zhucun, et al.
Published: (2025)
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
by: Xue, Zhucun, et al.
Published: (2025)
Semantic Frame Interpolation
by: Hong, Yijia, et al.
Published: (2025)
Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)
EMOv2: Pushing 5M Vision Model Frontier
by: Zhang, Jiangning, et al.
Published: (2024)
Advancing Narrative Long Video Generation via Training-Free Identity-Aware Memory
by: Liu, Jinzhuo, et al.
Published: (2026)
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark
by: Zhang, Jiangning, et al.
Published: (2024)
Image Inversion: A Survey from GANs to Diffusion and Beyond
by: Chen, Yinan, et al.
Published: (2025)
Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
by: Jiang, Wencan, et al.
Published: (2026)
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10$\times$
by: Zhang, Jiangning, et al.
Published: (2025)
IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
by: Chen, Yinan, et al.
Published: (2025)
Collaborative Face Experts Fusion in Video Generation: Boosting Identity Consistency Across Large Face Poses
by: Wang, Yuji, et al.
Published: (2025)
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations
by: Wang, Yuji, et al.
Published: (2025)
ImitDiff: Transferring Foundation-Model Priors for Distraction Robust Visuomotor Policy
by: Dong, Yuhang, et al.
Published: (2025)
Exploring Real&Synthetic Dataset and Linear Attention in Image Restoration
by: Du, Yuzhen, et al.
Published: (2024)
Evolution of Optimization Methods: Algorithms, Scenarios, and Evaluations
by: Zhang, Tong, et al.
Published: (2026)
Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval
by: Xue, Zhucun, et al.
Published: (2026)
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
by: Hu, Teng, et al.
Published: (2025)
OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing
by: He, Haoyang, et al.
Published: (2025)
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
by: Wang, Yabiao, et al.
Published: (2024)
AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius
by: Wang, Xinzhe, et al.
Published: (2024)
Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation
by: Fan, Ke, et al.
Published: (2024)
Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection
by: Zhang, Jiangning, et al.
Published: (2023)
M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising
by: Wang, Chengjie, et al.
Published: (2024)
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
by: Hu, Teng, et al.
Published: (2025)
ID-Sculpt: ID-aware 3D Head Generation from Single In-the-wild Portrait Image
by: Hao, Jinkun, et al.
Published: (2024)
UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation
by: Fan, Linxuan, et al.
Published: (2025)
Generative Classifier for Domain Generalization
by: Long, Shaocong, et al.
Published: (2025)
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)
BEAR: A Video Dataset For Fine-grained Behaviors Recognition Oriented with Action and Environment Factors
by: Hu, Chengyang, et al.
Published: (2025)
RWKV-UNet: Improving UNet with Long-Range Cooperation for Effective Medical Image Segmentation
by: Jiang, Juntao, et al.
Published: (2025)