| Main Authors: | Wu, Yilu, Zhu, Chenhui, Wang, Shuai, Wang, Hanlin, Wang, Jing, Zhang, Zhaoxiang, Wang, Limin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.08234 |
Similar Items
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
by: Zhu, Chenhui, et al.
Published: (2025)
Open-Event Procedure Planning in Instructional Videos
by: Wu, Yilu, et al.
Published: (2024)
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
by: Wang, Hanlin, et al.
Published: (2023)
PixNerd: Pixel Neural Field Diffusion
by: Wang, Shuai, et al.
Published: (2025)
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
by: Chen, Yang, et al.
Published: (2025)
CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation
by: Wang, Yisen, et al.
Published: (2024)
Contextual AD Narration with Interleaved Multimodal Sequence
by: Wang, Hanlin, et al.
Published: (2024)
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
by: Wu, Tao, et al.
Published: (2024)
Dual DETRs for Multi-Label Temporal Action Detection
by: Zhu, Yuhan, et al.
Published: (2024)
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
by: Wang, Pengfei, et al.
Published: (2024)
Linearized Coupling Flow with Shortcut Constraints for One-Step Face Restoration
by: Sun, Xiaohui, et al.
Published: (2026)
FreeVS: Generative View Synthesis on Free Driving Trajectory
by: Wang, Qitai, et al.
Published: (2024)
OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains
by: Zhang, Yixuan, et al.
Published: (2024)
HLG: Comprehensive 3D Room Construction via Hierarchical Layout Generation
by: Wang, Xiping, et al.
Published: (2025)
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
by: Wang, Hanlin, et al.
Published: (2024)
Weakly Supervised 3D Object Detection with Multi-Stage Generalization
by: He, Jiawei, et al.
Published: (2023)
RoomCraft: Controllable and Complete 3D Indoor Scene Generation
by: Zhou, Mengqi, et al.
Published: (2025)
DDT: Decoupled Diffusion Transformer
by: Wang, Shuai, et al.
Published: (2025)
Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
by: Wang, Hao, et al.
Published: (2024)
FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution
by: Wang, Shuai, et al.
Published: (2024)
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
by: Song, Tianhui, et al.
Published: (2025)
Arbitrary Generative Video Interpolation
by: Zhang, Guozhen, et al.
Published: (2025)
Concept Unlearning by Modeling Key Steps of Diffusion Process
by: Zhang, Chaoshuo, et al.
Published: (2025)
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
by: Chen, Yuntao, et al.
Published: (2024)
A Curriculum-style Self-training Approach for Source-Free Semantic Segmentation
by: Wang, Yuxi, et al.
Published: (2021)
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
by: Li, Xinhao, et al.
Published: (2023)
Learning Multi-dimensional Human Preference for Text-to-Image Generation
by: Zhang, Sixian, et al.
Published: (2024)
Recovering 3D Human Mesh from Monocular Images: A Survey
by: Tian, Yating, et al.
Published: (2022)
Motion-Aware Generative Frame Interpolation
by: Zhang, Guozhen, et al.
Published: (2025)
LoD-Loc v2: Aerial Visual Localization over Low Level-of-Detail City Models using Explicit Silhouette Alignment
by: Zhu, Juelin, et al.
Published: (2025)
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
by: Li, Xinhao, et al.
Published: (2024)
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data
by: Zhang, Chenhui, et al.
Published: (2024)
Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation
by: Zhang, Jiaming, et al.
Published: (2023)
OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning
by: Gong, Yuan, et al.
Published: (2025)
PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process
by: Jiang, Shiqi, et al.
Published: (2025)
MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
by: Gao, Ruopeng, et al.
Published: (2023)
Denoising Diffusion Step-aware Models
by: Yang, Shuai, et al.
Published: (2023)
Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation
by: Wang, Haochen, et al.
Published: (2023)
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
by: Wang, Chenting, et al.
Published: (2025)
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
by: Zhu, Yuhan, et al.
Published: (2024)