| Main Authors: | Dou, Huanzhang, Li, Ruixiang, Su, Wei, Li, Xi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.01921 |
Similar Items
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
by: Su, Wei, et al.
Published: (2024)
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
by: Yuan, Yike, et al.
Published: (2024)
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)
CLASH: Complementary Learning with Neural Architecture Search for Gait Recognition
by: Dou, Huanzhang, et al.
Published: (2024)
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
by: Wang, Zhao, et al.
Published: (2024)
Efficient Long-Context Modeling in Diffusion Language Models via Block Approximate Sparse Attention
by: Zhang, Wenhu, et al.
Published: (2026)
SAVE: Speech-Aware Video Representation Learning for Video-Text Retrieval
by: Zhao, Ruixiang, et al.
Published: (2026)
Group Diffusion Transformers are Unsupervised Multitask Learners
by: Huang, Lianghua, et al.
Published: (2024)
Hybrid-Tower: Fine-grained Pseudo-query Interaction and Generation for Text-to-Video Retrieval
by: Lan, Bangxiang, et al.
Published: (2025)
In-Context LoRA for Diffusion Transformers
by: Huang, Lianghua, et al.
Published: (2024)
Decoupled Video Generation with Chain of Training-free Diffusion Model Experts
by: Li, Wenhao, et al.
Published: (2024)
IDEA-Bench: How Far are Generative Models from Professional Designing?
by: Liang, Chen, et al.
Published: (2024)
Text-Audio-Visual-conditioned Diffusion Model for Video Saliency Prediction
by: Yu, Li, et al.
Published: (2025)
Grid Diffusion Models for Text-to-Video Generation
by: Lee, Taegyeong, et al.
Published: (2024)
Anchored Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models
by: Hassan, Mariam, et al.
Published: (2025)
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
by: Huang, Lianghua, et al.
Published: (2024)
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding
by: Li, Hongyu, et al.
Published: (2024)
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
by: Zhang, Yabo, et al.
Published: (2024)
HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird's Eye View
by: Wu, Yiming, et al.
Published: (2023)
On Semiotic-Grounded Interpretive Evaluation of Generative Art
by: Jiang, Ruixiang, et al.
Published: (2026)
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
by: Wang, Chen, et al.
Published: (2025)
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
by: Xi, Dianbing, et al.
Published: (2025)
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
by: Zhang, David Junhao, et al.
Published: (2023)
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
by: Guo, Yongxin, et al.
Published: (2024)
TGT: Text-Grounded Trajectories for Locally Controlled Video Generation
by: Zhang, Guofeng, et al.
Published: (2025)
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
by: Li, Changzhen, et al.
Published: (2025)
LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
by: Zheng, Guangcong, et al.
Published: (2023)
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
by: Jeong, Hyeonho, et al.
Published: (2023)
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
by: Kara, Ozgur, et al.
Published: (2025)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
by: Li, Xiaomin, et al.
Published: (2024)
Exploring Iterative Refinement with Diffusion Models for Video Grounding
by: Liang, Xiao, et al.
Published: (2023)
Temporal-Conditional Referring Video Object Segmentation with Noise-Free Text-to-Video Diffusion Model
by: Zhang, Ruixin, et al.
Published: (2025)
Multi-sentence Video Grounding for Long Video Generation
by: Feng, Wei, et al.
Published: (2024)
Disciplined Diffusion: Text-to-Image Diffusion Model against NSFW Generation
by: Zhang, Chi, et al.
Published: (2026)
Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models
by: Jiang, Rui, et al.
Published: (2025)
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation
by: Jagpal, Diljeet, et al.
Published: (2025)
CamI2V: Camera-Controlled Image-to-Video Diffusion Model
by: Zheng, Guangcong, et al.
Published: (2024)
Dual-Stream Diffusion Net for Text-to-Video Generation
by: Liu, Binhui, et al.
Published: (2023)
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
by: Wang, Wenjing, et al.
Published: (2023)