| Main Authors: | Dedhia, Bhishma, Bourgin, David, Singh, Krishna Kumar, Li, Yuheng, Kang, Yan, Xu, Zhan, Jha, Niraj K., Liu, Yuchen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.17539 |
Similar Items
Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations
by: Dedhia, Bhishma, et al.
Published: (2024)
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
by: Wang, Hongjie, et al.
Published: (2023)
Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
by: Dedhia, Bhishma, et al.
Published: (2025)
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
by: Ding, Zihan, et al.
Published: (2024)
Video-Guided Foley Sound Generation with Multimodal Controls
by: Chen, Ziyang, et al.
Published: (2024)
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
by: Hong, Yining, et al.
Published: (2024)
ActAnywhere: Subject-Aware Video Background Generation
by: Pan, Boxiao, et al.
Published: (2024)
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
by: Zhang, Yitian, et al.
Published: (2025)
Slot-VLM: SlowFast Slots for Video-Language Modeling
by: Xu, Jiaqi, et al.
Published: (2024)
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
by: Mahapatra, Aniruddha, et al.
Published: (2025)
RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
by: Fan, Xiang, et al.
Published: (2026)
FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation
by: Ding, Ganggui, et al.
Published: (2026)
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
by: Li, Wei, et al.
Published: (2025)
HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
by: Liang, Yihao, et al.
Published: (2026)
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
by: Kara, Ozgur, et al.
Published: (2025)
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
by: Xu, Mingze, et al.
Published: (2024)
Seeing Fast and Slow: Learning the Flow of Time in Videos
by: Wu, Yen-Siang, et al.
Published: (2026)
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
by: Yin, Tianwei, et al.
Published: (2024)
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
by: Wang, Hongjie, et al.
Published: (2024)
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
by: Xu, Mingze, et al.
Published: (2025)
LinMU: Multimodal Understanding Made Linear
by: Wang, Hongjie, et al.
Published: (2026)
Slow-Fast Architecture for Video Multi-Modal Large Language Models
by: Shi, Min, et al.
Published: (2025)
InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)
YoChameleon: Personalized Vision and Language Generation
by: Nguyen, Thao, et al.
Published: (2025)
SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
by: Nahin, Shahriar Kabir, et al.
Published: (2026)
Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
by: Wang, Hongjie, et al.
Published: (2024)
Video-T1: Test-Time Scaling for Video Generation
by: Liu, Fangfu, et al.
Published: (2025)
Fast Video Generation with Sliding Tile Attention
by: Zhang, Peiyuan, et al.
Published: (2025)
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
by: Yan, Tianyi, et al.
Published: (2025)
Boximator: Generating Rich and Controllable Motions for Video Synthesis
by: Wang, Jiawei, et al.
Published: (2024)
MoVideo: Motion-Aware Video Generation with Diffusion Models
by: Liang, Jingyun, et al.
Published: (2023)
MOVA: Towards Scalable and Synchronized Video-Audio Generation
by: OpenMOSS Team, et al.
Published: (2026)
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
by: Li, Zhengang, et al.
Published: (2024)
SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
by: Li, Joshua, et al.
Published: (2025)
Fast Autoregressive Video Generation with Diagonal Decoding
by: Ye, Yang, et al.
Published: (2025)
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
by: Chen, Bolin, et al.
Published: (2025)
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)
AtomoVideo: High Fidelity Image-to-Video Generation
by: Gong, Litong, et al.
Published: (2024)
Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)