:: Library Catalog

Bibliographic Details
Main Authors:	Dedhia, Bhishma; Bourgin, David; Singh, Krishna Kumar; Li, Yuheng; Kang, Yan; Xu, Zhan; Jha, Niraj K.; Liu, Yuchen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.17539

Similar Items

Neural Slot Interpreters: Grounding Object Semantics in Emergent Slot Representations
by: Dedhia, Bhishma, et al.
Published: (2024)

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
by: Wang, Hongjie, et al.
Published: (2023)

Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need
by: Dedhia, Bhishma, et al.
Published: (2025)

DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
by: Ding, Zihan, et al.
Published: (2024)

Video-Guided Foley Sound Generation with Multimodal Controls
by: Chen, Ziyang, et al.
Published: (2024)

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
by: Hong, Yining, et al.
Published: (2024)

ActAnywhere: Subject-Aware Video Background Generation
by: Pan, Boxiao, et al.
Published: (2024)

REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
by: Zhang, Yitian, et al.
Published: (2025)

Slot-VLM: SlowFast Slots for Video-Language Modeling
by: Xu, Jiaqi, et al.
Published: (2024)

Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
by: Mahapatra, Aniruddha, et al.
Published: (2025)

RefDecoder: Enhancing Visual Generation with Conditional Video Decoding
by: Fan, Xiang, et al.
Published: (2026)

FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation
by: Ding, Ganggui, et al.
Published: (2026)

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
by: Li, Wei, et al.
Published: (2025)

HEED: Density-Weighted Residual Alignment for Hybrid Vision-Language Model Distillation
by: Liang, Yihao, et al.
Published: (2026)

ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
by: Kara, Ozgur, et al.
Published: (2025)

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
by: Xu, Mingze, et al.
Published: (2024)

Seeing Fast and Slow: Learning the Flow of Time in Videos
by: Wu, Yen-Siang, et al.
Published: (2026)

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
by: Yin, Tianwei, et al.
Published: (2024)

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
by: Wang, Hongjie, et al.
Published: (2024)

SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
by: Xu, Mingze, et al.
Published: (2025)

LinMU: Multimodal Understanding Made Linear
by: Wang, Hongjie, et al.
Published: (2026)

Slow-Fast Architecture for Video Multi-Modal Large Language Models
by: Shi, Min, et al.
Published: (2025)

InstanceV: Instance-Level Video Generation
by: Chen, Yuheng, et al.
Published: (2025)

YoChameleon: Personalized Vision and Language Generation
by: Nguyen, Thao, et al.
Published: (2025)

SafeLens: Deliberate and Efficient Video Guardrails with Fast-and-Slow Screening
by: Nahin, Shahriar Kabir, et al.
Published: (2026)

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding
by: Tan, Wenhui, et al.
Published: (2026)

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
by: Wang, Hongjie, et al.
Published: (2024)

Video-T1: Test-Time Scaling for Video Generation
by: Liu, Fangfu, et al.
Published: (2025)

Fast Video Generation with Sliding Tile Attention
by: Zhang, Peiyuan, et al.
Published: (2025)

RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
by: Yan, Tianyi, et al.
Published: (2025)

Boximator: Generating Rich and Controllable Motions for Video Synthesis
by: Wang, Jiawei, et al.
Published: (2024)

MoVideo: Motion-Aware Video Generation with Diffusion Models
by: Liang, Jingyun, et al.
Published: (2023)

MOVA: Towards Scalable and Synchronized Video-Audio Generation
by: OpenMOSS Team, et al.
Published: (2026)

SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
by: Li, Zhengang, et al.
Published: (2024)

SAMJAM: Zero-Shot Video Scene Graph Generation for Egocentric Kitchen Videos
by: Li, Joshua, et al.
Published: (2025)

Fast Autoregressive Video Generation with Diagonal Decoding
by: Ye, Yang, et al.
Published: (2025)

Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
by: Chen, Bolin, et al.
Published: (2025)

VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion
by: Yang, Lehan, et al.
Published: (2025)

AtomoVideo: High Fidelity Image-to-Video Generation
by: Gong, Litong, et al.
Published: (2024)

Transition Matching Distillation for Fast Video Generation
by: Nie, Weili, et al.
Published: (2026)