:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Yiran; Zhang, Jinchao; Deng, Ying; Zhou, Jie
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2407.06617

Similar Items

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
by: Xie, Rui, et al.
Published: (2025)

RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2025)

Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)

Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm
by: Xu, Xiaogang, et al.
Published: (2026)

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
by: Jia, Zexi, et al.
Published: (2025)

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)

StyleDecoupler: Generalizable Artistic Style Disentanglement
by: Jia, Zexi, et al.
Published: (2026)

CoDA: Color Distribution Probing for Efficient and Generalizable AI-Generated Image Detection
by: Jia, Zexi, et al.
Published: (2026)

TAVGBench: Benchmarking Text to Audible-Video Generation
by: Mao, Yuxin, et al.
Published: (2024)

Evaluating Generative Models via One-Dimensional Code Distributions
by: Jia, Zexi, et al.
Published: (2026)

PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
by: Hao, Yanbin, et al.
Published: (2024)

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)

Described Spatial-Temporal Video Detection
by: Ji, Wei, et al.
Published: (2024)

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
by: Xiong, Yizhe, et al.
Published: (2024)

Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations
by: Wang, Yuji, et al.
Published: (2025)

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)

VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
by: Wang, Shaobo, et al.
Published: (2025)

Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
by: Hayun, Omer Ben, et al.
Published: (2026)

A Synthetic-to-Real Dehazing Method based on Domain Unification
by: Yuan, Zhiqiang, et al.
Published: (2025)

PixelWizard: Towards Efficient High-Fidelity Video Generation at Ultra-Large Spatial Resolution
by: Li, Wenxue, et al.
Published: (2026)

TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
by: Yang, Shuai, et al.
Published: (2024)

How Should Video LLMs Output Time? An Analysis of Efficient Temporal Grounding Paradigms
by: Jin, Shengji, et al.
Published: (2026)

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
by: Fu, Yonggan, et al.
Published: (2020)

DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
by: Chen, Hong, et al.
Published: (2024)

A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos
by: He, Allen, et al.
Published: (2026)

VideoTetris: Towards Compositional Text-to-Video Generation
by: Tian, Ye, et al.
Published: (2024)

Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos
by: Shi, Zhiyi, et al.
Published: (2025)

Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
by: Wang, Yutong, et al.
Published: (2024)

4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation
by: Liu, Mengmeng, et al.
Published: (2025)

Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking
by: Chen, Moyang, et al.
Published: (2026)

Manifold-Optimal Guidance: A Unified Riemannian Control View of Diffusion Guidance
by: Jia, Zexi, et al.
Published: (2026)

STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models
by: Liang, Yiming, et al.
Published: (2026)

AtomoVideo: High Fidelity Image-to-Video Generation
by: Gong, Litong, et al.
Published: (2024)

Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
by: Liang, Rongqin, et al.
Published: (2024)

WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
by: Yuan, Zhiqiang, et al.
Published: (2024)

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm
by: Zhang, Lin, et al.
Published: (2025)

TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
by: Sun, Guanxiong, et al.
Published: (2024)