| Main Authors: | Yang, Yiran, Zhang, Jinchao, Deng, Ying, Zhou, Jie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.06617 |
Similar Items
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
by: Xie, Rui, et al.
Published: (2025)
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation
by: Yuan, Zhiqiang, et al.
Published: (2025)
Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)
Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm
by: Xu, Xiaogang, et al.
Published: (2026)
Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
by: Jia, Zexi, et al.
Published: (2025)
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)
StyleDecoupler: Generalizable Artistic Style Disentanglement
by: Jia, Zexi, et al.
Published: (2026)
CoDA: Color Distribution Probing for Efficient and Generalizable AI-Generated Image Detection
by: Jia, Zexi, et al.
Published: (2026)
TAVGBench: Benchmarking Text to Audible-Video Generation
by: Mao, Yuxin, et al.
Published: (2024)
Evaluating Generative Models via One-Dimensional Code Distributions
by: Jia, Zexi, et al.
Published: (2026)
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
by: Hao, Yanbin, et al.
Published: (2024)
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
by: Yang, Min, et al.
Published: (2024)
Described Spatial-Temporal Video Detection
by: Ji, Wei, et al.
Published: (2024)
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
by: Xiong, Yizhe, et al.
Published: (2024)
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations
by: Wang, Yuji, et al.
Published: (2025)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)
VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
by: Wang, Shaobo, et al.
Published: (2025)
Training-free Detection of Generated Videos via Spatial-Temporal Likelihoods
by: Hayun, Omer Ben, et al.
Published: (2026)
A Synthetic-to-Real Dehazing Method based on Domain Unification
by: Yuan, Zhiqiang, et al.
Published: (2025)
PixelWizard: Towards Efficient High-Fidelity Video Generation at Ultra-Large Spatial Resolution
by: Li, Wenxue, et al.
Published: (2026)
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
by: Shen, Leqi, et al.
Published: (2024)
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
by: Yang, Shuai, et al.
Published: (2024)
How Should Video LLMs Output Time? An Analysis of Efficient Temporal Grounding Paradigms
by: Jin, Shengji, et al.
Published: (2026)
FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
by: Fu, Yonggan, et al.
Published: (2020)
DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control
by: Chen, Hong, et al.
Published: (2024)
A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos
by: He, Allen, et al.
Published: (2026)
VideoTetris: Towards Compositional Text-to-Video Generation
by: Tian, Ye, et al.
Published: (2024)
Spatial-Temporal Pre-Training for Embryo Viability Prediction Using Time-Lapse Videos
by: Shi, Zhiyi, et al.
Published: (2025)
Apollo: Unified Multi-Task Audio-Video Joint Generation
by: Wang, Jun, et al.
Published: (2026)
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
by: Wang, Yutong, et al.
Published: (2024)
4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation
by: Liu, Mengmeng, et al.
Published: (2025)
Two Frames Matter: A Temporal Attack for Text-to-Video Model Jailbreaking
by: Chen, Moyang, et al.
Published: (2026)
Manifold-Optimal Guidance: A Unified Riemannian Control View of Diffusion Guidance
by: Jia, Zexi, et al.
Published: (2026)
STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models
by: Liang, Yiming, et al.
Published: (2026)
AtomoVideo: High Fidelity Image-to-Video Generation
by: Gong, Litong, et al.
Published: (2024)
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
by: Liang, Rongqin, et al.
Published: (2024)
WalkVLM:Aid Visually Impaired People Walking by Vision Language Model
by: Yuan, Zhiqiang, et al.
Published: (2024)
Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm
by: Zhang, Lin, et al.
Published: (2025)
TDViT: Temporal Dilated Video Transformer for Dense Video Tasks
by: Sun, Guanxiong, et al.
Published: (2024)