:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Guiqin; Zhao, Peng; Zhao, Cong; Huang, Jing; Guo, Siyan; Yang, Shusen
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.13565

Similar Items

EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift
by: Zhao, Peng, et al.
Published: (2024)

EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning
by: Donga, Runchu, et al.
Published: (2025)

LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
by: Yue, Tongtian, et al.
Published: (2025)

A CT Image Denoising Method Based on Projection Domain Feature
by: Sun, Mengyu, et al.
Published: (2024)

Accelerating Inference of Masked Image Generators via Reinforcement Learning
by: Subbaraman, Pranav, et al.
Published: (2025)

Precise Action-to-Video Generation Through Visual Action Prompts
by: Wang, Yuang, et al.
Published: (2025)

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
by: Wang, Yubin, et al.
Published: (2024)

PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
by: Wang, Jiangshan, et al.
Published: (2026)

Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)

FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
by: Wang, Huihan, et al.
Published: (2025)

Real-Time Video Generation with Pyramid Attention Broadcast
by: Zhao, Xuanlei, et al.
Published: (2024)

Predicting Video Slot Attention Queries from Random Slot-Feature Pairs
by: Zhao, Rongzhen, et al.
Published: (2025)

Foundation Model for Skeleton-Based Human Action Understanding
by: Wang, Hongsong, et al.
Published: (2025)

From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
by: Li, Bohan, et al.
Published: (2026)

Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
by: Zheng, Jun, et al.
Published: (2024)

D3: Training-Free AI-Generated Video Detection Using Second-Order Features
by: Zheng, Chende, et al.
Published: (2025)

VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)

Understanding Attention Mechanism in Video Diffusion Models
by: Liu, Bingyan, et al.
Published: (2025)

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)

Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition
by: Guo, Hanyu, et al.
Published: (2024)

Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery
by: Moretti, Francesco, et al.
Published: (2026)

Action Images: End-to-End Policy Learning via Multiview Video Generation
by: Zhen, Haoyu, et al.
Published: (2026)

Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
by: Wang, Cong, et al.
Published: (2024)

ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
by: Xue, Haotian, et al.
Published: (2026)

ResDynUNet++: A nested U-Net with residual dynamic convolution blocks for dual-spectral CT
by: Yuan, Ze, et al.
Published: (2025)

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
by: Luo, Yang, et al.
Published: (2025)

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection
by: Lyu, Jiahao, et al.
Published: (2024)

Repetitive Action Counting with Hybrid Temporal Relation Modeling
by: Li, Kun, et al.
Published: (2024)

ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models
by: Zhao, Qinyu, et al.
Published: (2025)

ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
by: Wang, Xiaodong, et al.
Published: (2025)

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
by: Yang, Min, et al.
Published: (2023)

A Large-Scale Study on Video Action Dataset Condensation
by: Chen, Yang, et al.
Published: (2024)

Matten: Video Generation with Mamba-Attention
by: Gao, Yu, et al.
Published: (2024)

Vamos: Versatile Action Models for Video Understanding
by: Wang, Shijie, et al.
Published: (2023)

Comp-Attn: Present-and-Align Attention for Compositional Video Generation
by: Zhang, Hongyu, et al.
Published: (2025)

HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models
by: He, Yeqi, et al.
Published: (2026)

Offline Signature Verification Based on Feature Disentangling Aided Variational Autoencoder
by: Zhang, Hansong, et al.
Published: (2024)

GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models
by: Gu, Zekai, et al.
Published: (2026)