| Main Authors: | Wang, Guiqin, Zhao, Peng, Zhao, Cong, Huang, Jing, Guo, Siyan, Yang, Shusen |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.13565 |
Similar Items
EdgeSync: Faster Edge-model Updating via Adaptive Continuous Learning for Video Data Drift
by: Zhao, Peng, et al.
Published: (2024)
EdgeSync: Accelerating Edge-Model Updates for Data Drift through Adaptive Continuous Learning
by: Donga, Runchu, et al.
Published: (2025)
LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation
by: Yue, Tongtian, et al.
Published: (2025)
A CT Image Denoising Method Based on Projection Domain Feature
by: Sun, Mengyu, et al.
Published: (2024)
Accelerating Inference of Masked Image Generators via Reinforcement Learning
by: Subbaraman, Pranav, et al.
Published: (2025)
Precise Action-to-Video Generation Through Visual Action Prompts
by: Wang, Yuang, et al.
Published: (2025)
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
by: Wang, Yubin, et al.
Published: (2024)
PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation
by: Wang, Jiangshan, et al.
Published: (2026)
Action-Guided Attention for Video Action Anticipation
by: Tai, Tsung-Ming, et al.
Published: (2026)
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
by: Wang, Xiaofeng, et al.
Published: (2024)
FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
by: Wang, Huihan, et al.
Published: (2025)
Real-Time Video Generation with Pyramid Attention Broadcast
by: Zhao, Xuanlei, et al.
Published: (2024)
Predicting Video Slot Attention Queries from Random Slot-Feature Pairs
by: Zhao, Rongzhen, et al.
Published: (2025)
Foundation Model for Skeleton-Based Human Action Understanding
by: Wang, Hongsong, et al.
Published: (2025)
From Articulated Kinematics to Routed Visual Control for Action-Conditioned Surgical Video Generation
by: Li, Bohan, et al.
Published: (2026)
Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism
by: Zheng, Jun, et al.
Published: (2024)
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
by: Zheng, Chende, et al.
Published: (2025)
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
by: Yang, Xiangpeng, et al.
Published: (2025)
Understanding Attention Mechanism in Video Diffusion Models
by: Liu, Bingyan, et al.
Published: (2025)
Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models
by: Zhu, Shangwen, et al.
Published: (2026)
Video-to-Task Learning via Motion-Guided Attention for Few-Shot Action Recognition
by: Guo, Hanyu, et al.
Published: (2024)
Adaptive Slicing-Assisted Hyper Inference for Enhanced Small Object Detection in High-Resolution Imagery
by: Moretti, Francesco, et al.
Published: (2026)
Action Images: End-to-End Policy Learning via Multiview Video Generation
by: Zhen, Haoyu, et al.
Published: (2026)
Multi-Level LVLM Guidance for Untrimmed Video Action Recognition
by: Peng, Liyang, et al.
Published: (2025)
EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation
by: Wang, Cong, et al.
Published: (2024)
ACWM-Phys: Investigating Generalized Physical Interaction in Action-Conditioned Video World Models
by: Xue, Haotian, et al.
Published: (2026)
ResDynUNet++: A nested U-Net with residual dynamic convolution blocks for dual-spectral CT
by: Yuan, Ze, et al.
Published: (2025)
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
by: Luo, Yang, et al.
Published: (2025)
Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection
by: Lyu, Jiahao, et al.
Published: (2024)
Repetitive Action Counting with Hybrid Temporal Relation Modeling
by: Li, Kun, et al.
Published: (2024)
ARINAR: Bi-Level Autoregressive Feature-by-Feature Generative Models
by: Zhao, Qinyu, et al.
Published: (2025)
ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos
by: Wang, Xiaodong, et al.
Published: (2025)
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
by: Yang, Min, et al.
Published: (2023)
A Large-Scale Study on Video Action Dataset Condensation
by: Chen, Yang, et al.
Published: (2024)
Matten: Video Generation with Mamba-Attention
by: Gao, Yu, et al.
Published: (2024)
Vamos: Versatile Action Models for Video Understanding
by: Wang, Shijie, et al.
Published: (2023)
Comp-Attn: Present-and-Align Attention for Compositional Video Generation
by: Zhang, Hongyu, et al.
Published: (2025)
HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models
by: He, Yeqi, et al.
Published: (2026)
Offline Signature Verification Based on Feature Disentangling Aided Variational Autoencoder
by: Zhang, Hansong, et al.
Published: (2024)
GO-Renderer: Generative Object Rendering with 3D-aware Controllable Video Diffusion Models
by: Gu, Zekai, et al.
Published: (2026)