Library Catalog

Bibliographic Details
Main Authors:	Wang, Songping; Liu, Hanqing; Lyu, Yueming; Hu, Xiantao; He, Ziwen; Wang, Wei; Shan, Caifeng; Wang, Liang
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2504.14921

Similar Items

An Effective End-to-End Solution for Multimodal Action Recognition
by: Wang, Songping, et al.
Published: (2025)

Exploring Adversarial Transferability between Kolmogorov-arnold Networks
by: Wang, Songping, et al.
Published: (2025)

Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts
by: Wang, Songping, et al.
Published: (2026)

RunawayEvil: Jailbreaking the Image-to-Video Generative Models
by: Wang, Songping, et al.
Published: (2025)

Anti-Aesthetics: Protecting Facial Privacy against Customized Text-to-Image Synthesis
by: Wang, Songping, et al.
Published: (2025)

GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection
by: Gao, Xin, et al.
Published: (2025)

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
by: Zhu, Xiaoyu, et al.
Published: (2024)

One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control
by: Rao, Haoxiang, et al.
Published: (2026)

Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition
by: Xu, Xiaogang, et al.
Published: (2024)

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models
by: Liang, Hanwen, et al.
Published: (2024)

Evaluating Adversarial Robustness in the Spatial Frequency Domain
by: Liao, Keng-Hsin, et al.
Published: (2024)

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
by: Yang, Ruoliu, et al.
Published: (2026)

Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
by: Wang, Yutong, et al.
Published: (2024)

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
by: Lyu, Yueming, et al.
Published: (2023)

ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection
by: Deng, Runzhi, et al.
Published: (2025)

Uncertainty-Aware Concept and Motion Segmentation for Semi-Supervised Angiography Videos
by: Luo, Yu, et al.
Published: (2026)

Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring
by: Wang, Jieying, et al.
Published: (2026)

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
by: Wei, Jiazhe, et al.
Published: (2025)

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion
by: Cao, Mingdeng, et al.
Published: (2024)

NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
by: Bin, Yanrui, et al.
Published: (2025)

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
by: Ge, Xingtong, et al.
Published: (2026)

Leveraging Information Consistency in Frequency and Spatial Domain for Adversarial Attacks
by: Jin, Zhibo, et al.
Published: (2024)

Spatial-Frequency Discriminability for Revealing Adversarial Perturbations
by: Wang, Chao, et al.
Published: (2023)

Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
by: Li, Yang, et al.
Published: (2024)

FastInit: Fast Noise Initialization for Temporally Consistent Video Generation
by: Bai, Chengyu, et al.
Published: (2025)

Spatial-Temporal Decoupled Reference Conditioning for Identity-Preserving Text-to-Video Generation
by: Chen, Yuheng, et al.
Published: (2026)

Frequency-Guided Diffusion Model with Perturbation Training for Skeleton-Based Video Anomaly Detection
by: Tan, Xiaofeng, et al.
Published: (2024)

Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
by: Zhao, Weichao, et al.
Published: (2024)

Self-Attentive Spatio-Temporal Calibration for Precise Intermediate Layer Matching in ANN-to-SNN Distillation
by: Hong, Di, et al.
Published: (2025)

VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
by: Wang, Shaobo, et al.
Published: (2025)

Mixture of Weak & Strong Experts on Graphs
by: Zeng, Hanqing, et al.
Published: (2023)

InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution
by: Hu, Jintong, et al.
Published: (2026)

Robust Alignment: Harmonizing Clean Accuracy and Adversarial Robustness in Adversarial Training
by: Wang, Yanyun, et al.
Published: (2026)

VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment
by: Cong, Wenyan, et al.
Published: (2025)

Temporal-Consistent Video Restoration with Pre-trained Diffusion Models
by: Wang, Hengkang, et al.
Published: (2025)

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
by: Shi, Jiapeng, et al.
Published: (2026)

DirectSwap: Mask-Free Cross-Identity Training and Benchmarking for Expression-Consistent Video Head Swapping
by: Wang, Yanan, et al.
Published: (2025)

Weak to Strong: VLM-Based Pseudo-Labeling as a Weakly Supervised Training Strategy in Multimodal Video-based Hidden Emotion Understanding Tasks
by: Wang, Yufei, et al.
Published: (2026)

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
by: Hu, Xiantao, et al.
Published: (2024)

Dual Frequency Branch Framework with Reconstructed Sliding Windows Attention for AI-Generated Image Detection
by: Yan, Jiazhen, et al.
Published: (2025)