Library Catalog

Bibliographic Details
Main Authors:	Fu, Fengyi; Fang, Shancheng; Chen, Weidong; Mao, Zhendong
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2404.12782

Similar Items

Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement
by: Guo, Junrong, et al.
Published: (2026)

Latent-Compressed Variational Autoencoder for Video Diffusion Models
by: Guan, Jiarui, et al.
Published: (2026)

DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
by: Wang, Wenchuan, et al.
Published: (2025)

RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
by: Lin, Yijing, et al.
Published: (2025)

Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
by: Jia, Qi, et al.
Published: (2024)

Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
by: Wu, Wenhan, et al.
Published: (2025)

Lance: Unified Multimodal Modeling by Multi-Task Synergy
by: Fu, Fengyi, et al.
Published: (2026)

An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)

Latent Diffusion Model without Variational Autoencoder
by: Shi, Minglei, et al.
Published: (2025)

Discrete Wavelet Transform as a Facilitator for Expressive Latent Space Representation in Variational Autoencoders in Satellite Imagery
by: Mahara, Arpan, et al.
Published: (2025)

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
by: Chen, Junyu, et al.
Published: (2025)

Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos
by: Xu, Qingyu, et al.
Published: (2024)

HOTVCOM: Generating Buzzworthy Comments for Videos
by: Chen, Yuyan, et al.
Published: (2024)

Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
by: Mao, Shunqi, et al.
Published: (2025)

LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning
by: Fu, Fengyi, et al.
Published: (2025)

Dual-path Collaborative Generation Network for Emotional Video Captioning
by: Ye, Cheng, et al.
Published: (2024)

HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating
by: Hao, Weidong
Published: (2026)

TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
by: Li, Haoran, et al.
Published: (2023)

An Efficient LiDAR-Camera Fusion Network for Multi-Class 3D Dynamic Object Detection and Trajectory Prediction
by: He, Yushen, et al.
Published: (2025)

SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis
by: Feng, Bin, et al.
Published: (2025)

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)

Unsupervised Tomato Split Anomaly Detection using Hyperspectral Imaging and Variational Autoencoders
by: Abdulsalam, Mahmoud, et al.
Published: (2025)

TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
by: Liu, Feng, et al.
Published: (2025)

Multi-Task Adversarial Variational Autoencoder for Estimating Biological Brain Age with Multimodal Neuroimaging
by: Usman, Muhammad, et al.
Published: (2024)

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation
by: Fu, Junhu, et al.
Published: (2026)

LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)

TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
by: Huang, Victor Shea-Jay, et al.
Published: (2025)

LOLGORITHM: Funny Comment Generation Agent For Short Videos
by: Ouyang, Xuan, et al.
Published: (2026)

Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization
by: Chen, Tianyu, et al.
Published: (2024)

Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages
by: Rehman, Abd Ur, et al.
Published: (2024)

DeepIcon: A Hierarchical Network for Layer-wise Icon Vectorization
by: Bing, Qi, et al.
Published: (2024)

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing
by: Li, Hao, et al.
Published: (2024)

Multi-Knowledge-oriented Nighttime Haze Imaging Enhancer for Vision-driven Intelligent Systems
by: Chen, Ai, et al.
Published: (2025)

Gaussian Masked Autoencoders
by: Rajasegaran, Jathushan, et al.
Published: (2025)

Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder for Image Translation of Dotted Arabic Expiration Dates
by: Zidane, Ahmed, et al.
Published: (2023)

Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
by: Cho, Wonwoong, et al.
Published: (2025)

Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
by: Xu, Guoyang, et al.
Published: (2024)

AccidentSim: Generating Vehicle Collision Videos with Physically Realistic Collision Trajectories from Real-World Accident Reports
by: Zhang, Xiangwen, et al.
Published: (2025)

MeanFlow Transformers with Representation Autoencoders
by: Hu, Zheyuan, et al.
Published: (2025)

Knowledge Amalgamation for Object Detection with Transformers
by: Zhang, Haofei, et al.
Published: (2022)