| Main Authors: | Fu, Fengyi, Fang, Shancheng, Chen, Weidong, Mao, Zhendong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.12782 |
Similar Items
Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement
by: Guo, Junrong, et al.
Published: (2026)
Latent-Compressed Variational Autoencoder for Video Diffusion Models
by: Guan, Jiarui, et al.
Published: (2026)
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
by: Wang, Wenchuan, et al.
Published: (2025)
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
by: Lin, Yijing, et al.
Published: (2025)
Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
by: Jia, Qi, et al.
Published: (2024)
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
by: Wu, Wenhan, et al.
Published: (2025)
Lance: Unified Multimodal Modeling by Multi-Task Synergy
by: Fu, Fengyi, et al.
Published: (2026)
An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction
by: Noghre, Ghazal Alinezhad, et al.
Published: (2024)
Latent Diffusion Model without Variational Autoencoder
by: Shi, Minglei, et al.
Published: (2025)
Discrete Wavelet Transform as a Facilitator for Expressive Latent Space Representation in Variational Autoencoders in Satellite Imagery
by: Mahara, Arpan, et al.
Published: (2025)
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder
by: Chen, Junyu, et al.
Published: (2025)
Heterogeneous Graph Transformer for Multiple Tiny Object Tracking in RGB-T Videos
by: Xu, Qingyu, et al.
Published: (2024)
HOTVCOM: Generating Buzzworthy Comments for Videos
by: Chen, Yuyan, et al.
Published: (2024)
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
by: Mao, Shunqi, et al.
Published: (2025)
LayerEdit: Disentangled Multi-Object Editing via Conflict-Aware Multi-Layer Learning
by: Fu, Fengyi, et al.
Published: (2025)
Dual-path Collaborative Generation Network for Emotional Video Captioning
by: Ye, Cheng, et al.
Published: (2024)
HoloEv-Net: Efficient Event-based Action Recognition via Holographic Spatial Embedding and Global Spectral Gating
by: Hao, Weidong
Published: (2026)
TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction
by: Li, Haoran, et al.
Published: (2023)
An Efficient LiDAR-Camera Fusion Network for Multi-Class 3D Dynamic Object Detection and Trajectory Prediction
by: He, Yushen, et al.
Published: (2025)
SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis
by: Feng, Bin, et al.
Published: (2025)
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
by: Menapace, Willi, et al.
Published: (2024)
Unsupervised Tomato Split Anomaly Detection using Hyperspectral Imaging and Variational Autoencoders
by: Abdulsalam, Mahmoud, et al.
Published: (2025)
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
by: Liu, Feng, et al.
Published: (2025)
Multi-Task Adversarial Variational Autoencoder for Estimating Biological Brain Age with Multimodal Neuroimaging
by: Usman, Muhammad, et al.
Published: (2024)
DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation
by: Fu, Junhu, et al.
Published: (2026)
LiveStar: Live Streaming Assistant for Real-World Online Video Understanding
by: Yang, Zhenyu, et al.
Published: (2025)
TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
by: Huang, Victor Shea-Jay, et al.
Published: (2025)
LOLGORITHM: Funny Comment Generation Agent For Short Videos
by: Ouyang, Xuan, et al.
Published: (2026)
Enhancing and Accelerating Diffusion-Based Inverse Problem Solving through Measurements Optimization
by: Chen, Tianyu, et al.
Published: (2024)
Biological Brain Age Estimation using Sex-Aware Adversarial Variational Autoencoder with Multimodal Neuroimages
by: Rehman, Abd Ur, et al.
Published: (2024)
DeepIcon: A Hierarchical Network for Layer-wise Icon Vectorization
by: Bing, Qi, et al.
Published: (2024)
Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing
by: Li, Hao, et al.
Published: (2024)
Multi-Knowledge-oriented Nighttime Haze Imaging Enhancer for Vision-driven Intelligent Systems
by: Chen, Ai, et al.
Published: (2025)
Gaussian Masked Autoencoders
by: Rajasegaran, Jathushan, et al.
Published: (2025)
Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder for Image Translation of Dotted Arabic Expiration Dates
by: Zidane, Ahmed, et al.
Published: (2023)
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
by: Cho, Wonwoong, et al.
Published: (2025)
Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
by: Xu, Guoyang, et al.
Published: (2024)
AccidentSim: Generating Vehicle Collision Videos with Physically Realistic Collision Trajectories from Real-World Accident Reports
by: Zhang, Xiangwen, et al.
Published: (2025)
MeanFlow Transformers with Representation Autoencoders
by: Hu, Zheyuan, et al.
Published: (2025)
Knowledge Amalgamation for Object Detection with Transformers
by: Zhang, Haofei, et al.
Published: (2022)