| Main Authors: | Liu, Shuai, Li, Youmeng, Wei, Jizeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.10258 |
Similar Items
Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots
by: Zheng, Guangting, et al.
Published: (2025)
QPT V2: Masked Image Modeling Advances Visual Scoring
by: Xie, Qizhi, et al.
Published: (2024)
Self-supervised Photographic Image Layout Representation Learning
by: Zhao, Zhaoran, et al.
Published: (2024)
MU-MAE: Multimodal Masked Autoencoders-Based One-Shot Learning
by: Liu, Rex, et al.
Published: (2024)
Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment
by: Liu, Yongxu, et al.
Published: (2024)
FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
by: Cai, Lingling, et al.
Published: (2024)
Learning Generalizable and Efficient Image Watermarking via Hierarchical Two-Stage Optimization
by: Liu, Ke, et al.
Published: (2025)
Single Image Dehazing Using Scene Depth Ordering
by: Ling, Pengyang, et al.
Published: (2024)
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
by: Zhang, Zhenxing, et al.
Published: (2024)
MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation
by: Xu, Shuolin, et al.
Published: (2025)
SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models
by: Zhu, Yule, et al.
Published: (2025)
AeSlides: Incentivizing Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards
by: Pan, Yiming, et al.
Published: (2026)
VC-Bench: Pioneering the Video Connecting Benchmark with a Dataset and Evaluation Metrics
by: Yin, Zhiyu, et al.
Published: (2026)
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions
by: Zhang, Rui, et al.
Published: (2024)
PAME: Self-Supervised Masked Autoencoder for No-Reference Point Cloud Quality Assessment
by: Shan, Ziyu, et al.
Published: (2024)
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
by: Xiong, Lingyu, et al.
Published: (2024)
Agent Journey Beyond RGB: Hierarchical Semantic-Spatial Representation Enrichment for Vision-and-Language Navigation
by: Zhang, Xuesong, et al.
Published: (2024)
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding
by: Sun, Qianqian, et al.
Published: (2025)
SMC++: Masked Learning of Unsupervised Video Semantic Compression
by: Tian, Yuan, et al.
Published: (2024)
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
by: Fernandez-Lopez, Adriana, et al.
Published: (2024)
When and How to Cut Classical Concerts? A Multimodal Automated Video Editing Approach
by: Gonzálbez-Biosca, Daniel, et al.
Published: (2025)
Hierarchical Sub-action Tree for Continuous Sign Language Recognition
by: Yang, Dejie, et al.
Published: (2025)
Learning Brain Representation with Hierarchical Visual Embeddings
by: Zheng, Jiawen, et al.
Published: (2026)
Generalizable Deepfake Detection Based on Forgery-aware Layer Masking and Multi-artifact Subspace Decomposition
by: Zhang, Xiang, et al.
Published: (2026)
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering
by: Meng, Yiran, et al.
Published: (2025)
Probabilistic Temporal Masked Attention for Cross-view Online Action Detection
by: Xie, Liping, et al.
Published: (2025)
Retrieving Any Relevant Moments: Benchmark and Models for Generalized Moment Retrieval
by: Ding, Yiming, et al.
Published: (2026)
A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation
by: Li, Hanxi, et al.
Published: (2024)
HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval
by: Chen, Zhiwei, et al.
Published: (2025)
Hierarchical Semantic Correlation-Aware Masked Autoencoder for Unsupervised Audio-Visual Representation Learning
by: Zeng, Donghuo, et al.
Published: (2026)
Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery
by: Xu, Yulin, et al.
Published: (2026)
Seeing Text in the Dark: Algorithm and Benchmark
by: Xu, Chengpei, et al.
Published: (2024)
Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation
by: Liu, Xiaofeng, et al.
Published: (2023)
Order Is Not Layout: Order-to-Space Bias in Image Generation
by: Zhang, Yongkang, et al.
Published: (2026)
AdaptaGen: Domain-Specific Image Generation through Hierarchical Semantic Optimization Framework
by: Zhang, Suoxiang, et al.
Published: (2025)
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
by: Zhou, Jinxing, et al.
Published: (2024)
Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection
by: Li, Zifeng, et al.
Published: (2024)
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
by: Chu, Meng, et al.
Published: (2023)
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs
by: Xu, Zitong, et al.
Published: (2025)
COutfitGAN: Learning to Synthesize Compatible Outfits Supervised by Silhouette Masks and Fashion Styles
by: Zhou, Dongliang, et al.
Published: (2025)