:: Library Catalog

Bibliographic Details
Main Authors:	Dai, Chengjie; Song, Tiantian; Tang, Hui; Chen, Fangdong; Yang, Bowei; Song, Guanghua
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2504.12923

Similar Items

DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
by: Wu, Yecheng, et al.
Published: (2025)

Deep Lossless Image Compression via Masked Sampling and Coarse-to-Fine Auto-Regression
by: Li, Tiantian, et al.
Published: (2025)

Efficient Progressive Image Compression with Variance-aware Masking
by: Presta, Alberto, et al.
Published: (2024)

Linear Attention Modeling for Learned Image Compression
by: Feng, Donghui, et al.
Published: (2025)

MSCViT: A Small-size ViT architecture with Multi-Scale Self-Attention Mechanism for Tiny Datasets
by: Zhang, Bowei, et al.
Published: (2025)

Efficient Masked Autoencoders with Self-Consistency
by: Li, Zhaowen, et al.
Published: (2023)

TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
by: Mao, Weian, et al.
Published: (2026)

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks
by: Zou, Siyu, et al.
Published: (2024)

Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
by: Luo, Yifu, et al.
Published: (2025)

SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction
by: Wang, Xia, et al.
Published: (2025)

Exploring the Coordination of Frequency and Attention in Masked Image Modeling
by: Gui, Jie, et al.
Published: (2022)

MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation
by: Zhu, Hancan, et al.
Published: (2025)

FairViT: Fair Vision Transformer via Adaptive Masking
by: Tian, Bowei, et al.
Published: (2024)

GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation
by: Chen, Chenglizhao, et al.
Published: (2026)

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
by: Song, Chull Hwan, et al.
Published: (2024)

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
by: Wang, Wenxuan, et al.
Published: (2023)

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
by: You, Haoran, et al.
Published: (2022)

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
by: Bai, Jinbin, et al.
Published: (2024)

Next-Frame Decoding for Ultra-Low-Bitrate Image Compression with Video Diffusion Priors
by: Chen, Yunuo, et al.
Published: (2026)

Polyline Path Masked Attention for Vision Transformer
by: Zhao, Zhongchen, et al.
Published: (2025)

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
by: Chen, Junyu, et al.
Published: (2024)

Timestep-Aware Block Masking for Efficient Diffusion Model Inference
by: He, Haodong, et al.
Published: (2026)

Extremely low-bitrate Image Compression Semantically Disentangled by LMMs from a Human Perception Perspective
by: Song, Juan, et al.
Published: (2025)

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
by: Song, Yiran, et al.
Published: (2024)

Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
by: Chen, Jun, et al.
Published: (2022)

Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
by: Song, Yurun, et al.
Published: (2026)

DCText: Scheduled Attention Masking for Visual Text Generation via Divide-and-Conquer Strategy
by: Song, Jaewoo, et al.
Published: (2025)

AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
by: Li, Yuheng, et al.
Published: (2024)

Mask What Matters: Controllable Text-Guided Masking for Self-Supervised Medical Image Analysis
by: Wang, Ruilang, et al.
Published: (2025)

GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting
by: Li, Tiantian, et al.
Published: (2025)

TCSAFormer: Efficient Vision Transformer with Token Compression and Sparse Attention for Medical Image Segmentation
by: Xia, Zunhui, et al.
Published: (2025)

PositionIC: Unified Position and Identity Consistency for Image Customization
by: Hu, Junjie, et al.
Published: (2025)

EpiMask: Leveraging Epipolar Distance Based Masks in Cross-Attention for Satellite Image Matching
by: Deshmukh, Rahul, et al.
Published: (2026)

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models
by: Xiao, Changming, et al.
Published: (2023)

Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections
by: Bao, Yongtang, et al.
Published: (2025)

MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
by: Wu, Biao, et al.
Published: (2024)

Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models
by: Song, Chenyue, et al.
Published: (2025)

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation
by: Zheng, Haoyu, et al.
Published: (2024)

$Δ$-AttnMask: Attention-Guided Masked Hidden States for Efficient Data Selection and Augmentation
by: Hu, Jucheng, et al.
Published: (2025)

MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis
by: Zhuang, Jiaxin, et al.
Published: (2024)