:: Library Catalog

Bibliographic Details
Main Authors:	Gong, Yue; Li, Hongyu; Liu, Shanyuan; Cheng, Bo; Ma, Yuhang; Wu, Liebucha; Wu, Xiaoyu; Zhang, Manyuan; Leng, Dawei; Yin, Yuhui; Zhang, Lijun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.19206

Similar Items

HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
by: Cheng, Bo, et al.
Published: (2024)

Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities
by: Liu, Shanyuan, et al.
Published: (2023)

NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers
by: Ma, Yuhang, et al.
Published: (2025)

CTA-Flux: Integrating Chinese Cultural Semantics into High-Quality English Text-to-Image Communities
by: Gong, Yue, et al.
Published: (2025)

NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer
by: Liu, Shanyuan, et al.
Published: (2025)

PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
by: He, Runze, et al.
Published: (2025)

FLUX-Makeup: High-Fidelity, Identity-Consistent, and Robust Makeup Transfer via Diffusion Transformer
by: Zhu, Jian, et al.
Published: (2025)

RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition
by: Wang, Binhao, et al.
Published: (2026)

RefTon: Reference person shot assist virtual Try-on
by: Li, Liuzhuozheng, et al.
Published: (2025)

U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers
by: Zhang, Zhanjie, et al.
Published: (2025)

WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
by: Wang, Jing, et al.
Published: (2025)

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
by: Wang, Bin, et al.
Published: (2024)

RzenEmbed: Towards Comprehensive Multimodal Retrieval
by: Jian, Weijian, et al.
Published: (2025)

ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
by: Li, Shaoning, et al.
Published: (2025)

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
by: Zheng, Dian, et al.
Published: (2026)

LMM-Det: Make Large Multimodal Models Excel in Object Detection
by: Li, Jincheng, et al.
Published: (2025)

FG-CLIP: Fine-Grained Visual and Textual Alignment
by: Xie, Chunyu, et al.
Published: (2025)

TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
by: Li, Teng, et al.
Published: (2026)

SVD-AE: Simple Autoencoders for Collaborative Filtering
by: Hong, Seoyoung, et al.
Published: (2024)

Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
by: Wang, Jing, et al.
Published: (2024)

KilonovAE: Exploring Kilonova Spectral Features with Autoencoders
by: Ford, N. M., et al.
Published: (2023)

FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
by: Xie, Chunyu, et al.
Published: (2025)

Understanding Internal Representations of Recommendation Models with Sparse Autoencoders
by: Wang, Jiayin, et al.
Published: (2024)

AE SemRL: Learning Semantic Association Rules with Autoencoders
by: Karabulut, Erkan, et al.
Published: (2024)

Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
by: Zhang, Shilong, et al.
Published: (2025)

Improved Baselines with Representation Autoencoders
by: Singh, Jaskirat, et al.
Published: (2026)

RSAttAE: An Information-Aware Attention-based Autoencoder Recommender System
by: Taromi, Amirhossein Dadashzadeh, et al.
Published: (2025)

StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
by: Opper, Mattia, et al.
Published: (2023)

HaHeAE: Learning Generalisable Joint Representations of Human Hand and Head Movements in Extended Reality
by: Hu, Zhiming, et al.
Published: (2024)

Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection
by: Zhang, Ye, et al.
Published: (2024)

Research on the Load Bearing and Impact Resistance of a Novel Structure Exhibiting Both Positive and Negative Poisson’s Ratios
by: Zhang, Xidong, et al.
Published: (2024)

DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing
by: Xie, Chenxi, et al.
Published: (2025)

An injectable pH-responsive marine polysaccharide hydrogel (AE&LF@pOA) for sequential therapy of infected diabetic wounds
by: Zhao, Meiyue, et al.
Published: (2026)

TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders
by: Cheng, Mingyue, et al.
Published: (2023)

AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing
by: Liu, Tianyu, et al.
Published: (2026)

threewater-dot/MvAE: MvAE
by: threewater-dot
Published: (2026)

AE-ViT: Token Enhancement for Vision Transformers via CNN-Based Autoencoder Ensembles
by: AIRCC
Published: (2025)

Functional Autoencoder for Smoothing and Representation Learning
by: Wu, Sidi, et al.
Published: (2024)

Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism
by: Lv, Haotian, et al.
Published: (2025)

UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
by: Zou, Jian, et al.
Published: (2023)