| Main Authors: | Gong, Yue, Li, Hongyu, Liu, Shanyuan, Cheng, Bo, Ma, Yuhang, Wu, Liebucha, Wu, Xiaoyu, Zhang, Manyuan, Leng, Dawei, Yin, Yuhui, Zhang, Lijun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.19206 |
Similar Items
HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation
by: Cheng, Bo, et al.
Published: (2024)
Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities
by: Liu, Shanyuan, et al.
Published: (2023)
NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers
by: Ma, Yuhang, et al.
Published: (2025)
CTA-Flux: Integrating Chinese Cultural Semantics into High-Quality English Text-to-Image Communities
by: Gong, Yue, et al.
Published: (2025)
NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer
by: Liu, Shanyuan, et al.
Published: (2025)
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
by: He, Runze, et al.
Published: (2025)
FLUX-Makeup: High-Fidelity, Identity-Consistent, and Robust Makeup Transfer via Diffusion Transformer
by: Zhu, Jian, et al.
Published: (2025)
RevealLayer: Disentangling Hidden and Visible Layers via Occlusion-Aware Image Decomposition
by: Wang, Binhao, et al.
Published: (2026)
RefTon: Reference person shot assist virtual Try-on
by: Li, Liuzhuozheng, et al.
Published: (2025)
U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers
by: Zhang, Zhanjie, et al.
Published: (2025)
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
by: Wang, Jing, et al.
Published: (2025)
IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
by: Wang, Bin, et al.
Published: (2024)
RzenEmbed: Towards Comprehensive Multimodal Retrieval
by: Jian, Weijian, et al.
Published: (2025)
ProteinAE: Protein Diffusion Autoencoders for Structure Encoding
by: Li, Shaoning, et al.
Published: (2025)
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
by: Zheng, Dian, et al.
Published: (2026)
LMM-Det: Make Large Multimodal Models Excel in Object Detection
by: Li, Jincheng, et al.
Published: (2025)
FG-CLIP: Fine-Grained Visual and Textual Alignment
by: Xie, Chunyu, et al.
Published: (2025)
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
by: Li, Teng, et al.
Published: (2026)
SVD-AE: Simple Autoencoders for Collaborative Filtering
by: Hong, Seoyoung, et al.
Published: (2024)
Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
by: Wang, Jing, et al.
Published: (2024)
KilonovAE: Exploring Kilonova Spectral Features with Autoencoders
by: Ford, N. M., et al.
Published: (2023)
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
by: Xie, Chunyu, et al.
Published: (2025)
Understanding Internal Representations of Recommendation Models with Sparse Autoencoders
by: Wang, Jiayin, et al.
Published: (2024)
AE SemRL: Learning Semantic Association Rules with Autoencoders
by: Karabulut, Erkan, et al.
Published: (2024)
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing
by: Zhang, Shilong, et al.
Published: (2025)
Improved Baselines with Representation Autoencoders
by: Singh, Jaskirat, et al.
Published: (2026)
RSAttAE: An Information-Aware Attention-based Autoencoder Recommender System
by: Taromi, Amirhossein Dadashzadeh, et al.
Published: (2025)
StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure
by: Opper, Mattia, et al.
Published: (2023)
HaHeAE: Learning Generalisable Joint Representations of Human Hand and Head Movements in Extended Reality
by: Hu, Zhiming, et al.
Published: (2024)
Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection
by: Zhang, Ye, et al.
Published: (2024)
Research on the Load Bearing and Impact Resistance of a Novel Structure Exhibiting Both Positive and Negative Poisson’s Ratios
by: Xidong Zhang, et al.
Published: (2024)
DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing
by: Xie, Chenxi, et al.
Published: (2025)
An injectable pH-responsive marine polysaccharide hydrogel (AE&LF@pOA) for sequential therapy of infected diabetic wounds.
by: Zhao, Meiyue, et al.
Published: (2026)
TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders
by: Cheng, Mingyue, et al.
Published: (2023)
AutoWeather4D: Autonomous Driving Video Weather Conversion via G-Buffer Dual-Pass Editing
by: Liu, Tianyu, et al.
Published: (2026)
threewater-dot/MvAE: MvAE
by: threewater-dot
Published: (2026)
AE-ViT: Token Enhancement for Vision Transformers via CNN-Based Autoencoder Ensembles
by: AIRCC
Published: (2025)
Functional Autoencoder for Smoothing and Representation Learning
by: Wu, Sidi, et al.
Published: (2024)
Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism
by: Lv, Haotian, et al.
Published: (2025)
UniM$^2$AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
by: Zou, Jian, et al.
Published: (2023)