| Main Authors: | Xia, Guoxuan, Hanspal, Harleen, Tudosiu, Petru-Daniel, Zhang, Shifeng, Parisot, Sarah |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.15724 |
Similar Items
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities
by: Dutt, Raman, et al.
Published: (2025)
Generating Compositional Scenes via Text-to-image RGBA Instance Generation
by: Fontanella, Alessandro, et al.
Published: (2024)
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
by: Tudosiu, Petru-Daniel, et al.
Published: (2024)
SceneForge: Structured World Supervision from 3D Interventions
by: Li, Jizhizi, et al.
Published: (2026)
An Extended Evaluation Split for DeepSpaceYoloDataset
by: Parisot, Olivier
Published: (2026)
Dynamic Mixture-of-Experts for Visual Autoregressive Model
by: Vincenti, Jort, et al.
Published: (2025)
Improving Object Detection via Local-global Contrastive Learning
by: Triantafyllidou, Danai, et al.
Published: (2024)
Detecting streaks in smart telescopes images with Deep Learning
by: Parisot, Olivier, et al.
Published: (2025)
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
by: Franchi, Gianni, et al.
Published: (2024)
Robustness analysis of Deep Sky Objects detection models on HPC
by: Parisot, Olivier, et al.
Published: (2025)
HemBLIP: A Vision-Language Model for Interpretable Leukemia Cell Morphology Analysis
by: van Logtestijn, Julie, et al.
Published: (2026)
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
by: Wang, Jiahao, et al.
Published: (2025)
Enhancing Shape Perception and Segmentation Consistency for Industrial Image Inspection
by: Mao, Guoxuan, et al.
Published: (2025)
Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look
by: Zhang, Yong, et al.
Published: (2024)
Test-time Controllable Image Generation by Explicit Spatial Constraint Enforcement
by: Zhang, Z., et al.
Published: (2025)
Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
by: Ren, Huan, et al.
Published: (2025)
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
by: Image Team, et al.
Published: (2025)
RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation
by: Pang, Lexi, et al.
Published: (2025)
Spatial-Aware Latent Initialization for Controllable Image Generation
by: Sun, Wenqiang, et al.
Published: (2024)
LesionTABE: Equitable AI for Skin Lesion Detection
by: Diaz, Rocio Mexia, et al.
Published: (2026)
PG-ControlNet: A Physics-Guided ControlNet for Generative Spatially Varying Image Deblurring
by: Motorcu, Hakki, et al.
Published: (2025)
Seal2Real: Prompt Prior Learning on Diffusion Model for Unsupervised Document Seal Data Generation and Realisation
by: Yan, Mingfu, et al.
Published: (2023)
Towards Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It
by: Xia, Guoxuan, et al.
Published: (2024)
Practical No-box Adversarial Attacks with Training-free Hybrid Image Transformation
by: Zhang, Qilong, et al.
Published: (2022)
SpatialLock: Precise Spatial Control in Text-to-Image Synthesis
by: Liu, Biao, et al.
Published: (2025)
Component Adaptive Clustering for Generalized Category Discovery
by: Yan, Mingfu, et al.
Published: (2025)
FSATFusion: Frequency-Spatial Attention Transformer for Infrared and Visible Image Fusion
by: Zhang, Tianpei, et al.
Published: (2025)
Intelligent Image Search Algorithms Fusing Visual Large Models
by: Wang, Kehan, et al.
Published: (2025)
StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation
by: He, Yinxi, et al.
Published: (2026)
Layout-Guided Controllable Pathology Image Generation with In-Context Diffusion Transformers
by: Shou, Yuntao, et al.
Published: (2026)
Step1X-Edit: A Practical Framework for General Image Editing
by: Liu, Shiyu, et al.
Published: (2025)
GenSpace: Benchmarking Spatially-Aware Image Generation
by: Wang, Zehan, et al.
Published: (2025)
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
by: Zhang, Zhengqiang, et al.
Published: (2025)
ControlLight: Towards Controllable, Consistent, and Generalizable Low-Light Enhancement
by: Yang, Yufeng, et al.
Published: (2026)
TIPS: Text-Image Pretraining with Spatial awareness
by: Maninis, Kevis-Kokitsi, et al.
Published: (2024)
Taming Transformer for Emotion-Controllable Talking Face Generation
by: Zhang, Ziqi, et al.
Published: (2025)
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
by: Huang, Shaofei, et al.
Published: (2025)
Recursive Generalization Transformer for Image Super-Resolution
by: Chen, Zheng, et al.
Published: (2023)
MagicFight: Personalized Martial Arts Combat Video Generation
by: Huang, Jiancheng, et al.
Published: (2026)
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
by: Liang, Guang, et al.
Published: (2025)