| Main Authors: | Xiaofeng, Zhang; Courville, Aaron; Drozdzal, Michal; Romero-Soriano, Adriana |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.19557 |
Similar Items
Improving Text-to-Image Consistency via Automatic Prompt Optimization
by: Mañas, Oscar, et al.
Published: (2024)
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
by: Hall, Melissa, et al.
Published: (2023)
Entropy Rectifying Guidance for Diffusion and Flow Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2025)
Increasing the Utility of Synthetic Images through Chamfer Guidance
by: Dall'Asen, Nicola, et al.
Published: (2025)
Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)
Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
by: Hall, Melissa, et al.
Published: (2024)
Inference-time Physics Alignment of Video Generative Models with Latent World Models
by: Yuan, Jianhao, et al.
Published: (2026)
Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)
Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)
Improving the Physics of Video Generation with VJEPA-2 Reward Signal
by: Yuan, Jianhao, et al.
Published: (2025)
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
by: Hemmat, Reyhane Askari, et al.
Published: (2024)
Controlling Multimodal LLMs via Reward-guided Decoding
by: Mañas, Oscar, et al.
Published: (2025)
Boosting Latent Diffusion with Perceptual Objectives
by: Berrada, Tariq, et al.
Published: (2024)
Bias Analysis in Unconditional Image Generative Models
by: Zhang, Xiaofeng, et al.
Published: (2025)
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency
by: Sun, Shangkun, et al.
Published: (2025)
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2024)
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
by: Teotia, Revant, et al.
Published: (2025)
Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
by: Lavoie, Samuel, et al.
Published: (2025)
EvalGIM: A Library for Evaluating Generative Image Models
by: Hall, Melissa, et al.
Published: (2024)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
Augmented Conditioning Is Enough For Effective Training Image Generation
by: Chen, Jiahui, et al.
Published: (2025)
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
by: Urbanek, Jack, et al.
Published: (2023)
Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning
by: Ma, Xu, et al.
Published: (2026)
ConsiStyle: Style Diversity in Training-Free Consistent T2I Generation
by: Mazuz, Yohai, et al.
Published: (2025)
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
by: Nguyen, Bac, et al.
Published: (2024)
Harnessing Joint Rain-/Detail-aware Representations to Eliminate Intricate Rains
by: Ran, Wu, et al.
Published: (2024)
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
by: Vani, Ankit, et al.
Published: (2024)
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
by: Liu, Qin, et al.
Published: (2024)
FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation
by: Yang, Kaixing, et al.
Published: (2025)
Consistency-guided Prompt Learning for Vision-Language Models
by: Roy, Shuvendu, et al.
Published: (2023)
MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)
GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning
by: Feizi, Aarash, et al.
Published: (2024)
OpenDance: Multimodal Controllable 3D Dance Generation with Large-scale Internet Data
by: Zhang, Jinlu, et al.
Published: (2025)
Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
by: He, Wen-Jue, et al.
Published: (2025)
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
by: Ren, Weiming, et al.
Published: (2024)
Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes
by: Zhao, Xiaoqi, et al.
Published: (2024)
Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching
by: Singhania, Aditi, et al.
Published: (2025)
Consistency-Preserving Diverse Video Generation
by: Liu, Xinshuang, et al.
Published: (2026)