:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Xiaofeng, Courville, Aaron, Drozdzal, Michal, Romero-Soriano, Adriana
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2510.19557

Similar Items

Improving Text-to-Image Consistency via Automatic Prompt Optimization
by: Mañas, Oscar, et al.
Published: (2024)

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)

DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
by: Hall, Melissa, et al.
Published: (2023)

Entropy Rectifying Guidance for Diffusion and Flow Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2025)

Increasing the Utility of Synthetic Images through Chamfer Guidance
by: Dall'Asen, Nicola, et al.
Published: (2025)

Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)

Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
by: Hall, Melissa, et al.
Published: (2024)

Inference-time Physics Alignment of Video Generative Models with Latent World Models
by: Yuan, Jianhao, et al.
Published: (2026)

Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)

Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)

Improving the Physics of Video Generation with VJEPA-2 Reward Signal
by: Yuan, Jianhao, et al.
Published: (2025)

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
by: Hemmat, Reyhane Askari, et al.
Published: (2024)

Controlling Multimodal LLMs via Reward-guided Decoding
by: Mañas, Oscar, et al.
Published: (2025)

Boosting Latent Diffusion with Perceptual Objectives
by: Berrada, Tariq, et al.
Published: (2024)

Bias Analysis in Unconditional Image Generative Models
by: Zhang, Xiaofeng, et al.
Published: (2025)

Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency
by: Sun, Shangkun, et al.
Published: (2025)

On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2024)

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
by: Teotia, Revant, et al.
Published: (2025)

Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models
by: Lavoie, Samuel, et al.
Published: (2025)

EvalGIM: A Library for Evaluating Generative Image Models
by: Hall, Melissa, et al.
Published: (2024)

Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)

Augmented Conditioning Is Enough For Effective Training Image Generation
by: Chen, Jiahui, et al.
Published: (2025)

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
by: Urbanek, Jack, et al.
Published: (2023)

Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning
by: Ma, Xu, et al.
Published: (2026)

ConsiStyle: Style Diversity in Training-Free Consistent T2I Generation
by: Mazuz, Yohai, et al.
Published: (2025)

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
by: Nguyen, Bac, et al.
Published: (2024)

Harnessing Joint Rain-/Detail-aware Representations to Eliminate Intricate Rains
by: Ran, Wu, et al.
Published: (2024)

SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
by: Vani, Ankit, et al.
Published: (2024)

Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
by: Liu, Qin, et al.
Published: (2024)

FlowerDance: MeanFlow for Efficient and Refined 3D Dance Generation
by: Yang, Kaixing, et al.
Published: (2025)

Consistency-guided Prompt Learning for Vision-Language Models
by: Roy, Shuvendu, et al.
Published: (2023)

MACE-Dance: Motion-Appearance Cascaded Experts for Music-Driven Dance Video Generation
by: Yang, Kaixing, et al.
Published: (2025)

GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning
by: Feizi, Aarash, et al.
Published: (2024)

OpenDance: Multimodal Controllable 3D Dance Generation with Large-scale Internet Data
by: Zhang, Jinlu, et al.
Published: (2025)

Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition
by: He, Wen-Jue, et al.
Published: (2025)

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
by: Ren, Weiming, et al.
Published: (2024)

Inspiring the Next Generation of Segment Anything Models: Comprehensively Evaluate SAM and SAM 2 with Diverse Prompts Towards Context-Dependent Concepts under Different Scenes
by: Zhao, Xiaoqi, et al.
Published: (2024)

Taming Identity Consistency and Prompt Diversity in Diffusion Models via Latent Concatenation and Masked Conditional Flow Matching
by: Singhania, Aditi, et al.
Published: (2025)

Consistency-Preserving Diverse Video Generation
by: Liu, Xinshuang, et al.
Published: (2026)