:: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Jianhao; Zhang, Xiaofeng; Friedrich, Felix; Beltran-Velez, Nicolas; Hall, Melissa; Askari-Hemmat, Reyhane; Han, Xiaochuang; Ballas, Nicolas; Drozdzal, Michal; Romero-Soriano, Adriana
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.10553

Similar Items

Improving the Physics of Video Generation with VJEPA-2 Reward Signal
by: Yuan, Jianhao, et al.
Published: (2025)

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
by: Hemmat, Reyhane Askari, et al.
Published: (2024)

Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)

Increasing the Utility of Synthetic Images through Chamfer Guidance
by: Dall'Asen, Nicola, et al.
Published: (2025)

Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)

Improving the Scaling Laws of Synthetic Data with Deliberate Practice
by: Askari-Hemmat, Reyhane, et al.
Published: (2025)

On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2024)

DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
by: Hall, Melissa, et al.
Published: (2023)

The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models
by: Zhang, Xiaofeng, et al.
Published: (2025)

Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)

Why Less is More (Sometimes): A Theory of Data Curation
by: Dohmatob, Elvis, et al.
Published: (2025)

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
by: Hall, Melissa, et al.
Published: (2024)

EvalGIM: A Library for Evaluating Generative Image Models
by: Hall, Melissa, et al.
Published: (2024)

Boosting Latent Diffusion with Perceptual Objectives
by: Berrada, Tariq, et al.
Published: (2024)

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)

Entropy Rectifying Guidance for Diffusion and Flow Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2025)

Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)

Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)

Learning Latent Action World Models In The Wild
by: Garrido, Quentin, et al.
Published: (2026)

QGen: On the Ability to Generalize in Quantization Aware Training
by: AskariHemmat, MohammadHossein, et al.
Published: (2024)

Improving Text-to-Image Consistency via Automatic Prompt Optimization
by: Mañas, Oscar, et al.
Published: (2024)

Controlling Multimodal LLMs via Reward-guided Decoding
by: Mañas, Oscar, et al.
Published: (2025)

Hierarchical Planning with Latent World Models
by: Zhang, Wancong, et al.
Published: (2026)

What makes a good metric? Evaluating automatic metrics for text-to-image consistency
by: Ross, Candace, et al.
Published: (2024)

VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
by: Lin, Han, et al.
Published: (2024)

Learning and Leveraging World Models in Visual Representation Learning
by: Garrido, Quentin, et al.
Published: (2024)

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
by: Teotia, Revant, et al.
Published: (2025)

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
by: Ballas, Aristotelis, et al.
Published: (2026)

TV2TV: A Unified Framework for Interleaved Language and Video Generation
by: Han, Xiaochuang, et al.
Published: (2025)

Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
by: Didolkar, Aniket, et al.
Published: (2025)

Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
by: Jesson, Andrew, et al.
Published: (2024)

Delta-Audit: Explaining What Changes When Models Change
by: Hemmat, Arshia, et al.
Published: (2025)

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
by: Krojer, Benno, et al.
Published: (2025)

Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
by: Domingo-Enrich, Carles, et al.
Published: (2024)

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
by: Hemmat, Arshia, et al.
Published: (2024)

Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
by: Balestriero, Randall, et al.
Published: (2025)

Contemporary Hermeneutics and the Role of the Self in Translation
by: Hemmat, Amrollah
Published: (2009)

Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)

David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs
by: Han, Xiaochuang, et al.
Published: (2023)