| Main Authors: | Yuan, Jianhao, Zhang, Xiaofeng, Friedrich, Felix, Beltran-Velez, Nicolas, Hall, Melissa, Askari-Hemmat, Reyhane, Han, Xiaochuang, Ballas, Nicolas, Drozdzal, Michal, Romero-Soriano, Adriana |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.10553 |
Similar Items
Improving the Physics of Video Generation with VJEPA-2 Reward Signal
by: Yuan, Jianhao, et al.
Published: (2025)
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
by: Hemmat, Reyhane Askari, et al.
Published: (2024)
Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)
Increasing the Utility of Synthetic Images through Chamfer Guidance
by: Dall'Asen, Nicola, et al.
Published: (2025)
Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
by: Askari-Hemmat, Reyhane, et al.
Published: (2025)
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2024)
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
by: Hall, Melissa, et al.
Published: (2023)
The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models
by: Zhang, Xiaofeng, et al.
Published: (2025)
Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image
by: Hu, Yushi, et al.
Published: (2025)
Why Less is More (Sometimes): A Theory of Data Curation
by: Dohmatob, Elvis, et al.
Published: (2025)
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
by: Hall, Melissa, et al.
Published: (2024)
EvalGIM: A Library for Evaluating Generative Image Models
by: Hall, Melissa, et al.
Published: (2024)
Boosting Latent Diffusion with Perceptual Objectives
by: Berrada, Tariq, et al.
Published: (2024)
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)
Entropy Rectifying Guidance for Diffusion and Flow Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2025)
Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)
Object-centric Binding in Contrastive Language-Image Pretraining
by: Assouel, Rim, et al.
Published: (2025)
Learning Latent Action World Models In The Wild
by: Garrido, Quentin, et al.
Published: (2026)
QGen: On the Ability to Generalize in Quantization Aware Training
by: AskariHemmat, MohammadHossein, et al.
Published: (2024)
Improving Text-to-Image Consistency via Automatic Prompt Optimization
by: Mañas, Oscar, et al.
Published: (2024)
Controlling Multimodal LLMs via Reward-guided Decoding
by: Mañas, Oscar, et al.
Published: (2025)
Hierarchical Planning with Latent World Models
by: Zhang, Wancong, et al.
Published: (2026)
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
by: Ross, Candace, et al.
Published: (2024)
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
by: Lin, Han, et al.
Published: (2024)
Learning and Leveraging World Models in Visual Representation Learning
by: Garrido, Quentin, et al.
Published: (2024)
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
by: Teotia, Revant, et al.
Published: (2025)
Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
by: Ballas, Aristotelis, et al.
Published: (2026)
TV2TV: A Unified Framework for Interleaved Language and Video Generation
by: Han, Xiaochuang, et al.
Published: (2025)
Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors
by: Didolkar, Aniket, et al.
Published: (2025)
Can Generative AI Solve Your In-Context Learning Problem? A Martingale Perspective
by: Jesson, Andrew, et al.
Published: (2024)
Delta-Audit: Explaining What Changes When Models Change
by: Hemmat, Arshia, et al.
Published: (2025)
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
by: Krojer, Benno, et al.
Published: (2025)
Adjoint Matching: Fine-tuning Flow and Diffusion Generative Models with Memoryless Stochastic Optimal Control
by: Domingo-Enrich, Carles, et al.
Published: (2024)
Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
by: Hemmat, Arshia, et al.
Published: (2024)
Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
by: Balestriero, Randall, et al.
Published: (2025)
CONTEMPORARY HERMENEUTICS AND THE ROLE OF THE SELF IN TRANSLATION
by: Amrollah Hemmat
Published: (2009)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs
by: Han, Xiaochuang, et al.
Published: (2023)