| Main authors: | Assouel, Rim; Astolfi, Pietro; Bordes, Florian; Drozdzal, Michal; Romero-Soriano, Adriana |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2502.14113 |
Similar Items
PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
by: Assouel, Rim, et al.
Published: (2026)
Feedback-guided Data Synthesis for Imbalanced Classification
by: Hemmat, Reyhane Askari, et al.
Published: (2023)
Consistency-diversity-realism Pareto fronts of conditional image generative models
by: Astolfi, Pietro, et al.
Published: (2024)
Binding Visual Features Point by Point
by: Haputhanthri, Udith, et al.
Published: (2026)
Multi-Modal Language Models as Text-to-Image Model Evaluators
by: Chen, Jiahui, et al.
Published: (2025)
Entropy Rectifying Guidance for Diffusion and Flow Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2025)
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
by: Ifriqi, Tariq Berrada, et al.
Published: (2024)
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
by: Urbanek, Jack, et al.
Published: (2023)
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
by: Hemmat, Reyhane Askari, et al.
Published: (2024)
Controlling Multimodal LLMs via Reward-guided Decoding
by: Mañas, Oscar, et al.
Published: (2025)
Improving Text-to-Image Consistency via Automatic Prompt Optimization
by: Mañas, Oscar, et al.
Published: (2024)
Boosting Latent Diffusion with Perceptual Objectives
by: Berrada, Tariq, et al.
Published: (2024)
The Intricate Dance of Prompt Complexity, Quality, Diversity, and Consistency in T2I Models
by: Xiaofeng, Zhang, et al.
Published: (2025)
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
by: Kim, Dahun, et al.
Published: (2023)
Augmented Conditioning Is Enough For Effective Training Image Generation
by: Chen, Jiahui, et al.
Published: (2025)
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
by: Song, Chull Hwan, et al.
Published: (2024)
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
by: Hall, Melissa, et al.
Published: (2023)
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
by: Bendikas, Rokas, et al.
Published: (2025)
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
by: Zhu, Bin, et al.
Published: (2023)
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
by: Askari-Hemmat, Reyhane, et al.
Published: (2025)
SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI
by: Liu, Zhiyang, et al.
Published: (2025)
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
by: Hall, Melissa, et al.
Published: (2024)
Increasing the Utility of Synthetic Images through Chamfer Guidance
by: Dall'Asen, Nicola, et al.
Published: (2025)
Learning Physical Dynamics for Object-centric Visual Prediction
by: Xu, Huilin, et al.
Published: (2024)
Unified Text-Image Generation with Weakness-Targeted Post-Training
by: Chen, Jiahui, et al.
Published: (2026)
A Framework for Evaluating Zero-Shot Image Generation in Concept-based Explainability
by: Astolfi, Giacomo, et al.
Published: (2026)
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
by: Li, Yihao, et al.
Published: (2025)
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
by: Liu, Yufang, et al.
Published: (2024)
GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning
by: Feizi, Aarash, et al.
Published: (2024)
Visual symbolic mechanisms: Emergent symbol processing in vision language models
by: Assouel, Rim, et al.
Published: (2025)
ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images
by: Naik, Prithviraj Purushottam, et al.
Published: (2024)
SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment
by: Lu, Wenbo
Published: (2025)
High-fidelity Person-centric Subject-to-Image Synthesis
by: Wang, Yibin, et al.
Published: (2023)
Non-Contrastive Vision-Language Learning with Predictive Embedding Alignment
by: Kuhn, Lukas, et al.
Published: (2026)
Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning
by: Qian, Jiahe, et al.
Published: (2025)
Multimodal Contrastive Pretraining of CBCT and IOS for Enhanced Tooth Segmentation
by: Son, Moo Hyun, et al.
Published: (2025)
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
by: Lavoie, Samuel, et al.
Published: (2024)
Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining
by: Molino, Daniele, et al.
Published: (2025)
Successes and Limitations of Object-centric Models at Compositional Generalisation
by: Montero, Milton L., et al.
Published: (2024)
CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition
by: Stilz, Florian, et al.
Published: (2026)