| Main Authors: | Jeong, Yujin, Uselis, Arnas, Oh, Seong Joon, Rohrbach, Anna |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.17955 |
Similar Items
When Do Diffusion Models learn to Generate Multiple Objects?
by: Jeong, Yujin, et al.
Published: (2026)
Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
by: Uselis, Arnas, et al.
Published: (2026)
On the rankability of visual embeddings
by: Sonthalia, Ankit, et al.
Published: (2025)
Half-Truths Break Similarity-Based Retrieval
by: Kargi, Bora, et al.
Published: (2026)
CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
by: Koishigarina, Darina, et al.
Published: (2025)
How can embedding models bind concepts?
by: Uselis, Arnas, et al.
Published: (2026)
Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
by: Morelli, Fabian, et al.
Published: (2026)
Intermediate Layer Classifiers for OOD generalization
by: Uselis, Arnas, et al.
Published: (2025)
Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
by: Chen, Ziyuan, et al.
Published: (2026)
Does Data Scaling Lead to Visual Compositional Generalization?
by: Uselis, Arnas, et al.
Published: (2025)
V$^2$Dial: Unification of Video and Visual Dialog via Multimodal Experts
by: Abdessaied, Adnen, et al.
Published: (2025)
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
by: Braun, Tobias, et al.
Published: (2024)
Chrono: A Simple Blueprint for Representing Time in MLLMs
by: Rodriguez, Hector, et al.
Published: (2024)
Enhancing Multi-Image Understanding through Delimiter Token Scaling
by: Lee, Minyoung, et al.
Published: (2026)
Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models
by: Oh, Hyun-Jic, et al.
Published: (2024)
GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
by: Kim, Nayeong, et al.
Published: (2025)
VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking
by: Rothermel, Mark, et al.
Published: (2026)
Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
by: Min, Seonghui, et al.
Published: (2024)
Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles
by: Scimeca, Luca, et al.
Published: (2023)
HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
by: Zohrabi, Reihaneh, et al.
Published: (2026)
ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
by: Chun, Sanghyuk, et al.
Published: (2022)
OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
by: Hwang, Dongjun, et al.
Published: (2024)
Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection
by: Zohrabi, Reihaneh, et al.
Published: (2025)
Pretrained Visual Uncertainties
by: Kirchhof, Michael, et al.
Published: (2024)
MEME: Multi-entity & Evolving Memory Evaluation
by: Jung, Seokwon, et al.
Published: (2026)
Universal Algorithm-Implicit Learning
by: Woerner, Stefano, et al.
Published: (2026)
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
by: Mao, Yucheng, et al.
Published: (2025)
TestDG: Test-time Domain Generalization for Continual Test-time Adaptation
by: Lee, Sohyun, et al.
Published: (2025)
Are We Done with Object-Centric Learning?
by: Rubinstein, Alexander, et al.
Published: (2025)
Scalable Ensemble Diversification for OOD Generalization and Detection
by: Rubinstein, Alexander, et al.
Published: (2024)
SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring
by: Rodriguez, Hector G., et al.
Published: (2026)
ReCap: Lightweight Referential Grounding for Coherent Story Visualization
by: Arora, Aditya, et al.
Published: (2026)
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
by: Oh, Youngmin, et al.
Published: (2024)
Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers
by: Busaranuvong, Palawat, et al.
Published: (2024)
Explorer: Robust Collection of Interactable GUI Elements
by: Chaimalas, Iason, et al.
Published: (2025)
Task-oriented Learnable Diffusion Timesteps for Universal Few-shot Learning of Dense Tasks
by: Oh, Changgyoon, et al.
Published: (2025)
AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding
by: Jeong, Jongoh, et al.
Published: (2025)
Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier
by: Zhou, Yujie, et al.
Published: (2026)
VSC: Visual Search Compositional Text-to-Image Diffusion Model
by: Dat, Do Huu, et al.
Published: (2025)
Towards Understanding the Mechanisms of Classifier-Free Guidance
by: Li, Xiang, et al.
Published: (2025)