:: Library Catalog

Bibliographic Details
Main Authors:	Jeong, Yujin; Uselis, Arnas; Oh, Seong Joon; Rohrbach, Anna
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.17955

Similar Items

When Do Diffusion Models learn to Generate Multiple Objects?
by: Jeong, Yujin, et al.
Published: (2026)

Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
by: Uselis, Arnas, et al.
Published: (2026)

On the rankability of visual embeddings
by: Sonthalia, Ankit, et al.
Published: (2025)

Half-Truths Break Similarity-Based Retrieval
by: Kargi, Bora, et al.
Published: (2026)

CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally
by: Koishigarina, Darina, et al.
Published: (2025)

How can embedding models bind concepts?
by: Uselis, Arnas, et al.
Published: (2026)

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models
by: Morelli, Fabian, et al.
Published: (2026)

Intermediate Layer Classifiers for OOD generalization
by: Uselis, Arnas, et al.
Published: (2025)

Tuning Just Enough: Lightweight Backdoor Attacks on Multi-Encoder Diffusion Models
by: Chen, Ziyuan, et al.
Published: (2026)

Does Data Scaling Lead to Visual Compositional Generalization?
by: Uselis, Arnas, et al.
Published: (2025)

V$^2$Dial: Unification of Video and Visual Dialog via Multimodal Experts
by: Abdessaied, Adnen, et al.
Published: (2025)

DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
by: Braun, Tobias, et al.
Published: (2024)

Chrono: A Simple Blueprint for Representing Time in MLLMs
by: Rodriguez, Hector, et al.
Published: (2024)

Enhancing Multi-Image Understanding through Delimiter Token Scaling
by: Lee, Minyoung, et al.
Published: (2026)

Controllable and Efficient Multi-Class Pathology Nuclei Data Augmentation using Text-Conditioned Diffusion Models
by: Oh, Hyun-Jic, et al.
Published: (2024)

GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning
by: Kim, Nayeong, et al.
Published: (2025)

VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking
by: Rothermel, Mark, et al.
Published: (2026)

Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
by: Min, Seonghui, et al.
Published: (2024)

Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles
by: Scimeca, Luca, et al.
Published: (2023)

HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
by: Zohrabi, Reihaneh, et al.
Published: (2026)

ECCV Caption: Correcting False Negatives by Collecting Machine-and-Human-verified Image-Caption Associations for MS-COCO
by: Chun, Sanghyuk, et al.
Published: (2022)

OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
by: Hwang, Dongjun, et al.
Published: (2024)

Spurious-Aware Prototype Refinement for Reliable Out-of-Distribution Detection
by: Zohrabi, Reihaneh, et al.
Published: (2025)

Pretrained Visual Uncertainties
by: Kirchhof, Michael, et al.
Published: (2024)

MEME: Multi-entity & Evolving Memory Evaluation
by: Jung, Seokwon, et al.
Published: (2026)

Universal Algorithm-Implicit Learning
by: Woerner, Stefano, et al.
Published: (2026)

SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
by: Mao, Yucheng, et al.
Published: (2025)

TestDG: Test-time Domain Generalization for Continual Test-time Adaptation
by: Lee, Sohyun, et al.
Published: (2025)

Are We Done with Object-Centric Learning?
by: Rubinstein, Alexander, et al.
Published: (2025)

Scalable Ensemble Diversification for OOD Generalization and Detection
by: Rubinstein, Alexander, et al.
Published: (2024)

SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring
by: Rodriguez, Hector G., et al.
Published: (2026)

ReCap: Lightweight Referential Grounding for Coherent Story Visualization
by: Arora, Aditya, et al.
Published: (2026)

MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
by: Oh, Youngmin, et al.
Published: (2024)

Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers
by: Busaranuvong, Palawat, et al.
Published: (2024)

Explorer: Robust Collection of Interactable GUI Elements
by: Chaimalas, Iason, et al.
Published: (2025)

Task-oriented Learnable Diffusion Timesteps for Universal Few-shot Learning of Dense Tasks
by: Oh, Changgyoon, et al.
Published: (2025)

AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding
by: Jeong, Jongoh, et al.
Published: (2025)

Uni-Classifier: Leveraging Video Diffusion Priors for Universal Guidance Classifier
by: Zhou, Yujie, et al.
Published: (2026)

VSC: Visual Search Compositional Text-to-Image Diffusion Model
by: Dat, Do Huu, et al.
Published: (2025)

Towards Understanding the Mechanisms of Classifier-Free Guidance
by: Li, Xiang, et al.
Published: (2025)