:: Library Catalog

Bibliographic Details
Main Authors:	Wu, Hongyu; Yang, Pengwan; Asano, Yuki M.; Snoek, Cees G. M.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.19331

Similar Items

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
by: Dorkenwald, Michael, et al.
Published: (2024)

Elastic ViTs from Pretrained Models without Retraining
by: Simoncini, Walter, et al.
Published: (2025)

Redefining Normal: A Novel Object-Level Approach for Multi-Object Novelty Detection
by: Salehi, Mohammadreza, et al.
Published: (2024)

Lost in Time: A New Temporal Benchmark for VideoLLMs
by: Cores, Daniel, et al.
Published: (2024)

MoSiC: Optimal-Transport Motion Trajectory for Dense Self-Supervised Learning
by: Salehi, Mohammadreza, et al.
Published: (2025)

GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
by: Sträter, Luc P. J., et al.
Published: (2024)

Any-Shift Prompting for Generalization over Distributions
by: Xiao, Zehao, et al.
Published: (2024)

SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
by: Rastegar, Sarah, et al.
Published: (2024)

SIGMA: Sinkhorn-Guided Masked Video Modeling
by: Salehi, Mohammadreza, et al.
Published: (2024)

TWIST & SCOUT: Grounding Multimodal LLM-Experts by Forget-Free Tuning
by: Bhowmik, Aritra, et al.
Published: (2024)

TULIP: Token-length Upgraded CLIP
by: Najdenkoska, Ivona, et al.
Published: (2024)

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
by: Sun, Wenfang, et al.
Published: (2024)

SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
by: Nguyen, Duy-Kien, et al.
Published: (2023)

Prompt Diffusion Robustifies Any-Modality Prompt Learning
by: Du, Yingjun, et al.
Published: (2024)

Training-Free Semantic Segmentation via LLM-Supervision
by: Sun, Wenfang, et al.
Published: (2024)

Low-Resource Vision Challenges for Foundation Models
by: Zhang, Yunhua, et al.
Published: (2024)

Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
by: Liu, Huabin, et al.
Published: (2025)

SAMPart3D: Segment Any Part in 3D Objects
by: Yang, Yunhan, et al.
Published: (2024)

IPO: Interpretable Prompt Optimization for Vision-Language Models
by: Du, Yingjun, et al.
Published: (2024)

SAI3D: Segment Any Instance in 3D Scenes
by: Yin, Yingda, et al.
Published: (2023)

Dual Guidance Semi-Supervised Action Detection
by: Singh, Ankit, et al.
Published: (2025)

SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
by: Du, Yingjun, et al.
Published: (2023)

LocoMotion: Learning Motion-Focused Video-Language Representations
by: Doughty, Hazel, et al.
Published: (2024)

SAS: Segment Any 3D Scene with Integrated 2D Priors
by: Li, Zhuoyuan, et al.
Published: (2025)

Beyond Coarse-Grained Matching in Video-Text Retrieval
by: Chen, Aozhu, et al.
Published: (2024)

MoAlign: Motion-Centric Representation Alignment for Video Diffusion Models
by: Bhowmik, Aritra, et al.
Published: (2025)

RegionReasoner: Region-Grounded Multi-Round Visual Reasoning
by: Sun, Wenfang, et al.
Published: (2026)

Union-over-Intersections: Object Detection beyond Winner-Takes-All
by: Bhowmik, Aritra, et al.
Published: (2023)

NeoBabel: A Multilingual Open Tower for Visual Generation
by: Derakhshani, Mohammad Mahdi, et al.
Published: (2025)

The Sound of Water: Inferring Physical Properties from Pouring Liquids
by: Bagad, Piyush, et al.
Published: (2024)

Crane: Context-Guided Prompt Learning and Attention Refinement for Zero-Shot Anomaly Detection
by: Salehi, Alireza, et al.
Published: (2025)

FVO: Fast Visual Odometry with Transformers
by: Yugay, Vladimir, et al.
Published: (2025)

Segment Any 3D Gaussians
by: Cen, Jiazhong, et al.
Published: (2023)

SAMSelect: A Spectral Index Search for Marine Debris Visualization using Segment Anything
by: van Dalen, Joost, et al.
Published: (2025)

Find Any Part in 3D
by: Ma, Ziqi, et al.
Published: (2024)

Harmonious Parameter Adaptation in Continual Visual Instruction Tuning for Safety-Aligned MLLMs
by: Wang, Ziqi, et al.
Published: (2025)

TAP-CT: 3D Task-Agnostic Pretraining of Computed Tomography Foundation Models
by: Veenboer, Tim, et al.
Published: (2025)

Segment Any 4D Gaussians
by: Ji, Shengxiang, et al.
Published: (2024)

Auto-Vocabulary Semantic Segmentation
by: Ülger, Osman, et al.
Published: (2023)

Evaluating Foundation Models' 3D Understanding Through Multi-View Correspondence Analysis
by: Lilova, Valentina, et al.
Published: (2025)