Bibliographic Details
Main Authors:	Traub, Manuel; Butz, Martin V.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.02763

Similar Items

Learning Object Permanence from Videos via Latent Imaginations
by: Traub, Manuel, et al.
Published: (2023)

Loci-Segmented: Improving Scene Segmentation Learning
by: Traub, Manuel, et al.
Published: (2023)

Vector-Quantized Vision Foundation Models for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2025)

Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning
by: Tang, Luyao, et al.
Published: (2024)

Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
by: Huang, Shaofei, et al.
Published: (2025)

PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
by: Žust, Lojze, et al.
Published: (2024)

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
by: Yuan, Yuqian, et al.
Published: (2026)

Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
by: Mirjalili, Vahid, et al.
Published: (2025)

Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models
by: Wei, Zhixiang, et al.
Published: (2025)

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)

Object-Centric Vision Token Pruning for Vision Language Models
by: Li, Guangyuan, et al.
Published: (2025)

RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model
by: Dong, Guanfang, et al.
Published: (2025)

Appearance-Based Refinement for Object-Centric Motion Segmentation
by: Xie, Junyu, et al.
Published: (2023)

Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation?
by: Fuster-Barceló, Caterina, et al.
Published: (2026)

SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
by: Nguyen, Duy-Kien, et al.
Published: (2023)

ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
by: Norouzi, Narges, et al.
Published: (2024)

Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
by: Qian, Jianing, et al.
Published: (2024)

ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
by: Zhang, Ying, et al.
Published: (2025)

Annotation Free Semantic Segmentation with Vision Foundation Models
by: Seifi, Soroush, et al.
Published: (2024)

Mechanisms of Object Localization in Vision-Language Models
by: Schaumlöffel, Timothy, et al.
Published: (2026)

Can Modern Vision Models Understand the Difference Between an Object and a Look-alike?
by: Cohen, Itay, et al.
Published: (2025)

Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation
by: Zhang, Hanyu, et al.
Published: (2025)

EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)

Zero-shot Object-Centric Instruction Following: Integrating Foundation Models with Traditional Navigation
by: Raychaudhuri, Sonia, et al.
Published: (2024)

Scalable Object Detection in the Car Interior With Vision Foundation Models
by: Schmidt, Sebastian, et al.
Published: (2025)

Evaluating Vision Foundation Models for Pixel and Object Classification in Microscopy
by: Teuber, Carolin, et al.
Published: (2026)

SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
by: Dünkel, Olaf, et al.
Published: (2026)

Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
by: Galagain, Calvin, et al.
Published: (2026)

Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization
by: Hannan, Darryl, et al.
Published: (2025)

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
by: Fuller, Anthony, et al.
Published: (2024)

Self-Supervised Vision Transformers Are Efficient Segmentation Learners for Imperfect Labels
by: Lee, Seungho, et al.
Published: (2024)

Analysis of Object Detection Models for Tiny Object in Satellite Imagery: A Dataset-Centric Approach
by: PS, Kailas, et al.
Published: (2024)

HeightFormer: Learning Height Prediction in Voxel Features for Roadside Vision Centric 3D Object Detection via Transformer
by: Zhang, Zhang, et al.
Published: (2025)

Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation
by: Wang, Jin, et al.
Published: (2025)

Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation
by: Zhang, Xiaoran, et al.
Published: (2025)

Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models
by: Li, Xinhui, et al.
Published: (2025)

GuiDINO: Rethinking Vision Foundation Model in Medical Image Segmentation
by: Liang, Zhuonan, et al.
Published: (2026)

Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
by: Lv, Chonghua, et al.
Published: (2026)

A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation
by: Tahves, Toomas, et al.
Published: (2025)

Object-Centric Diffusion for Efficient Video Editing
by: Kahatapitiya, Kumara, et al.
Published: (2024)