| Main Authors: | Traub, Manuel, Butz, Martin V. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.02763 |
Similar Items
Learning Object Permanence from Videos via Latent Imaginations
by: Traub, Manuel, et al.
Published: (2023)
Loci-Segmented: Improving Scene Segmentation Learning
by: Traub, Manuel, et al.
Published: (2023)
Vector-Quantized Vision Foundation Models for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2025)
Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning
by: Tang, Luyao, et al.
Published: (2024)
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
by: Huang, Shaofei, et al.
Published: (2025)
PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
by: Žust, Lojze, et al.
Published: (2024)
LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation
by: Yuan, Yuqian, et al.
Published: (2026)
Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding
by: Mirjalili, Vahid, et al.
Published: (2025)
Rein++: Efficient Generalization and Adaptation for Semantic Segmentation with Vision Foundation Models
by: Wei, Zhixiang, et al.
Published: (2025)
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models
by: Luo, Yulin, et al.
Published: (2026)
Object-Centric Vision Token Pruning for Vision Language Models
by: Li, Guangyuan, et al.
Published: (2025)
RePack then Refine: Efficient Diffusion Transformer with Vision Foundation Model
by: Dong, Guanfang, et al.
Published: (2025)
Appearance-Based Refinement for Object-Centric Motion Segmentation
by: Xie, Junyu, et al.
Published: (2023)
Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation?
by: Fuster-Barceló, Caterina, et al.
Published: (2026)
SimPLR: A Simple and Plain Transformer for Efficient Object Detection and Segmentation
by: Nguyen, Duy-Kien, et al.
Published: (2023)
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
by: Norouzi, Narges, et al.
Published: (2024)
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
by: Qian, Jianing, et al.
Published: (2024)
ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
by: Zhang, Ying, et al.
Published: (2025)
Annotation Free Semantic Segmentation with Vision Foundation Models
by: Seifi, Soroush, et al.
Published: (2024)
Mechanisms of Object Localization in Vision-Language Models
by: Schaumlöffel, Timothy, et al.
Published: (2026)
Can Modern Vision Models Understand the Difference Between an Object and a Look-alike?
by: Cohen, Itay, et al.
Published: (2025)
Classifier-Centric Adaptive Framework for Open-Vocabulary Camouflaged Object Segmentation
by: Zhang, Hanyu, et al.
Published: (2025)
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
by: Kim, Chanyoung, et al.
Published: (2024)
Zero-shot Object-Centric Instruction Following: Integrating Foundation Models with Traditional Navigation
by: Raychaudhuri, Sonia, et al.
Published: (2024)
Scalable Object Detection in the Car Interior With Vision Foundation Models
by: Schmidt, Sebastian, et al.
Published: (2025)
Evaluating Vision Foundation Models for Pixel and Object Classification in Microscopy
by: Teuber, Carolin, et al.
Published: (2026)
SOCO: Benchmarking Semantic Object Correspondence in Vision Foundation Models
by: Dünkel, Olaf, et al.
Published: (2026)
Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
by: Galagain, Calvin, et al.
Published: (2026)
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization
by: Hannan, Darryl, et al.
Published: (2025)
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
by: Fuller, Anthony, et al.
Published: (2024)
Self-Supervised Vision Transformers Are Efficient Segmentation Learners for Imperfect Labels
by: Lee, Seungho, et al.
Published: (2024)
Analysis of Object Detection Models for Tiny Object in Satellite Imagery: A Dataset-Centric Approach
by: PS, Kailas, et al.
Published: (2024)
HeightFormer: Learning Height Prediction in Voxel Features for Roadside Vision Centric 3D Object Detection via Transformer
by: Zhang, Zhang, et al.
Published: (2025)
Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation
by: Wang, Jin, et al.
Published: (2025)
Adapting Vision Foundation Models for Real-time Ultrasound Image Segmentation
by: Zhang, Xiaoran, et al.
Published: (2025)
Set Pivot Learning: Redefining Generalized Segmentation with Vision Foundation Models
by: Li, Xinhui, et al.
Published: (2025)
GuiDINO: Rethinking Vision Foundation Model in Medical Image Segmentation
by: Liang, Zhuonan, et al.
Published: (2026)
Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation
by: Lv, Chonghua, et al.
Published: (2026)
A Novel Vision Transformer for Camera-LiDAR Fusion based Traffic Object Segmentation
by: Tahves, Toomas, et al.
Published: (2025)
Object-Centric Diffusion for Efficient Video Editing
by: Kahatapitiya, Kumara, et al.
Published: (2024)