:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Rongzhen; Yang, Wenyan; Kannala, Juho; Pajarinen, Joni
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2508.05417

Similar Items

Predicting Video Slot Attention Queries from Random Slot-Feature Pairs
by: Zhao, Rongzhen, et al.
Published: (2025)

Slot Attention with Re-Initialization and Self-Distillation
by: Zhao, Rongzhen, et al.
Published: (2025)

Vector-Quantized Vision Foundation Models for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2025)

Multi-Scale Fusion for Object Representation
by: Zhao, Rongzhen, et al.
Published: (2024)

Internalizing Temporal Consistency in Video Object-Centric Learning without Explicit Regularization
by: Zhao, Rongzhen, et al.
Published: (2026)

Organized Grouped Discrete Representation for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2024)

Grouped Discrete Representation for Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2024)

Grouped Discrete Representation Guides Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2024)

Cycle Consistency in Video Object-Centric Learning
by: Zhao, Rongzhen, et al.
Published: (2026)

MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
by: Liu, Hongjia, et al.
Published: (2025)

Rethinking Temporal Consistency in Video Object-Centric Learning: From Prediction to Correspondence
by: Li, Zhiyuan, et al.
Published: (2026)

Object-Centric Vision Token Pruning for Vision Language Models
by: Li, Guangyuan, et al.
Published: (2025)

PAWS: Perception of Articulation in the Wild at Scale from Egocentric Videos
by: Wang, Yihao, et al.
Published: (2026)

A2-GNN: Angle-Annular GNN for Visual Descriptor-free Camera Relocalization
by: Zhang, Yejun, et al.
Published: (2025)

DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
by: Wang, Shuzhe, et al.
Published: (2023)

Scene-Agnostic Object-Centric Representation Learning for 3D Gaussian Splatting
by: Hsu, Tsuheng, et al.
Published: (2026)

Efficient NeRF Optimization -- Not All Samples Remain Equally Hard
by: Korhonen, Juuso, et al.
Published: (2024)

RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
by: Moshtaghi, Mehdi, et al.
Published: (2025)

Sources of Uncertainty in 3D Scene Reconstruction
by: Klasson, Marcus, et al.
Published: (2024)

Latent-Compressed Variational Autoencoder for Video Diffusion Models
by: Guan, Jiarui, et al.
Published: (2026)

OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives
by: Liu, Guiyu, et al.
Published: (2026)

Medical Image Segmentation with SAM-generated Annotations
by: Häkkinen, Iira, et al.
Published: (2024)

Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization
by: Wang, Zihan, et al.
Published: (2024)

Differentiable Product Quantization for Memory Efficient Camera Relocalization
by: Laskar, Zakaria, et al.
Published: (2024)

Slot-VAE: Object-Centric Scene Generation with Slot Attention
by: Wang, Yanbo, et al.
Published: (2023)

3D Gaussian Splatting with Fisheye Images: Field of View Analysis and Depth-Based Initialization
by: Gunes, Ulas, et al.
Published: (2025)

FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein Recognition
by: Wang, Yinfan, et al.
Published: (2025)

NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation
by: Fang, Junyuan, et al.
Published: (2025)

FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking
by: Gunes, Ulas, et al.
Published: (2025)

DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
by: Turkulainen, Matias, et al.
Published: (2024)

HybVIO: Pushing the Limits of Real-time Visual-inertial Odometry
by: Seiskari, Otto, et al.
Published: (2021)

MuSHRoom: Multi-Sensor Hybrid Room Dataset for Joint 3D Reconstruction and Novel View Synthesis
by: Ren, Xuqian, et al.
Published: (2023)

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
by: Dong, Siyan, et al.
Published: (2024)

Attention Normalization Impacts Cardinality Generalization in Slot Attention
by: Krimmel, Markus, et al.
Published: (2024)

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
by: Fan, Ke, et al.
Published: (2024)

PRISM: Progressive Reasoning through Iterative Slot Memory for Vision
by: Wang, Ziyu, et al.
Published: (2026)

AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones
by: Ren, Xuqian, et al.
Published: (2024)

DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
by: Wang, Yihao, et al.
Published: (2024)

Guided Slot Attention for Unsupervised Video Object Segmentation
by: Lee, Minhyeok, et al.
Published: (2023)

MUFASA: A Multi-Layer Framework for Slot Attention
by: Bock, Sebastian, et al.
Published: (2026)