Library Catalog

Bibliographic Details
Main Authors:	Dongfang, Zihao; Zheng, Xu; Weng, Ziqiao; Lyu, Yuanhuiyi; Paudel, Danda Pani; Van Gool, Luc; Yang, Kailun; Hu, Xuming
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.11907

Similar Items

PANORAMA: The Rise of Omnidirectional Vision in the Embodied AI Era
by: Zheng, Xu, et al.
Published: (2025)

Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
by: Zheng, Xu, et al.
Published: (2025)

Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
by: Zheng, Xu, et al.
Published: (2025)

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks
by: Zheng, Xu, et al.
Published: (2025)

Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
by: Motamed, Saman, et al.
Published: (2023)

Autonomous Vehicle Controllers From End-to-End Differentiable Simulation
by: Nachkov, Asen, et al.
Published: (2024)

EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM
by: Chen, Shi, et al.
Published: (2024)

Seeing Together: Multi-Robot Cooperative Egocentric Spatial Reasoning with Multimodal Large Language Models
by: Peng, Kunyu, et al.
Published: (2026)

Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
by: Ma, Qi, et al.
Published: (2024)

Occam's LGS: An Efficient Approach for Language Gaussian Splatting
by: Cheng, Jiahuan, et al.
Published: (2024)

Continuous Pose for Monocular Cameras in Neural Implicit Representation
by: Ma, Qi, et al.
Published: (2023)

From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation
by: Mahdi, Mohammad, et al.
Published: (2026)

Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)

Self-supervised pretraining for an iterative image size agnostic vision transformer
by: Prisadnikov, Nedyalko, et al.
Published: (2026)

A Simple and Generalist Approach for Panoptic Segmentation
by: Prisadnikov, Nedyalko, et al.
Published: (2024)

GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond
by: Halacheva, Anna-Maria, et al.
Published: (2025)

SeasonScapes: Learning Large-scale Re-lightable 3D Landscapes with Seasonal Variation from Sparse Webcams
by: Kleger, Timo, et al.
Published: (2026)

Inferring Compositional 4D Scenes without Ever Seeing One
by: Gokmen, Ahmet Berke, et al.
Published: (2025)

Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
by: Balauca, Ada-Astrid, et al.
Published: (2024)

RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)

Incremental Object Detection with Prompt-based Methods
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)

Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2025)

BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2025)

EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering
by: Li, Yanjun, et al.
Published: (2025)

Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness
by: Liao, Chenfei, et al.
Published: (2025)

Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM
by: Lin, Junru, et al.
Published: (2023)

Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025
by: Fu, Yuqian, et al.
Published: (2025)

ProOOD: Prototype-Guided Out-of-Distribution 3D Occupancy Prediction
by: Zhang, Yuheng, et al.
Published: (2026)

MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents
by: Xu, Dannong, et al.
Published: (2026)

LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
by: Miao, Yang, et al.
Published: (2025)

ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
by: Dey, Sombit, et al.
Published: (2024)

Autonomous Vehicle Path Planning by Searching With Differentiable Simulation
by: Nachkov, Asen, et al.
Published: (2025)

Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
by: Nachkov, Asen, et al.
Published: (2025)

Generalist Robot Manipulation beyond Action Labeled Data
by: Spiridonov, Alexander, et al.
Published: (2025)

FireScope: Wildfire Risk Raster Prediction with a Chain-of-Thought Oracle
by: Markov, Mario, et al.
Published: (2025)

B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
by: Markov, Mario, et al.
Published: (2026)

CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation
by: Ma, Qi, et al.
Published: (2025)

Rethinking Global Context in Crowd Counting
by: Sun, Guolei, et al.
Published: (2021)

Exploration-Driven Generative Interactive Environments
by: Savov, Nedko, et al.
Published: (2025)

Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis
by: Mahdi, Mohammad, et al.
Published: (2025)