| Main Authors: | Paskaleva, Reni, Holubakha, Mykyta, Ilic, Andela, Motamed, Saman, Van Gool, Luc, Paudel, Danda |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.01243 |
Similar Items
Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
by: Motamed, Saman, et al.
Published: (2023)
Investigating the Effectiveness of Cross-Attention to Unlock Zero-Shot Editing of Text-to-Video Diffusion Models
by: Motamed, Saman, et al.
Published: (2024)
Continuous Pose for Monocular Cameras in Neural Implicit Representation
by: Ma, Qi, et al.
Published: (2023)
EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM
by: Chen, Shi, et al.
Published: (2024)
TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility
by: Motamed, Saman, et al.
Published: (2025)
Learning Generative Interactive Environments By Trained Agent Exploration
by: Kazemi, Naser, et al.
Published: (2024)
From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation
by: Mahdi, Mohammad, et al.
Published: (2026)
InTraGen: Trajectory-controlled Video Generation for Object Interactions
by: Liu, Zuhao, et al.
Published: (2024)
A Simple and Generalist Approach for Panoptic Segmentation
by: Prisadnikov, Nedyalko, et al.
Published: (2024)
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
by: Ma, Qi, et al.
Published: (2024)
Vision encoders should be image size agnostic and task driven
by: Prisadnikov, Nedyalko, et al.
Published: (2025)
Self-supervised pretraining for an iterative image size agnostic vision transformer
by: Prisadnikov, Nedyalko, et al.
Published: (2026)
EarthMind: Leveraging Cross-Sensor Data for Advanced Earth Observation Interpretation with a Unified Multimodal LLM
by: Shu, Yan, et al.
Published: (2025)
Occam's LGS: An Efficient Approach for Language Gaussian Splatting
by: Cheng, Jiahuan, et al.
Published: (2024)
Inferring Compositional 4D Scenes without Ever Seeing One
by: Gokmen, Ahmet Berke, et al.
Published: (2025)
RICO: Two Realistic Benchmarks and an In-Depth Analysis for Incremental Learning in Object Detection
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)
Incremental Object Detection with Prompt-based Methods
by: Neuwirth-Trapp, Matthias, et al.
Published: (2025)
Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
by: Balauca, Ada-Astrid, et al.
Published: (2024)
Exploration-Driven Generative Interactive Environments
by: Savov, Nedko, et al.
Published: (2025)
SeasonScapes: Learning Large-scale Re-lightable 3D Landscapes with Seasonal Variation from Sparse Webcams
by: Kleger, Timo, et al.
Published: (2026)
Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM
by: Lin, Junru, et al.
Published: (2023)
Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis
by: Mahdi, Mohammad, et al.
Published: (2025)
CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation
by: Ma, Qi, et al.
Published: (2025)
Cross-View Multi-Modal Segmentation @ Ego-Exo4D Challenges 2025
by: Fu, Yuqian, et al.
Published: (2025)
BiXFormer: A Robust Framework for Maximizing Modality Effectiveness in Multi-Modal Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2025)
VOID: Video Object and Interaction Deletion
by: Motamed, Saman, et al.
Published: (2026)
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
by: Dey, Sombit, et al.
Published: (2024)
Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation
by: Chen, Jialei, et al.
Published: (2025)
Rethinking Global Context in Crowd Counting
by: Sun, Guolei, et al.
Published: (2021)
Reducing Unimodal Bias in Multi-Modal Semantic Segmentation with Multi-Scale Functional Entropy Regularization
by: Zheng, Xu, et al.
Published: (2025)
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
by: Savov, Nedko, et al.
Published: (2025)
FireScope: Wildfire Risk Raster Prediction with a Chain-of-Thought Oracle
by: Markov, Mario, et al.
Published: (2025)
B-GRTO: Bootstrapped Group Relative Tool Optimization for Referring Segmentation
by: Markov, Mario, et al.
Published: (2026)
GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond
by: Halacheva, Anna-Maria, et al.
Published: (2025)
From Scan to Action: Leveraging Realistic Scans for Embodied Scene Understanding
by: Halacheva, Anna-Maria, et al.
Published: (2025)
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
by: Fiaz, Mustansar, et al.
Published: (2025)
LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation
by: Miao, Yang, et al.
Published: (2025)
OSMDA: OpenStreetMap-based Domain Adaptation for Remote Sensing VLMs
by: Ailuro, Stefan Maria, et al.
Published: (2026)
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
by: Halacheva, Anna-Maria, et al.
Published: (2024)
ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
by: Ma, Qi, et al.
Published: (2024)