| Main Authors: | Kolner, Oleh, Ortner, Thomas, Woźniak, Stanisław, Pantazi, Angeliki |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.20213 |
Similar Items
Dynamic Event-based Optical Identification and Communication
by: von Arnim, Axel, et al.
Published: (2023)
FlowState: Sampling Rate Invariant Time Series Forecasting
by: Graf, Lars, et al.
Published: (2025)
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
by: Pardyl, Adam, et al.
Published: (2024)
Unraveling the geometry of visual relational reasoning
by: Shang, Jiaqi, et al.
Published: (2025)
GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models
by: Zhang, Jiaxin, et al.
Published: (2026)
GAP: Gaussianize Any Point Clouds with Text Guidance
by: Zhang, Weiqi, et al.
Published: (2025)
Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
by: Wu, Zhixuan, et al.
Published: (2026)
Perception Encoder: The best visual embeddings are not at the output of the network
by: Bolya, Daniel, et al.
Published: (2025)
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)
Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
by: Benavent-Lledo, Manuel, et al.
Published: (2025)
Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
by: Jeoung, Sullam, et al.
Published: (2024)
Mind the Shape Gap: A Benchmark and Baseline for Deformation-Aware 6D Pose Estimation of Agricultural Produce
by: Chatzis, Nikolas, et al.
Published: (2026)
CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video
by: Wang, Xinyi, et al.
Published: (2025)
Align the GAP: Prior-based Unified Multi-Task Remote Physiological Measurement Framework For Domain Generalization and Personalization
by: Wang, Jiyao, et al.
Published: (2025)
Active Visual Perception: Opportunities and Challenges
by: Li, Yian, et al.
Published: (2025)
Incremental dimension reduction for efficient and accurate visual anomaly detection
by: Lee, Teng-Yok
Published: (2026)
Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
by: Lee, Inhee, et al.
Published: (2024)
GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?
by: Zhou, Yiyang, et al.
Published: (2025)
Affine transformation estimation improves visual self-supervised learning
by: Torpey, David, et al.
Published: (2024)
Gradient events: improved acquisition of visual information in event cameras
by: Lehtonen, Eero, et al.
Published: (2024)
UnGAP: Uncertainty-Guided Affine Prompting for Real-Time Crack Segmentation
by: Li, Conghui, et al.
Published: (2026)
Relaxed forced choice improves performance of visual quality assessment methods
by: Jenadeleh, Mohsen, et al.
Published: (2023)
BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining
by: Khoche, Ajinkya, et al.
Published: (2025)
Learning Underwater Active Perception in Simulation
by: Cardaillac, Alexandre, et al.
Published: (2025)
Glimpse: Generalized Locality for Scalable and Robust CT
by: Khorashadizadeh, AmirEhsan, et al.
Published: (2024)
VisualChef: Generating Visual Aids in Cooking via Mask Inpainting
by: Kuzyk, Oleh, et al.
Published: (2025)
Active Perception Agent for Omnimodal Audio-Video Understanding
by: Tao, Keda, et al.
Published: (2025)
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation
by: Jiang, Jianwen, et al.
Published: (2025)
HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors
by: Zheng, Xiaozheng, et al.
Published: (2024)
Mind the Hitch: Dynamic Calibration and Articulated Perception for Autonomous Trucks
by: Zhu, Morui, et al.
Published: (2026)
DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
by: Zheng, Weicheng, et al.
Published: (2025)
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
by: Szymanowicz, Stanislaw, et al.
Published: (2023)
Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval
by: Georgoulopoulos, Dimitrios, et al.
Published: (2025)
Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion
by: Xu, Yan, et al.
Published: (2025)
Unmasking the Uniqueness: A Glimpse into Age-Invariant Face Recognition of Indigenous African Faces
by: Ajewole, Fakunle, et al.
Published: (2024)
ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)
Audio-visual training for improved grounding in video-text LLMs
by: Sagare, Shivprasad, et al.
Published: (2024)
Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification
by: Wang, Qingyu, et al.
Published: (2024)
GAP3D: Generative Alignment of VLM Latents to Patch-Level Embeddings for 3D Generation
by: Gkotsi, Polytimi Anna, et al.
Published: (2026)
Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification
by: Chaidos, Nikolaos, et al.
Published: (2025)