:: Library Catalog

Bibliographic Details
Main Authors:	Kolner, Oleh; Ortner, Thomas; Woźniak, Stanisław; Pantazi, Angeliki
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2409.20213

Similar Items

Dynamic Event-based Optical Identification and Communication
by: von Arnim, Axel, et al.
Published: (2023)

FlowState: Sampling Rate Invariant Time Series Forecasting
by: Graf, Lars, et al.
Published: (2025)

AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
by: Pardyl, Adam, et al.
Published: (2024)

Unraveling the geometry of visual relational reasoning
by: Shang, Jiaqi, et al.
Published: (2025)

GAP-MLLM: Geometry-Aligned Pre-training for Activating 3D Spatial Perception in Multimodal Large Language Models
by: Zhang, Jiaxin, et al.
Published: (2026)

GAP: Gaussianize Any Point Clouds with Text Guidance
by: Zhang, Weiqi, et al.
Published: (2025)

Chain-of-Glimpse: Search-Guided Progressive Object-Grounded Reasoning for Video Understanding
by: Wu, Zhixuan, et al.
Published: (2026)

Perception Encoder: The best visual embeddings are not at the output of the network
by: Bolya, Daniel, et al.
Published: (2025)

A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models
by: Zeng, Quan-Sheng, et al.
Published: (2025)

Action Anticipation at a Glimpse: To What Extent Can Multimodal Cues Replace Video?
by: Benavent-Lledo, Manuel, et al.
Published: (2025)

Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning
by: Jeoung, Sullam, et al.
Published: (2024)

Mind the Shape Gap: A Benchmark and Baseline for Deformation-Aware 6D Pose Estimation of Agricultural Produce
by: Chatzis, Nikolas, et al.
Published: (2026)

CAMP-VQA: Caption-Embedded Multimodal Perception for No-Reference Quality Assessment of Compressed Video
by: Wang, Xinyi, et al.
Published: (2025)

Align the GAP: Prior-based Unified Multi-Task Remote Physiological Measurement Framework For Domain Generalization and Personalization
by: Wang, Jiyao, et al.
Published: (2025)

Active Visual Perception: Opportunities and Challenges
by: Li, Yian, et al.
Published: (2025)

Incremental dimension reduction for efficient and accurate visual anomaly detection
by: Lee, Teng-Yok
Published: (2026)

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
by: Lee, Inhee, et al.
Published: (2024)

GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?
by: Zhou, Yiyang, et al.
Published: (2025)

Affine transformation estimation improves visual self-supervised learning
by: Torpey, David, et al.
Published: (2024)

Gradient events: improved acquisition of visual information in event cameras
by: Lehtonen, Eero, et al.
Published: (2024)

UnGAP: Uncertainty-Guided Affine Prompting for Real-Time Crack Segmentation
by: Li, Conghui, et al.
Published: (2026)

Relaxed forced choice improves performance of visual quality assessment methods
by: Jenadeleh, Mohsen, et al.
Published: (2023)

BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining
by: Khoche, Ajinkya, et al.
Published: (2025)

Learning Underwater Active Perception in Simulation
by: Cardaillac, Alexandre, et al.
Published: (2025)

Glimpse: Generalized Locality for Scalable and Robust CT
by: Khorashadizadeh, AmirEhsan, et al.
Published: (2024)

VisualChef: Generating Visual Aids in Cooking via Mask Inpainting
by: Kuzyk, Oleh, et al.
Published: (2025)

Active Perception Agent for Omnimodal Audio-Video Understanding
by: Tao, Keda, et al.
Published: (2025)

OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation
by: Jiang, Jianwen, et al.
Published: (2025)

HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors
by: Zheng, Xiaozheng, et al.
Published: (2024)

Mind the Hitch: Dynamic Calibration and Articulated Perception for Autonomous Trucks
by: Zhu, Morui, et al.
Published: (2026)

DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking
by: Zheng, Weicheng, et al.
Published: (2025)

Splatter Image: Ultra-Fast Single-View 3D Reconstruction
by: Szymanowicz, Stanislaw, et al.
Published: (2023)

Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval
by: Georgoulopoulos, Dimitrios, et al.
Published: (2025)

Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion
by: Xu, Yan, et al.
Published: (2025)

Unmasking the Uniqueness: A Glimpse into Age-Invariant Face Recognition of Indigenous African Faces
by: Ajewole, Fakunle, et al.
Published: (2024)

ActFormer: Scalable Collaborative Perception via Active Queries
by: Huang, Suozhi, et al.
Published: (2024)

Audio-visual training for improved grounding in video-text LLMs
by: Sagare, Shivprasad, et al.
Published: (2024)

Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification
by: Wang, Qingyu, et al.
Published: (2024)

GAP3D: Generative Alignment of VLM Latents to Patch-Level Embeddings for 3D Generation
by: Gkotsi, Polytimi Anna, et al.
Published: (2026)

Explaining Vision GNNs: A Semantic and Visual Analysis of Graph-based Image Classification
by: Chaidos, Nikolaos, et al.
Published: (2025)