Library Catalog

Bibliographic Details
Main Authors:	Van Brunt, Kai; Kay, Justin; Haucke, Timm; Perona, Pietro; Van Horn, Grant; Beery, Sara
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.05129

Similar Items

Align and Distill: Unifying and Improving Domain Adaptive Object Detection
by: Kay, Justin, et al.
Published: (2024)

Pairwise Matching of Intermediate Representations for Fine-grained Explainability
by: Shrack, Lauren, et al.
Published: (2025)

Consensus-Driven Active Model Selection
by: Kay, Justin, et al.
Published: (2025)

Single View Seafloor Recovery from Imaging Sonar via Differentiable Rendering
by: Brodjian, Sevan, et al.
Published: (2026)

Deep in the Jungle: Towards Automating Chimpanzee Population Estimation
by: Raynes, Tom, et al.
Published: (2026)

SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance
by: Stathatos, Suzanne, et al.
Published: (2025)

Merlin L48 Spectrogram Dataset
by: Sun, Aaron, et al.
Published: (2025)

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
by: Saha, Oindrila, et al.
Published: (2024)

INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
by: Vendrow, Edward, et al.
Published: (2024)

Representational Similarity via Interpretable Visual Concepts
by: Kondapaneni, Neehar, et al.
Published: (2025)

Representational Difference Explanations
by: Kondapaneni, Neehar, et al.
Published: (2025)

Generate, Transduct, Adapt: Iterative Transduction with VLMs
by: Saha, Oindrila, et al.
Published: (2025)

Human-in-the-Loop Visual Re-ID for Population Size Estimation
by: Perez, Gustavo, et al.
Published: (2023)

Moment Sampling in Video LLMs for Long-Form Video QA
by: Chasmai, Mustafa, et al.
Published: (2025)

Personalized Representation from Personalized Generation
by: Sundaram, Shobhita, et al.
Published: (2024)

A Number Sense as an Emergent Property of the Manipulating Brain
by: Kondapaneni, Neehar, et al.
Published: (2020)

Unsupervised Representation Learning from Sparse Transformation Analysis
by: Song, Yue, et al.
Published: (2024)

Diffusion-Based Action Recognition Generalizes to Untrained Domains
by: Guimaraes, Rogerio, et al.
Published: (2025)

Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
by: Kang, Raphi, et al.
Published: (2026)

Confidence Intervals for Error Rates in 1:1 Matching Tasks: Critical Statistical Analysis and Recommendations
by: Fogliato, Riccardo, et al.
Published: (2023)

WildSAT: Learning Satellite Image Representations from Wildlife Observations
by: Daroya, Rangel, et al.
Published: (2024)

Less is More: Discovering Concise Network Explanations
by: Kondapaneni, Neehar, et al.
Published: (2024)

A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
by: Fogliato, Riccardo, et al.
Published: (2024)

Masked Autoencoders with Limited Data: Does It Work? A Fine-Grained Bioacoustics Case Study
by: Liu, Wuao, et al.
Published: (2026)

Seeing Through the PRISM: Compound & Controllable Restoration of Scientific Images
by: Kurinchi-Vendhan, Rupa, et al.
Published: (2026)

Text-image Alignment for Diffusion-based Perception
by: Kondapaneni, Neehar, et al.
Published: (2023)

On the Effect of Image Resolution on Semantic Segmentation
by: Singh, Ritambhara, et al.
Published: (2024)

CleverBirds: A Multiple-Choice Benchmark for Fine-grained Human Knowledge Tracing
by: Bossemeyer, Leonie, et al.
Published: (2025)

Is CLIP ideal? No. Can we fix it? Yes!
by: Kang, Raphi, et al.
Published: (2025)

A Rapid Test for Accuracy and Bias of Face Recognition Technology
by: Knott, Manuel, et al.
Published: (2025)

Learning Keypoints for Multi-Agent Behavior Analysis using Self-Supervision
by: Khalil, Daniel, et al.
Published: (2024)

Anchored Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models
by: Hassan, Mariam, et al.
Published: (2025)

You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
by: Lawrence, Logan, et al.
Published: (2025)

SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting
by: Sethuraman, Advaith V., et al.
Published: (2025)

Visually Consistent Hierarchical Image Classification
by: Park, Seulki, et al.
Published: (2024)

Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval
by: Balloli, Vaibhav, et al.
Published: (2024)

Spatial-Temporal Graph Mamba for Music-Guided Dance Video Synthesis
by: Tang, Hao, et al.
Published: (2025)

TrackMAE: Video Representation Learning via Track Mask and Predict
by: Vandeghen, Renaud, et al.
Published: (2026)

SONIC: Sonar Image Correspondence using Pose Supervised Learning for Imaging Sonars
by: Gode, Samiran, et al.
Published: (2023)

Counting to Four is still a Chore for VLMs
by: Anh, Duy Le Dinh, et al.
Published: (2026)