:: Library Catalog

Bibliographic Details
Main Authors:	Robinson, David; Gupta, Animesh; Clark, Elizabeth; Melnik, Olga; Fu, Qiushi; Shah, Mubarak
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.29101

Similar Items

STROKEVISION-BENCH: A Multimodal Video And 2D Pose Benchmark For Tracking Stroke Recovery
by: Robinson, David, et al.
Published: (2025)

From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos
by: Gupta, Animesh, et al.
Published: (2025)

Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs
by: Narnaware, Vishal, et al.
Published: (2026)

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)

Cross-View Open-Vocabulary Object Detection in Aerial Imagery
by: Kini, Jyoti, et al.
Published: (2025)

Development, Measurement Properties and Reference Values of the Upper Extremity Motor Coordination Test: A New Motor Coordination Test of the Upper Limbs
by: João Victor Drummond Ribeiro, et al.
Published: (2025)

StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales
by: Siddiqui, Nyle, et al.
Published: (2025)

The Telephone Game: Evaluating Semantic Drift in Unified Models
by: Mollah, Sabbir, et al.
Published: (2025)

PTQ4DiT: Post-training Quantization for Diffusion Transformers
by: Wu, Junyi, et al.
Published: (2024)

Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety
by: Kim, Younggun, et al.
Published: (2025)

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
by: Chen, Chen, et al.
Published: (2025)

BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models
by: Narnaware, Vishal, et al.
Published: (2025)

VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale
by: Kulkarni, Parth Parag, et al.
Published: (2026)

Diffusion Models in Vision: A Survey
by: Croitoru, Florinel-Alin, et al.
Published: (2022)

TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval
by: Shatwell, David G., et al.
Published: (2026)

Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace
by: Donahue, Seth, et al.
Published: (2026)

Learnability-Guided Diffusion for Dataset Distillation
by: Chan-Santiago, Jeffrey A., et al.
Published: (2026)

PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
by: Li, Kunyang, et al.
Published: (2026)

TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)

Prevalence of Upper Extremity Distal Predominant Weakness Pattern in Chronic Stroke
by: Baxter, Ryan H., et al.
Published: (2025)

Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data
by: Kumar, Aakash, et al.
Published: (2024)

ViLL-E: Video LLM Embeddings for Retrieval
by: Gupta, Rohit, et al.
Published: (2026)

Searching for Uncollected Litter with Computer Vision
by: Hernandez, Julian, et al.
Published: (2022)

Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
by: Li, Kunyang, et al.
Published: (2026)

Weakly-Supervised Spatiotemporal Anomaly Detection
by: Gianchandani, Urvi, et al.
Published: (2026)

Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
by: Fu, Shuai, et al.
Published: (2024)

Test-Time Hinting for Black-Box Vision-Language Models
by: Hou, Kaihua, et al.
Published: (2026)

Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
by: Fioresi, Joseph, et al.
Published: (2025)

GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
by: Pillai, Manu S, et al.
Published: (2024)

SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
by: Yousaf, Adeel, et al.
Published: (2025)

GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
by: Shatwell, David G., et al.
Published: (2025)

ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
by: Fioresi, Joseph, et al.
Published: (2025)

Exploring Local Memorization in Diffusion Models via Bright Ending Attention
by: Chen, Chen, et al.
Published: (2024)

CityGuessr: City-Level Video Geo-Localization on a Global Scale
by: Kulkarni, Parth Parag, et al.
Published: (2024)

Generative Physical AI in Vision: A Survey
by: Liu, Daochang, et al.
Published: (2025)

FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
by: Dave, Ishan Rajendrakumar, et al.
Published: (2024)

Surgical Triplet Recognition via Diffusion Model
by: Liu, Daochang, et al.
Published: (2024)

Unsupervised Detection of Post-Stroke Brain Abnormalities
by: Mahé, Youwan, et al.
Published: (2025)

Computer-Aided Multi-Stroke Character Simplification by Stroke Removal
by: Ishiyama, Ryo, et al.
Published: (2025)

Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review
by: Constable, Merryn D., et al.
Published: (2024)