| Main Authors: | Robinson, David, Gupta, Animesh, Clark, Elizabeth, Melnik, Olga, Fu, Qiushi, Shah, Mubarak |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.29101 |
Similar Items
STROKEVISION-BENCH: A Multimodal Video And 2D Pose Benchmark For Tracking Stroke Recovery
by: Robinson, David, et al.
Published: (2025)
From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos
by: Gupta, Animesh, et al.
Published: (2025)
Seeing to Ground: Visual Attention for Hallucination-Resilient MDLLMs
by: Narnaware, Vishal, et al.
Published: (2026)
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
by: Kang, Weitai, et al.
Published: (2024)
Cross-View Open-Vocabulary Object Detection in Aerial Imagery
by: Kini, Jyoti, et al.
Published: (2025)
Development, Measurement Properties and Reference Values of the Upper Extremity Motor Coordination Test: A New Motor Coordination Test of the Upper Limbs
by: João Victor Drummond Ribeiro, et al.
Published: (2025)
StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales
by: Siddiqui, Nyle, et al.
Published: (2025)
The Telephone Game: Evaluating Semantic Drift in Unified Models
by: Mollah, Sabbir, et al.
Published: (2025)
PTQ4DiT: Post-training Quantization for Diffusion Transformers
by: Wu, Junyi, et al.
Published: (2024)
Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety
by: Kim, Younggun, et al.
Published: (2025)
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
by: Chen, Chen, et al.
Published: (2025)
BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models
by: Narnaware, Vishal, et al.
Published: (2025)
VidTAG: Temporally Aligned Video to GPS Geolocalization with Denoising Sequence Prediction at a Global Scale
by: Kulkarni, Parth Parag, et al.
Published: (2026)
Diffusion Models in Vision: A Survey
by: Croitoru, Florinel-Alin, et al.
Published: (2022)
TIGeR: A Unified Framework for Time, Images and Geo-location Retrieval
by: Shatwell, David G., et al.
Published: (2026)
Monocular Markerless Motion Capture Enables Quantitative Assessment of Upper Extremity Reachable Workspace
by: Donahue, Seth, et al.
Published: (2026)
Learnability-Guided Diffusion for Dataset Distillation
by: Chan-Santiago, Jeffrey A., et al.
Published: (2026)
PackCache: A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
by: Li, Kunyang, et al.
Published: (2026)
TimeLogic: A Temporal Logic Benchmark for Video QA
by: Swetha, Sirnam, et al.
Published: (2025)
Prevalence of Upper Extremity Distal Predominant Weakness Pattern in Chronic Stroke
by: Baxter, Ryan H., et al.
Published: (2025)
Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data
by: Kumar, Aakash, et al.
Published: (2024)
ViLL-E: Video LLM Embeddings for Retrieval
by: Gupta, Rohit, et al.
Published: (2026)
Searching for Uncollected Litter with Computer Vision
by: Hernandez, Julian, et al.
Published: (2022)
Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion
by: Li, Kunyang, et al.
Published: (2026)
Weakly-Supervised Spatiotemporal Anomaly Detection
by: Gianchandani, Urvi, et al.
Published: (2026)
Nemesis: Normalizing the Soft-prompt Vectors of Vision-Language Models
by: Fu, Shuai, et al.
Published: (2024)
Test-Time Hinting for Black-Box Vision-Language Models
by: Hou, Kaihua, et al.
Published: (2026)
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding
by: Fioresi, Joseph, et al.
Published: (2025)
GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
by: Pillai, Manu S, et al.
Published: (2024)
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge
by: Yousaf, Adeel, et al.
Published: (2025)
GT-Loc: Unifying When and Where in Images Through a Joint Embedding Space
by: Shatwell, David G., et al.
Published: (2025)
ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition
by: Fioresi, Joseph, et al.
Published: (2025)
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
by: Chen, Chen, et al.
Published: (2024)
CityGuessr: City-Level Video Geo-Localization on a Global Scale
by: Kulkarni, Parth Parag, et al.
Published: (2024)
Generative Physical AI in Vision: A Survey
by: Liu, Daochang, et al.
Published: (2025)
FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
by: Dave, Ishan Rajendrakumar, et al.
Published: (2024)
Surgical Triplet Recognition via Diffusion Model
by: Liu, Daochang, et al.
Published: (2024)
Unsupervised Detection of Post-Stroke Brain Abnormalities
by: Mahé, Youwan, et al.
Published: (2025)
Computer-Aided Multi-Stroke Character Simplification by Stroke Removal
by: Ishiyama, Ryo, et al.
Published: (2025)
Enhancing Surgical Performance in Cardiothoracic Surgery with Innovations from Computer Vision and Artificial Intelligence: A Narrative Review
by: Constable, Merryn D., et al.
Published: (2024)