:: Library Catalog

Bibliographic Details
Main Authors:	Bellos, Filippos; Li, Yayuan; Shu, Cary; Day, Ruey; Siskind, Jeffrey M.; Corso, Jason J.
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.18374

Similar Items

Towards Consistent Long-Term Pose Generation
by: Li, Yayuan, et al.
Published: (2025)

Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)

EchoVQA: Enabling Conversational Assistance for Point-of-Care Cardiac Ultrasound
by: Bellos, Filippos, et al.
Published: (2026)

HANDI: Hand-Centric Text-and-Image Conditioned Video Generation
by: Li, Yayuan, et al.
Published: (2024)

When to Think and When to Look: Uncertainty-Guided Lookback
by: Bi, Jing, et al.
Published: (2025)

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
by: Liang, Susan, et al.
Published: (2026)

Follow Your Heart: Landmark-Guided Transducer Pose Scoring for Point-of-Care Echocardiography
by: Guo, Zaiyang, et al.
Published: (2026)

Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks
by: Li, Yayuan, et al.
Published: (2026)

BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation
by: Wang, Miaowei, et al.
Published: (2026)

Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
by: Louis, Nathan, et al.
Published: (2025)

VITRO: Vocabulary Inversion for Time-series Representation Optimization
by: Bellos, Filippos, et al.
Published: (2024)

Zero-Shot Coreset Selection via Iterative Subspace Sampling
by: Griffin, Brent A., et al.
Published: (2024)

@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
by: Jiang, Xin, et al.
Published: (2024)

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
by: Li, Yayuan, et al.
Published: (2024)

Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence
by: Coglianese, Cary, et al.
Published: (2024)

Auto-Labeling Data for Object Detection
by: Griffin, Brent A., et al.
Published: (2025)

Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes
by: Marks, Jacob, et al.
Published: (2024)

Embodied4C: Measuring What Matters for Embodied Vision-Language Navigation
by: Sohn, Tin Stribor, et al.
Published: (2025)

R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space
by: Sohn, Tin Stribor, et al.
Published: (2025)

SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning
by: Sohn, Tin Stribor, et al.
Published: (2025)

Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
by: Veerabadran, Vijay, et al.
Published: (2025)

AIris: An AI-powered Wearable Assistive Device for the Visually Impaired
by: Brilli, Dionysia Danai, et al.
Published: (2024)

The Repeated-Stimulus Confound in Electroencephalography
by: Kilgallen, Jack A., et al.
Published: (2025)

Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
by: Maiti, Shalini, et al.
Published: (2025)

ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop
by: Tsang, Clement Fuji, et al.
Published: (2026)

Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI
by: Yang, Sicheng, et al.
Published: (2026)

MedUHIP: Towards Human-In-the-Loop Medical Segmentation
by: Zhu, Jiayuan, et al.
Published: (2024)

LoopViT: Scaling Visual ARC with Looped Transformers
by: Shu, Wen-Jie, et al.
Published: (2026)

RORem: Training a Robust Object Remover with Human-in-the-Loop
by: Li, Ruibin, et al.
Published: (2025)

Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation
by: Hu, Shishuai, et al.
Published: (2024)

A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
by: Sohn, Tin Stribor, et al.
Published: (2025)

Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing
by: Qin, Haoyun, et al.
Published: (2024)

Intentional Gesture: Deliver Your Intentions with Gestures for Speech
by: Liu, Pinxin, et al.
Published: (2025)

Robust Iris Centre Localisation for Assistive Eye-Gaze Tracking
by: Pathiranage, Nipun Sandamal Ranasekara, et al.
Published: (2024)

Interactive Tracking: A Human-in-the-Loop Paradigm with Memory-Augmented Adaptation
by: Huang, Yuqing, et al.
Published: (2026)

TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving
by: Liu, Wenzhuo, et al.
Published: (2025)

Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation
by: Wang, Hao, et al.
Published: (2024)

AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech
by: Kang, Bin, et al.
Published: (2026)

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
by: Chen, Yiming, et al.
Published: (2025)

Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
by: Louis, Nathan, et al.
Published: (2021)