| Main Authors: | Bellos, Filippos, Li, Yayuan, Shu, Cary, Day, Ruey, Siskind, Jeffrey M., Corso, Jason J. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.18374 |
Similar Items
Towards Consistent Long-Term Pose Generation
by: Li, Yayuan, et al.
Published: (2025)
Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos
by: Li, Yayuan, et al.
Published: (2025)
EchoVQA: Enabling Conversational Assistance for Point-of-Care Cardiac Ultrasound
by: Bellos, Filippos, et al.
Published: (2026)
HANDI: Hand-Centric Text-and-Image Conditioned Video Generation
by: Li, Yayuan, et al.
Published: (2024)
When to Think and When to Look: Uncertainty-Guided Lookback
by: Bi, Jing, et al.
Published: (2025)
Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?
by: Liang, Susan, et al.
Published: (2026)
Follow Your Heart: Landmark-Guided Transducer Pose Scoring for Point-of-Care Echocardiography
by: Guo, Zaiyang, et al.
Published: (2026)
Substantial, Decomposable, and Invisible: Visual Context Misalignment in Instructional Videos for Physical Tasks
by: Li, Yayuan, et al.
Published: (2026)
BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation
by: Wang, Miaowei, et al.
Published: (2026)
Measuring Physical Plausibility of 3D Human Poses Using Physics Simulation
by: Louis, Nathan, et al.
Published: (2025)
VITRO: Vocabulary Inversion for Time-series Representation Optimization
by: Bellos, Filippos, et al.
Published: (2024)
Zero-Shot Coreset Selection via Iterative Subspace Sampling
by: Griffin, Brent A., et al.
Published: (2024)
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology
by: Jiang, Xin, et al.
Published: (2024)
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
by: Li, Yayuan, et al.
Published: (2024)
Taking Training Seriously: Human Guidance and Management-Based Regulation of Artificial Intelligence
by: Coglianese, Cary, et al.
Published: (2024)
Auto-Labeling Data for Object Detection
by: Griffin, Brent A., et al.
Published: (2025)
Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes
by: Marks, Jacob, et al.
Published: (2024)
Embodied4C: Measuring What Matters for Embodied Vision-Language Navigation
by: Sohn, Tin Stribor, et al.
Published: (2025)
R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space
by: Sohn, Tin Stribor, et al.
Published: (2025)
SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning
by: Sohn, Tin Stribor, et al.
Published: (2025)
Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents
by: Veerabadran, Vijay, et al.
Published: (2025)
AIris: An AI-powered Wearable Assistive Device for the Visually Impaired
by: Brilli, Dionysia Danai, et al.
Published: (2024)
The Repeated-Stimulus Confound in Electroencephalography
by: Kilgallen, Jack A., et al.
Published: (2025)
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
by: Maiti, Shalini, et al.
Published: (2025)
ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop
by: Tsang, Clement Fuji, et al.
Published: (2026)
Egocentric Co-Pilot: Web-Native Smart-Glasses Agents for Assistive Egocentric AI
by: Yang, Sicheng, et al.
Published: (2026)
MedUHIP: Towards Human-In-the-Loop Medical Segmentation
by: Zhu, Jiayuan, et al.
Published: (2024)
LoopViT: Scaling Visual ARC with Looped Transformers
by: Shu, Wen-Jie, et al.
Published: (2026)
RORem: Training a Robust Object Remover with Human-in-the-Loop
by: Li, Ruibin, et al.
Published: (2025)
Towards Clinician-Preferred Segmentation: Leveraging Human-in-the-Loop for Test Time Adaptation in Medical Image Segmentation
by: Hu, Shishuai, et al.
Published: (2024)
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
by: Sohn, Tin Stribor, et al.
Published: (2025)
Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing
by: Qin, Haoyun, et al.
Published: (2024)
Intentional Gesture: Deliver Your Intentions with Gestures for Speech
by: Liu, Pinxin, et al.
Published: (2025)
Robust Iris Centre Localisation for Assistive Eye-Gaze Tracking
by: Pathiranage, Nipun Sandamal Ranasekara, et al.
Published: (2024)
Interactive Tracking: A Human-in-the-Loop Paradigm with Memory-Augmented Adaptation
by: Huang, Yuqing, et al.
Published: (2026)
TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving
by: Liu, Wenzhuo, et al.
Published: (2025)
Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation
by: Wang, Hao, et al.
Published: (2024)
AgentSteerTTS: A Multi-Agent Closed-Loop Framework for Composite-Instruction Text-to-Speech
by: Kang, Bin, et al.
Published: (2026)
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
by: Chen, Yiming, et al.
Published: (2025)
Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
by: Louis, Nathan, et al.
Published: (2021)