Library Catalog

Bibliographic Details
Main Authors:	Klein, Benjamin; Rahman, Kazi Ruslan; Ghose, Sanchita
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.23909

Similar Items

A Visually Impaired Assistance Benchmark for VLM-as-a-Judge Evaluation
by: Zhao, Yi, et al.
Published: (2026)

Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance
by: Wang, Xiangxiang, et al.
Published: (2025)

VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
by: Zhao, Yi, et al.
Published: (2024)

LINK: Adaptive Modality Interaction for Audio-Visual Video Parsing
by: Wang, Langyu, et al.
Published: (2024)

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
by: Wang, Xingrui, et al.
Published: (2025)

Vision-based Wearable Steering Assistance for People with Impaired Vision in Jogging
by: Liu, Xiaotong, et al.
Published: (2024)

AI-Driven Smartphone Solution for Digitizing Rapid Diagnostic Test Kits and Enhancing Accessibility for the Visually Impaired
by: Dastagir, R. B., et al.
Published: (2024)

GAITGen: Disentangled Motion-Pathology Impaired Gait Generative Model -- Bringing Motion Generation to the Clinical Domain
by: Adeli, Vida, et al.
Published: (2025)

Video Generation with Learned Action Prior
by: Sarkar, Meenakshi, et al.
Published: (2024)

3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
by: Fang, Zhixue, et al.
Published: (2026)

Video Object Segmentation-Aware Audio Generation
by: Viertola, Ilpo, et al.
Published: (2025)

MUG: Pseudo Labeling Augmented Audio-Visual Mamba Network for Audio-Visual Video Parsing
by: Wang, Langyu, et al.
Published: (2025)

Efficient Audio-Visual Fusion for Video Classification
by: Awan, Mahrukh, et al.
Published: (2024)

A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters
by: Desai, Shasvat, et al.
Published: (2025)

Efficient Motion-Aware Video MLLM
by: Zhao, Zijia, et al.
Published: (2025)

Motion-Aware Video Frame Interpolation
by: Han, Pengfei, et al.
Published: (2024)

MotionBooth: Motion-Aware Customized Text-to-Video Generation
by: Wu, Jianzong, et al.
Published: (2024)

eMotions: A Large-Scale Dataset and Audio-Visual Fusion Network for Emotion Analysis in Short-form Videos
by: Wu, Xuecheng, et al.
Published: (2025)

MotiMotion: Motion-Controlled Video Generation with Visual Reasoning
by: Hsin-Ying, Lee, et al.
Published: (2026)

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
by: Yao, Linli, et al.
Published: (2026)

AI-based Wearable Vision Assistance System for the Visually Impaired: Integrating Real-Time Object Recognition and Contextual Understanding Using Large Vision-Language Models
by: Baig, Mirza Samad Ahmed, et al.
Published: (2024)

TextDiffuser-RL: Efficient and Robust Text Layout Optimization for High-Fidelity Text-to-Image Synthesis
by: Rahman, Kazi Mahathir, et al.
Published: (2025)

Semantics-Aware Human Motion Generation from Audio Instructions
by: Wang, Zi-An, et al.
Published: (2025)

Tuning-free Visual Effect Transfer across Videos
by: Jones, Maxwell, et al.
Published: (2026)

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
by: Wang, Kai, et al.
Published: (2024)

SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
by: Tan, Shuai, et al.
Published: (2025)

Adaptive Multi-Scale Channel-Spatial Attention Aggregation Framework for 3D Indoor Semantic Scene Completion Toward Assisting Visually Impaired
by: He, Qi, et al.
Published: (2026)

Turn-by-Turn Indoor Navigation for the Visually Impaired
by: Srinivasaiah, Santosh, et al.
Published: (2024)

SLAM for Visually Impaired People: a Survey
by: Bamdad, Marziyeh, et al.
Published: (2022)

PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing
by: Munir, Mustafa, et al.
Published: (2025)

Semantic-Aware Adaptive Visual Memory for Streaming Video Understanding
by: Wu, Hang, et al.
Published: (2026)

Question-Aware Gaussian Experts for Audio-Visual Question Answering
by: Kim, Hongyeob, et al.
Published: (2025)

MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization
by: Zhang, Zhexin, et al.
Published: (2026)

Don't Let the Video Speak: Audio-Contrastive Preference Optimization for Audio-Visual Language Models
by: Baid, Ami, et al.
Published: (2026)

Relevance-guided Audio Visual Fusion for Video Saliency Prediction
by: Yu, Li, et al.
Published: (2024)

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
by: Yun, Heeseung, et al.
Published: (2024)

Moaw: Unleashing Motion Awareness for Video Diffusion Models
by: Zhang, Tianqi, et al.
Published: (2026)

iMOVE: Instance-Motion-Aware Video Understanding
by: Li, Jiaze, et al.
Published: (2025)

Space-Aware Instruction Tuning: Dataset and Benchmark for Guide Dog Robots Assisting the Visually Impaired
by: Han, ByungOk, et al.
Published: (2025)

Zero-Shot Video Restoration and Enhancement with Assistance of Video Diffusion Models
by: Cao, Cong, et al.
Published: (2026)