| Main Authors: | Li, Yichun, Li, Shuanglin, Naqvi, Syed Mohsen |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.02243 |
Similar Items
ADHD diagnosis based on action characteristics recorded in videos using machine learning
by: Li, Yichun, et al.
Published: (2024)
Action-Based ADHD Diagnosis in Video
by: Li, Yichun, et al.
Published: (2024)
A Frequency-aware Augmentation Network for Mental Disorders Assessment from Audio
by: Li, Shuanglin, et al.
Published: (2025)
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
by: Awan, Mahrukh, et al.
Published: (2024)
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
by: Yu, Li, et al.
Published: (2024)
Efficient Audio-Visual Fusion for Video Classification
by: Awan, Mahrukh, et al.
Published: (2024)
FusionBERT: Multi-View Image-3D Retrieval via Cross-Attention Visual Fusion and Normal-Aware 3D Encoder
by: Li, Wei, et al.
Published: (2026)
AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies
by: Wang, Rui, et al.
Published: (2024)
STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking
by: Li, Yidi, et al.
Published: (2024)
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
by: Jeong, Sungheon, et al.
Published: (2025)
TriFusion-SR: Joint Tri-Modal Medical Image Fusion and SR
by: Dharejo, Fayaz Ali, et al.
Published: (2026)
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
by: Hooshanfar, Kiana, et al.
Published: (2025)
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
by: Liu, Chen, et al.
Published: (2025)
SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection
by: Liang, Yachao, et al.
Published: (2025)
FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units
by: Wang, Jian, et al.
Published: (2025)
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies
by: Li, Xiwen, et al.
Published: (2024)
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues
by: Li, Xiwen, et al.
Published: (2025)
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
by: Oorloff, Trevine, et al.
Published: (2024)
Embedding and Enriching Explicit Semantics for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2024)
Inconsistency-Aware Cross-Attention for Audio-Visual Fusion in Dimensional Emotion Recognition
by: Rajasekhar, G, et al.
Published: (2024)
Diverse Semantics-Guided Feature Alignment and Decoupling for Visible-Infrared Person Re-Identification
by: Dong, Neng, et al.
Published: (2025)
From Waveforms to Pixels: A Survey on Audio-Visual Segmentation
by: Li, Jia, et al.
Published: (2025)
LAVA: Layered Audio-Visual Anti-tampering Watermarking for Robust Deepfake Detection and Localization
by: Zeng, Bokang, et al.
Published: (2026)
Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection
by: Peng, Jielun, et al.
Published: (2026)
ShapeSpeak: Body Shape-Aware Textual Alignment for Visible-Infrared Person Re-Identification
by: Yan, Shuanglin, et al.
Published: (2025)
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
by: Tang, Hao, et al.
Published: (2026)
TransMatch: A Transfer-Learning Framework for Defect Detection in Laser Powder Bed Fusion Additive Manufacturing
by: Ilani, Mohsen Asghari, et al.
Published: (2025)
EgoVIS@CVPR: PAIR-Net: Enhancing Egocentric Speaker Detection via Pretrained Audio-Visual Fusion and Alignment Loss
by: Wang, Yu, et al.
Published: (2025)
Handcrafted Feature Fusion for Reliable Detection of AI-Generated Images
by: Nirob, Syed Mehedi Hasan, et al.
Published: (2026)
Dynamic Multi-Target Fusion for Efficient Audio-Visual Navigation
by: Yu, Yinfeng, et al.
Published: (2025)
A Synchronized Audio-Visual Multi-View Capture System
by: Shi, Xiangwei, et al.
Published: (2026)
DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction
by: Zhao, Li, et al.
Published: (2024)
Real-Time Idling Vehicles Detection using Combined Audio-Visual Deep Learning
by: Li, Xiwen, et al.
Published: (2023)
Dynamic Inter-Class Confusion-Aware Encoder for Audio-Visual Fusion in Human Activity Recognition
by: Cong, Kaixuan, et al.
Published: (2025)
Spiking Tucker Fusion Transformer for Audio-Visual Zero-Shot Learning
by: Li, Wenrui, et al.
Published: (2024)
A Multi-Mode Structured Light 3D Imaging System with Multi-Source Information Fusion for Underwater Pipeline Detection
by: Hu, Qinghan, et al.
Published: (2025)
Implicit Counterfactual Learning for Audio-Visual Segmentation
by: Zha, Mingfeng, et al.
Published: (2025)
LMBF-Net: A Lightweight Multipath Bidirectional Focal Attention Network for Multifeatures Segmentation
by: Khan, Tariq M, et al.
Published: (2024)
An Efficient and Streaming Audio Visual Active Speaker Detection System
by: Kundu, Arnav, et al.
Published: (2024)
TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation
by: Iqbal, Shahzaib, et al.
Published: (2024)