Saved in:
| Main Author: | Wolf-Monheim, Friedrich |
|---|---|
| Format: | Preprint |
| Published: |
2025
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.07756 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks
by: Wolf-Monheim, Friedrich
Published: (2024)
by: Wolf-Monheim, Friedrich
Published: (2024)
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
by: Fan, Congyi, et al.
Published: (2025)
by: Fan, Congyi, et al.
Published: (2025)
Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
by: Moussa, Denise, et al.
Published: (2022)
by: Moussa, Denise, et al.
Published: (2022)
Deep Active Audio Feature Learning in Resource-Constrained Environments
by: Mohaimenuzzaman, Md, et al.
Published: (2023)
by: Mohaimenuzzaman, Md, et al.
Published: (2023)
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
by: Jung, Jongmin, et al.
Published: (2025)
by: Jung, Jongmin, et al.
Published: (2025)
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
by: Majumder, Sagnik, et al.
Published: (2023)
by: Majumder, Sagnik, et al.
Published: (2023)
VGGSounder: Audio-Visual Evaluations for Foundation Models
by: Zverev, Daniil, et al.
Published: (2025)
by: Zverev, Daniil, et al.
Published: (2025)
Audio-Vision Contrastive Learning for Phonological Class Recognition
by: Liu, Daiqi, et al.
Published: (2025)
by: Liu, Daiqi, et al.
Published: (2025)
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
by: Atito, Sara, et al.
Published: (2022)
by: Atito, Sara, et al.
Published: (2022)
Improving Acoustic Scene Classification with City Features
by: Cai, Yiqiang, et al.
Published: (2025)
by: Cai, Yiqiang, et al.
Published: (2025)
DeepAudio-V1:Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
by: Zhang, Haomin, et al.
Published: (2025)
by: Zhang, Haomin, et al.
Published: (2025)
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
by: Rouditchenko, Andrew, et al.
Published: (2024)
by: Rouditchenko, Andrew, et al.
Published: (2024)
MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
by: Xie, Liuyue, et al.
Published: (2025)
by: Xie, Liuyue, et al.
Published: (2025)
Compressing Quaternion Convolutional Neural Networks for Audio Classification
by: Singh, Arshdeep, et al.
Published: (2025)
by: Singh, Arshdeep, et al.
Published: (2025)
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
by: Vilaca, Luis, et al.
Published: (2024)
by: Vilaca, Luis, et al.
Published: (2024)
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
by: Aneja, Shivangi, et al.
Published: (2023)
by: Aneja, Shivangi, et al.
Published: (2023)
Towards Reliable Audio Deepfake Attribution and Model Recognition: A Multi-Level Autoencoder-Based Framework
by: Di Pierno, Andrea, et al.
Published: (2025)
by: Di Pierno, Andrea, et al.
Published: (2025)
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
by: Pascual, Santiago, et al.
Published: (2024)
by: Pascual, Santiago, et al.
Published: (2024)
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
by: Liang, Yunming, et al.
Published: (2025)
by: Liang, Yunming, et al.
Published: (2025)
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
by: Chen, Tianxiang, et al.
Published: (2024)
by: Chen, Tianxiang, et al.
Published: (2024)
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
by: Oorloff, Trevine, et al.
Published: (2024)
by: Oorloff, Trevine, et al.
Published: (2024)
Raw Audio Classification with Cosine Convolutional Neural Network (CosCovNN)
by: Haque, Kazi Nazmul, et al.
Published: (2024)
by: Haque, Kazi Nazmul, et al.
Published: (2024)
SAVE: Segment Audio-Visual Easy way using Segment Anything Model
by: Nguyen, Khanh-Binh, et al.
Published: (2024)
by: Nguyen, Khanh-Binh, et al.
Published: (2024)
From Vision to Sound: Advancing Audio Anomaly Detection with Vision-Based Algorithms
by: Barusco, Manuel, et al.
Published: (2025)
by: Barusco, Manuel, et al.
Published: (2025)
Enhancing Lie Detection Accuracy: A Comparative Study of Classic ML, CNN, and GCN Models using Audio-Visual Features
by: Abdelwahab, Abdelrahman, et al.
Published: (2024)
by: Abdelwahab, Abdelrahman, et al.
Published: (2024)
Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities
by: Li, Yidi, et al.
Published: (2024)
by: Li, Yidi, et al.
Published: (2024)
CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation
by: Chen, Yuanhong, et al.
Published: (2025)
by: Chen, Yuanhong, et al.
Published: (2025)
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
by: Liu, Chen, et al.
Published: (2025)
by: Liu, Chen, et al.
Published: (2025)
OmniAudio: Generating Spatial Audio from 360-Degree Video
by: Liu, Huadai, et al.
Published: (2025)
by: Liu, Huadai, et al.
Published: (2025)
Classifying Shelf Life Quality of Pineapples by Combining Audio and Visual Features
by: Jiang, Yi-Lu, et al.
Published: (2025)
by: Jiang, Yi-Lu, et al.
Published: (2025)
Enriching Multimodal Sentiment Analysis through Textual Emotional Descriptions of Visual-Audio Content
by: Wu, Sheng, et al.
Published: (2024)
by: Wu, Sheng, et al.
Published: (2024)
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
by: Yang, Qi, et al.
Published: (2023)
by: Yang, Qi, et al.
Published: (2023)
High-Quality Visually-Guided Sound Separation from Diverse Categories
by: Huang, Chao, et al.
Published: (2023)
by: Huang, Chao, et al.
Published: (2023)
DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation
by: Tian, Jingqi, et al.
Published: (2025)
by: Tian, Jingqi, et al.
Published: (2025)
Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features
by: Neururer, Daniel, et al.
Published: (2023)
by: Neururer, Daniel, et al.
Published: (2023)
Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks
by: Erattakulangara, Subin, et al.
Published: (2025)
by: Erattakulangara, Subin, et al.
Published: (2025)
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
by: Liu, Kai, et al.
Published: (2025)
by: Liu, Kai, et al.
Published: (2025)
Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection
by: Joo, Seohyun, et al.
Published: (2026)
by: Joo, Seohyun, et al.
Published: (2026)
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
by: Zhu, Wentao
Published: (2024)
by: Zhu, Wentao
Published: (2024)
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
by: Zhang, Haomin, et al.
Published: (2025)
by: Zhang, Haomin, et al.
Published: (2025)
Similar Items
-
Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks
by: Wolf-Monheim, Friedrich
Published: (2024) -
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
by: Fan, Congyi, et al.
Published: (2025) -
Towards Unconstrained Audio Splicing Detection and Localization with Neural Networks
by: Moussa, Denise, et al.
Published: (2022) -
Deep Active Audio Feature Learning in Resource-Constrained Environments
by: Mohaimenuzzaman, Md, et al.
Published: (2023) -
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio
by: Jung, Jongmin, et al.
Published: (2025)