| Main Authors: | Pandey, Amitesh; Arifdjanov, Jafarbek; Tiwari, Ansh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.12083 |
Similar Items
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
by: Rong, Yan, et al.
Published: (2025)
Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)
Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)
Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)
Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
by: Zhang, Pengfei, et al.
Published: (2026)
Noise-Robust Keyword Spotting through Self-supervised Pretraining
by: Mørk, Jacob, et al.
Published: (2024)
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)
Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
by: Bovbjerg, Holger Severin, et al.
Published: (2025)
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)
Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)
Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)
Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)
Holon: a cybernetic interface for bio-semiotics
by: McCormack, Jon, et al.
Published: (2024)
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)
A Multimodal Symphony: Integrating Taste and Sound through Generative AI
by: Spanio, Matteo, et al.
Published: (2025)
GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)
Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)
Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)
Sequence-to-sequence models in peer-to-peer learning: A practical application
by: Šajina, Robert, et al.
Published: (2024)
Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
by: Ferreira, Alexandre R., et al.
Published: (2023)
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
by: Wan, Zhen, et al.
Published: (2026)
ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio
by: Poltronieri, Andrea, et al.
Published: (2024)
Symbolic Audio Classification via Modal Decision Tree Learning
by: Marzano, Enrico, et al.
Published: (2025)
Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)
Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)
PodAgent: A Comprehensive Framework for Podcast Generation
by: Xiao, Yujia, et al.
Published: (2025)
Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
by: Zhou, Jinxing, et al.
Published: (2025)
Learning velocity model for complex media with deep convolutional neural networks
by: Stankevich, A., et al.
Published: (2021)
A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
by: Selvamani, Shaja Arul, et al.
Published: (2025)
Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)
Decoding Phone Pairs from MEG Signals Across Speech Modalities
by: de Zuazo, Xabier, et al.
Published: (2025)
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)
DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)
Musical Agent Systems: MACAT and MACataRT
by: Lee, Keon Ju M., et al.
Published: (2025)
Spoken Conversational Agents with Large Language Models
by: Yang, Chao-Han Huck, et al.
Published: (2025)
"I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation
by: Sturm, Bob L. T.
Published: (2025)