:: Library Catalog

Bibliographic Details
Main Authors:	Pandey, Amitesh; Arifdjanov, Jafarbek; Tiwari, Ansh
Format:	Preprint
Published:	2025
Subjects:	Sound; Multiagent Systems; Audio and Speech Processing; I.2.6
Online Access:	https://arxiv.org/abs/2506.12083

Similar Items

AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
by: Rong, Yan, et al.
Published: (2025)

Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
by: Li, Haowen, et al.
Published: (2025)

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)

Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
by: Zhang, Pengfei, et al.
Published: (2026)

Noise-Robust Keyword Spotting through Self-supervised Pretraining
by: Mørk, Jacob, et al.
Published: (2024)

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
by: Bovbjerg, Holger Severin, et al.
Published: (2023)

Learning Robust Spatial Representations from Binaural Audio through Feature Distillation
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
by: Bovbjerg, Holger Severin, et al.
Published: (2025)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)

Holon: a cybernetic interface for bio-semiotics
by: McCormack, Jon, et al.
Published: (2024)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

A Multimodal Symphony: Integrating Taste and Sound through Generative AI
by: Spanio, Matteo, et al.
Published: (2025)

GraFPrint: A GNN-Based Approach for Audio Identification
by: Bhattacharjee, Aditya, et al.
Published: (2024)

Audio Foundation Models Outperform Symbolic Representations for Piano Performance Evaluation
by: Dhiman, Jai
Published: (2026)

Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation
by: Bhattacharjee, Aditya, et al.
Published: (2025)

Sequence-to-sequence models in peer-to-peer learning: A practical application
by: Šajina, Robert, et al.
Published: (2024)

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
by: Ferreira, Alexandre R., et al.
Published: (2023)

Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
by: Wan, Zhen, et al.
Published: (2026)

ChordSync: Conformer-Based Alignment of Chord Annotations to Music Audio
by: Poltronieri, Andrea, et al.
Published: (2024)

Symbolic Audio Classification via Modal Decision Tree Learning
by: Marzano, Enrico, et al.
Published: (2025)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

PodAgent: A Comprehensive Framework for Podcast Generation
by: Xiao, Yujia, et al.
Published: (2025)

Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation
by: Zhou, Jinxing, et al.
Published: (2025)

Learning velocity model for complex media with deep convolutional neural networks
by: Stankevich, A., et al.
Published: (2021)

A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
by: Selvamani, Shaja Arul, et al.
Published: (2025)

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)

Decoding Phone Pairs from MEG Signals Across Speech Modalities
by: de Zuazo, Xabier, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)

Musical Agent Systems: MACAT and MACataRT
by: Lee, Keon Ju M., et al.
Published: (2025)

Spoken Conversational Agents with Large Language Models
by: Yang, Chao-Han Huck, et al.
Published: (2025)

"I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation
by: Sturm, Bob L. T.
Published: (2025)