:: Library Catalog

Bibliographic details
Main authors:	Subramani, Krishna; Smaragdis, Paris; Higuchi, Takuya; Souden, Mehrez
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Machine Learning; Sound
Online access:	https://arxiv.org/abs/2404.04439

Similar items

Combolutional Neural Networks
by: Churchwell, Cameron, et al.
Published: (2025)

Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)

ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model
by: Heydari, Mojtaba, et al.
Published: (2024)

Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity
by: Subramani, Krishna, et al.
Published: (2023)

Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)

Learning to Upsample and Upmix Audio in the Latent Domain
by: Bralios, Dimitrios, et al.
Published: (2025)

On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
by: Tavares, Tiago, et al.
Published: (2024)

Audio Editing with Non-Rigid Text Prompts
by: Paissan, Francesco, et al.
Published: (2023)

Sound Source Separation Using Latent Variational Block-Wise Disentanglement
by: Helwani, Karim, et al.
Published: (2024)

Adaptive Slimming for Scalable and Efficient Speech Enhancement
by: Miccini, Riccardo, et al.
Published: (2025)

Ambisonics Super-Resolution Using A Waveform-Domain Neural Network
by: Nawfal, Ismael, et al.
Published: (2025)

Scaling Up Adaptive Filter Optimizers
by: Casebeer, Jonah, et al.
Published: (2024)

StereoFoley: Object-Aware Stereo Audio Generation from Video
by: Karchkhadze, Tornike, et al.
Published: (2025)

User-guided Generative Source Separation
by: Wen, Yutong, et al.
Published: (2025)

Gencho: Room Impulse Response Generation from Reverberant Speech and Text via Diffusion Transformers
by: Lin, Jackie, et al.
Published: (2026)

Large Language Models and Non-Negative Matrix Factorization for Bioacoustic Signal Decomposition
by: Torabi, Yasaman, et al.
Published: (2025)

Bayesian Negative Binomial Regression of Afrobeats Chart Persistence
by: Cabansag, Ian Jacob, et al.
Published: (2026)

Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
by: Kim, Minsu, et al.
Published: (2025)

HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)

Unsupervised Composable Representations for Audio
by: Bindi, Giovanni, et al.
Published: (2024)

Learning Disentangled Speech Representations
by: Brima, Yusuf, et al.
Published: (2023)

PromptSep: Generative Audio Separation via Multimodal Prompting
by: Wen, Yutong, et al.
Published: (2025)

Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)

DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective
by: Chi, Hyung Gun, et al.
Published: (2025)

Feature Representations for Automatic Meerkat Vocalization Classification
by: Mahmoud, Imen Ben, et al.
Published: (2024)

Benchmarking Representations for Speech, Music, and Acoustic Events
by: La Quatra, Moreno, et al.
Published: (2024)

Evaluating Disentangled Representations for Controllable Music Generation
by: Ibáñez-Martínez, Laura, et al.
Published: (2026)

Learning Music Audio Representations With Limited Data
by: Plachouras, Christos, et al.
Published: (2025)

Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)

Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology
by: Netzorg, Robin, et al.
Published: (2024)

Learning Disentangled Audio Representations through Controlled Synthesis
by: Brima, Yusuf, et al.
Published: (2024)

Knowledge boosting during low-latency inference
by: Srinivas, Vidya, et al.
Published: (2024)

ASTRA: Aligning Speech and Text Representations for ASR without Sampling
by: Gaur, Neeraj, et al.
Published: (2024)

Singer Identity Representation Learning using Self-Supervised Techniques
by: Torres, Bernardo, et al.
Published: (2024)

COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
by: Ciranni, Ruben, et al.
Published: (2024)

Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
by: Miyaguchi, Anthony, et al.
Published: (2022)

RepCodec: A Speech Representation Codec for Speech Tokenization
by: Huang, Zhichao, et al.
Published: (2023)

Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
by: Lebourdais, Martin, et al.
Published: (2024)

Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)

Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)