| Main Authors: | Subramani, Krishna; Smaragdis, Paris; Higuchi, Takuya; Souden, Mehrez |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.04439 |
Similar Items
Combolutional Neural Networks
by: Churchwell, Cameron, et al.
Published: (2025)
Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders
by: Bralios, Dimitrios, et al.
Published: (2025)
ImmerseDiffusion: A Generative Spatial Audio Latent Diffusion Model
by: Heydari, Mojtaba, et al.
Published: (2024)
Noise-Robust DSP-Assisted Neural Pitch Estimation with Very Low Complexity
by: Subramani, Krishna, et al.
Published: (2023)
Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)
Learning to Upsample and Upmix Audio in the Latent Domain
by: Bralios, Dimitrios, et al.
Published: (2025)
On Class Separability Pitfalls In Audio-Text Contrastive Zero-Shot Learning
by: Tavares, Tiago, et al.
Published: (2024)
Audio Editing with Non-Rigid Text Prompts
by: Paissan, Francesco, et al.
Published: (2023)
Sound Source Separation Using Latent Variational Block-Wise Disentanglement
by: Helwani, Karim, et al.
Published: (2024)
Adaptive Slimming for Scalable and Efficient Speech Enhancement
by: Miccini, Riccardo, et al.
Published: (2025)
Ambisonics Super-Resolution Using A Waveform-Domain Neural Network
by: Nawfal, Ismael, et al.
Published: (2025)
Scaling Up Adaptive Filter Optimizers
by: Casebeer, Jonah, et al.
Published: (2024)
StereoFoley: Object-Aware Stereo Audio Generation from Video
by: Karchkhadze, Tornike, et al.
Published: (2025)
User-guided Generative Source Separation
by: Wen, Yutong, et al.
Published: (2025)
Gencho: Room Impulse Response Generation from Reverberant Speech and Text via Diffusion Transformers
by: Lin, Jackie, et al.
Published: (2026)
Large Language Models and Non-Negative Matrix Factorization for Bioacoustic Signal Decomposition
by: Torabi, Yasaman, et al.
Published: (2025)
Bayesian Negative Binomial Regression of Afrobeats Chart Persistence
by: Cabansag, Ian Jacob, et al.
Published: (2026)
Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
by: Kim, Minsu, et al.
Published: (2025)
HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids
by: Wisnu, Dyah A. M. G., et al.
Published: (2024)
Unsupervised Composable Representations for Audio
by: Bindi, Giovanni, et al.
Published: (2024)
Learning Disentangled Speech Representations
by: Brima, Yusuf, et al.
Published: (2023)
PromptSep: Generative Audio Separation via Multimodal Prompting
by: Wen, Yutong, et al.
Published: (2025)
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
by: Maharana, Sarthak Kumar, et al.
Published: (2023)
DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective
by: Chi, Hyung Gun, et al.
Published: (2025)
Feature Representations for Automatic Meerkat Vocalization Classification
by: Mahmoud, Imen Ben, et al.
Published: (2024)
Benchmarking Representations for Speech, Music, and Acoustic Events
by: La Quatra, Moreno, et al.
Published: (2024)
Evaluating Disentangled Representations for Controllable Music Generation
by: Ibáñez-Martínez, Laura, et al.
Published: (2026)
Learning Music Audio Representations With Limited Data
by: Plachouras, Christos, et al.
Published: (2025)
Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)
Speech After Gender: A Trans-Feminine Perspective on Next Steps for Speech Science and Technology
by: Netzorg, Robin, et al.
Published: (2024)
Learning Disentangled Audio Representations through Controlled Synthesis
by: Brima, Yusuf, et al.
Published: (2024)
Knowledge boosting during low-latency inference
by: Srinivas, Vidya, et al.
Published: (2024)
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
by: Gaur, Neeraj, et al.
Published: (2024)
Singer Identity Representation Learning using Self-Supervised Techniques
by: Torres, Bernardo, et al.
Published: (2024)
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
by: Ciranni, Ruben, et al.
Published: (2024)
Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
by: Miyaguchi, Anthony, et al.
Published: (2022)
RepCodec: A Speech Representation Codec for Speech Tokenization
by: Huang, Zhichao, et al.
Published: (2023)
Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing
by: Lebourdais, Martin, et al.
Published: (2024)
Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement
by: Cheng, Longbiao, et al.
Published: (2024)
Towards the Synthesis of Non-speech Vocalizations
by: Hoq, Enjamamul, et al.
Published: (2024)