| Main Authors: | Ji, Qingfeng; Wang, Yuxin; Sun, Letong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.18007 |
Similar Items
ASM: Audio Spectrogram Mixer
by: Ji, Qingfeng, et al.
Published: (2024)
Sparse Autoencoders Make Audio Foundation Models more Explainable
by: Mariotte, Théo, et al.
Published: (2025)
Mitigating Unauthorized Speech Synthesis for Voice Protection
by: Zhang, Zhisheng, et al.
Published: (2024)
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
by: Kim, Ji-Hoon, et al.
Published: (2023)
Symbotunes: unified hub for symbolic music generative models
by: Skierś, Paweł, et al.
Published: (2024)
Self-Supervised Disentangled Representation Learning for Robust Target Speech Extraction
by: Mu, Zhaoxi, et al.
Published: (2023)
Collaborative Watermarking for Adversarial Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2023)
F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation
by: Agarwal, Manvi, et al.
Published: (2025)
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement
by: Zhang, Junan, et al.
Published: (2025)
A Conditioned UNet for Music Source Separation
by: O'Hanlon, Ken, et al.
Published: (2025)
Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning
by: Kang, Zuheng, et al.
Published: (2024)
Schrödinger Bridge Mamba for One-Step Speech Enhancement
by: Yang, Jing, et al.
Published: (2025)
Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models
by: Zhang, Wenda, et al.
Published: (2026)
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
by: Song, Yakun, et al.
Published: (2025)
Scaling Speech Tokenizers with Diffusion Autoencoders
by: Wang, Yuancheng, et al.
Published: (2026)
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
by: Du, Zhihao, et al.
Published: (2024)
Machine listening in a neonatal intensive care unit
by: Tailleur, Modan, et al.
Published: (2024)
Self-Supervised Speech Quality Estimation and Enhancement Using Only Clean Speech
by: Fu, Szu-Wei, et al.
Published: (2024)
Multi-Metric Preference Alignment for Generative Speech Restoration
by: Zhang, Junan, et al.
Published: (2025)
Whisfusion: Parallel ASR Decoding via a Diffusion Transformer
by: Kwon, Taeyoun, et al.
Published: (2025)
Personalized Speech Enhancement Without a Separate Speaker Embedding Model
by: Pärnamaa, Tanel, et al.
Published: (2024)
Masked Audio Generation using a Single Non-Autoregressive Transformer
by: Ziv, Alon, et al.
Published: (2024)
Repurposing Image Diffusion Models for Training-Free Music Style Transfer on Mel-spectrograms
by: Wang, Heehwan, et al.
Published: (2024)
Music Plagiarism Detection: Problem Formulation and a Segment-based Solution
by: Go, Seonghyeon, et al.
Published: (2026)
aTENNuate: Optimized Real-time Speech Enhancement with Deep SSMs on Raw Audio
by: Pei, Yan Ru, et al.
Published: (2024)
JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention
by: Ioannides, Georgios, et al.
Published: (2025)
DiceHuBERT: Distilling HuBERT with a Self-Supervised Learning Objective
by: Chi, Hyung Gun, et al.
Published: (2025)
Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition
by: Singh, Satwinder, et al.
Published: (2025)
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
by: Wang, Yongqi, et al.
Published: (2024)
Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure Classifiers
by: Abdalmalak, Kerlos Atia, et al.
Published: (2024)
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
by: Tal, Or, et al.
Published: (2025)
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction
by: Ma, Yinghao, et al.
Published: (2026)
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
by: Wang, Yuancheng, et al.
Published: (2024)
Multimodal Audio-based Disease Prediction with Transformer-based Hierarchical Fusion Network
by: Cai, Jinjin, et al.
Published: (2024)
SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing
by: Tan, Jiaye, et al.
Published: (2025)
Speech Emotion Recognition Using CNN and Its Use Case in Digital Healthcare
by: Nigar, Nishargo
Published: (2024)
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
by: Luo, Yin-Jyun, et al.
Published: (2024)
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
by: Fujita, Yoto, et al.
Published: (2024)
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
by: Zhang, Yuwei, et al.
Published: (2024)
Exploring and Applying Audio-Based Sentiment Analysis in Music
by: Jhanji, Etash
Published: (2024)