:: Library Catalog

Bibliographic Details
Main Authors:	Mokrý, Ondřej, Rajmic, Pavel
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2403.04433

Similar Items

Regularized autoregressive modeling and its application to audio signal reconstruction
by: Mokrý, Ondřej, et al.
Published: (2024)

Multiple Hankel matrix rank minimization for audio inpainting
by: Záviška, Pavel, et al.
Published: (2023)

Janssen 2.0: Audio Inpainting in the Time-frequency Domain
by: Mokrý, Ondřej, et al.
Published: (2024)

Audio Inpainting in Time-Frequency Domain with Phase-Aware Prior
by: Balušík, Peter, et al.
Published: (2026)

A MATLAB toolbox for Computation of Speech Transmission Index (STI)
by: Rajmic, Pavel, et al.
Published: (2025)

Blind estimation of audio effects using an auto-encoder approach and differentiable digital signal processing
by: Peladeau, Côme, et al.
Published: (2023)

Scaling up masked audio encoder learning for general audio classification
by: Dinkel, Heinrich, et al.
Published: (2024)

ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis
by: Liu, Yunyi, et al.
Published: (2024)

Self-supervised learning method using multiple sampling strategies for general-purpose audio representation
by: Kuroyanagi, Ibuki, et al.
Published: (2025)

Towards audio language modeling -- an overview
by: Wu, Haibin, et al.
Published: (2024)

Why some audio signal short-time Fourier transform coefficients have nonuniform phase distributions
by: Voran, Stephen D.
Published: (2024)

Compositional nonlinear audio signal processing with Volterra series
by: Araujo-Simon, Jake
Published: (2023)

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)

MBCodec: Thorough disentangle for high-fidelity audio compression
by: Zhang, Ruonan, et al.
Published: (2025)

Real-time implementation of vibrato transfer as an audio effect
by: Hyrkas, Jeremy
Published: (2025)

EDTC: enhance depth of text comprehension in automated audio captioning
by: Tan, Liwen, et al.
Published: (2024)

Masked Self-distilled Transducer-based Keyword Spotting with Semi-autoregressive Decoding
by: Xi, Yu, et al.
Published: (2025)

Speaker anonymization using neural audio codec language models
by: Panariello, Michele, et al.
Published: (2023)

FxSearcher: gradient-free text-driven audio transformation
by: Ki, Hojoon, et al.
Published: (2025)

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)

MusicGen-Stem: Multi-stem music generation and edition through autoregressive modeling
by: Rouard, Simon, et al.
Published: (2025)

AxLSTMs: learning self-supervised audio representations with xLSTMs
by: Yadav, Sarthak, et al.
Published: (2024)

STASE: A spatialized text-to-audio synthesis engine for music generation
by: Chi, Tutti, et al.
Published: (2025)

Deep learning based spatial aliasing reduction in beamforming for audio capture
by: Guzik, Mateusz, et al.
Published: (2025)

Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks
by: Das, Sneha, et al.
Published: (2020)

Human-CLAP: Human-perception-based contrastive language-audio pretraining
by: Takano, Taisei, et al.
Published: (2025)

DashengTokenizer: One layer is enough for unified audio understanding and generation
by: Dinkel, Heinrich, et al.
Published: (2026)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline
by: Riley, Xavier, et al.
Published: (2024)

Exploring trends in audio mixes and masters: Insights from a dataset analysis
by: Mourgela, Angeliki, et al.
Published: (2024)

RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio
by: Kanamori, Yusuke, et al.
Published: (2025)

Modeling strategies for speech enhancement in the latent space of a neural audio codec
by: Kammoun, Sofiene, et al.
Published: (2025)

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
by: Siuzdak, Hubert
Published: (2023)

A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
by: Hsu, Yicheng, et al.
Published: (2024)

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
by: Wu, Haibin, et al.
Published: (2024)

Efficient learning-based sound propagation for virtual and real-world audio processing applications
by: Ratnarajah, Anton Jeran
Published: (2024)

ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
by: Jing, Xin, et al.
Published: (2024)

WavJEPA: Semantic learning unlocks robust audio foundation models for raw waveforms
by: Yuksel, Goksenin, et al.
Published: (2025)

audio2chart: End to End Audio Transcription into playable Guitar Hero charts
by: Tripodi, Riccardo
Published: (2025)