:: Library Catalog

Bibliographic Details
Main Authors:	Bourdin, Yann; Legrand, Pierrick; Roche, Fanny
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning
Online Access:	https://arxiv.org/abs/2512.15313

Similar Items

Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects
by: Bourdin, Yann, et al.
Published: (2025)

Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
by: Raimon, Athul, et al.
Published: (2024)

AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)

A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model
by: Hu, Xiaolin, et al.
Published: (2026)

GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
by: Watanabe, Chihiro, et al.
Published: (2024)

End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
by: Bartoli, Pietro, et al.
Published: (2025)

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)

A Hierarchical End-of-Turn Model with Primary Speaker Segmentation for Real-Time Conversational AI
by: Helwani, Karim, et al.
Published: (2026)

Content Adaptive Front End For Audio Classification
by: Verma, Prateek, et al.
Published: (2023)

Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)

Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
by: Rufai, Amina Mardiyyah, et al.
Published: (2020)

O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
by: Gruttadauria, Elio, et al.
Published: (2025)

End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)

FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
by: Jobayer, Md, et al.
Published: (2024)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model
by: Cui, Jianwei, et al.
Published: (2024)

Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
by: Wu, Yanru, et al.
Published: (2026)

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
by: Bogavelli, Tara, et al.
Published: (2026)

End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
by: Morrone, Giovanni, et al.
Published: (2023)

Fast Text-to-Audio Generation with Adversarial Post-Training
by: Novack, Zachary, et al.
Published: (2025)

Zero-Shot End-To-End Spoken Question Answering In Medical Domain
by: Labrak, Yanis, et al.
Published: (2024)

Training-Free Multimodal Guidance for Video to Audio Generation
by: Grassucci, Eleonora, et al.
Published: (2025)

Audio-Visual Continual Test-Time Adaptation without Forgetting
by: Maharana, Sarthak Kumar, et al.
Published: (2026)

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)

AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers
by: Yamamoto, Kohei, et al.
Published: (2025)

AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
by: Shao, Weichuang, et al.
Published: (2025)

BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
by: Samin, Md. Nazmus Sadat, et al.
Published: (2024)

End-to-end Piano Performance-MIDI to Score Conversion with Transformers
by: Beyer, Tim, et al.
Published: (2024)

Segmentwise Pruning in Audio-Language Models
by: Gibier, Marcel, et al.
Published: (2025)

AWARE: Audio Watermarking with Adversarial Resistance to Edits
by: Pavlović, Kosta, et al.
Published: (2025)

Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis
by: Yu, Chin-Yun, et al.
Published: (2024)

Audio Super-Resolution with Latent Bridge Models
by: Li, Chang, et al.
Published: (2025)

DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation
by: Shao, Weichuang, et al.
Published: (2025)

How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)

ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
by: Wu, Yusong, et al.
Published: (2025)

Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems
by: Pizzi, Karla, et al.
Published: (2024)

Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
by: Wu, Daiqing, et al.
Published: (2026)

FastWave: Optimized Diffusion Model for Audio Super-Resolution
by: Kuznetsov, Nikita, et al.
Published: (2026)

TADA! Tuning Audio Diffusion Models through Activation Steering
by: Staniszewski, Łukasz, et al.
Published: (2026)