| Main Authors: | Bourdin, Yann, Legrand, Pierrick, Roche, Fanny |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.15313 |
Similar Items
Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects
by: Bourdin, Yann, et al.
Published: (2025)
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
by: Raimon, Athul, et al.
Published: (2024)
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
by: Chen, Guangke, et al.
Published: (2025)
A$^2$-LLM: An End-to-end Conversational Audio Avatar Large Language Model
by: Hu, Xiaolin, et al.
Published: (2026)
GE2E-AC: Generalized End-to-End Loss Training for Accent Classification
by: Watanabe, Chihiro, et al.
Published: (2024)
End-to-End Efficiency in Keyword Spotting: A System-Level Approach for Embedded Microcontrollers
by: Bartoli, Pietro, et al.
Published: (2025)
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
by: Zhang, Zhisheng, et al.
Published: (2025)
A Hierarchical End-of-Turn Model with Primary Speaker Segmentation for Real-Time Conversational AI
by: Helwani, Karim, et al.
Published: (2026)
Content Adaptive Front End For Audio Classification
by: Verma, Prateek, et al.
Published: (2023)
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
by: Hsiao, Chi-Yuan, et al.
Published: (2025)
Towards End-to-End Training of Automatic Speech Recognition for Nigerian Pidgin
by: Rufai, Amina Mardiyyah, et al.
Published: (2020)
O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
by: Gruttadauria, Elio, et al.
Published: (2025)
End-to-End Spoken Grammatical Error Correction
by: Qian, Mengjie, et al.
Published: (2025)
FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time
by: Jobayer, Md, et al.
Published: (2024)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model
by: Cui, Jianwei, et al.
Published: (2024)
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
by: Wu, Yanru, et al.
Published: (2026)
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
by: Bogavelli, Tara, et al.
Published: (2026)
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
by: Morrone, Giovanni, et al.
Published: (2023)
Fast Text-to-Audio Generation with Adversarial Post-Training
by: Novack, Zachary, et al.
Published: (2025)
Zero-Shot End-To-End Spoken Question Answering In Medical Domain
by: Labrak, Yanis, et al.
Published: (2024)
Training-Free Multimodal Guidance for Video to Audio Generation
by: Grassucci, Eleonora, et al.
Published: (2025)
Audio-Visual Continual Test-Time Adaptation without Forgetting
by: Maharana, Sarthak Kumar, et al.
Published: (2026)
Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs
by: Sinha, Anshuman, et al.
Published: (2024)
AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers
by: Yamamoto, Kohei, et al.
Published: (2025)
AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
by: Shao, Weichuang, et al.
Published: (2025)
BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization
by: Samin, Md. Nazmus Sadat, et al.
Published: (2024)
End-to-end Piano Performance-MIDI to Score Conversion with Transformers
by: Beyer, Tim, et al.
Published: (2024)
Segmentwise Pruning in Audio-Language Models
by: Gibier, Marcel, et al.
Published: (2025)
AWARE: Audio Watermarking with Adversarial Resistance to Edits
by: Pavlović, Kosta, et al.
Published: (2025)
Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis
by: Yu, Chin-Yun, et al.
Published: (2024)
Audio Super-Resolution with Latent Bridge Models
by: Li, Chang, et al.
Published: (2025)
DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation
by: Shao, Weichuang, et al.
Published: (2025)
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)
ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)
Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction
by: Wu, Yusong, et al.
Published: (2025)
Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems
by: Pizzi, Karla, et al.
Published: (2024)
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
by: Wu, Daiqing, et al.
Published: (2026)
FastWave: Optimized Diffusion Model for Audio Super-Resolution
by: Kuznetsov, Nikita, et al.
Published: (2026)
TADA! Tuning Audio Diffusion Models through Activation Steering
by: Staniszewski, Łukasz, et al.
Published: (2026)