:: Library Catalog

Bibliographic Details
Main Authors:	Nan, Zheng; Dang, Ting; Sethu, Vidhyasaharan; Ahmed, Beena
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Computation and Language; Machine Learning; Sound
Online Access:	https://arxiv.org/abs/2409.15357

Similar Items

Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features
by: Meng, Hanyu, et al.
Published: (2024)

Why Can't They Remember? Uncovering Representation and Retrieval Bottlenecks in Multi-Turn Acoustic Memory
by: Xiao, Yang, et al.
Published: (2026)

What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions
by: Meng, Hanyu, et al.
Published: (2024)

Beyond Deep Learning: Speech Segmentation and Phone Classification with Neural Assemblies
by: Adelson, Trevor, et al.
Published: (2026)

ProMode: A Speech Prosody Model Conditioned on Acoustic and Textual Inputs
by: Eren, Eray, et al.
Published: (2025)

PAST: Phonetic-Acoustic Speech Tokenizer
by: Har-Tuv, Nadav, et al.
Published: (2025)

Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends
by: Zhang, Qiquan, et al.
Published: (2025)

Binaural Selective Attention Model for Target Speaker Extraction
by: Meng, Hanyu, et al.
Published: (2024)

Disentangling Textual and Acoustic Features of Neural Speech Representations
by: Mohebbi, Hosein, et al.
Published: (2024)

Adaptive Per-Channel Energy Normalization Front-end for Robust Audio Signal Processing
by: Meng, Hanyu, et al.
Published: (2025)

On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers
by: Yang, Zijian, et al.
Published: (2023)

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
by: Yuan, Yi, et al.
Published: (2024)

Energy-Based Models with Applications to Speech and Language Processing
by: Ou, Zhijian
Published: (2024)

Audio-to-Score Conversion Model Based on Whisper methodology
by: Zhang, Hongyao, et al.
Published: (2024)

Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
by: Segev, Eliya, et al.
Published: (2023)

On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
by: Varshavsky-Hassid, Miri, et al.
Published: (2024)

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
by: Shakeel, Muhammad, et al.
Published: (2026)

Medical Spoken Named Entity Recognition
by: Le-Duc, Khai, et al.
Published: (2024)

Unsupervised Blind Joint Dereverberation and Room Acoustics Estimation with Diffusion Models
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Joint Transcription of Acoustic Guitar Strumming Directions and Chords
by: Murgul, Sebastian, et al.
Published: (2025)

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
by: Li, Weiqin, et al.
Published: (2024)

A Novel Markovian Framework for Integrating Absolute and Relative Ordinal Emotion Information
by: Wu, Jingyao, et al.
Published: (2021)

Efficient VoIP Communications through LLM-based Real-Time Speech Reconstruction and Call Prioritization for Emergency Services
by: Venkateshperumal, Danush, et al.
Published: (2024)

Property Neurons in Self-Supervised Speech Transformers
by: Lin, Tzu-Quan, et al.
Published: (2024)

Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection
by: Yang, Tzu-Ting, et al.
Published: (2024)

Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE
by: Lian, Jiachen, et al.
Published: (2022)

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion
by: Wang, Jinhan, et al.
Published: (2024)

CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
by: Attia, Ahmed Adel, et al.
Published: (2024)

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
by: Li, Yizhi, et al.
Published: (2023)

SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
by: Kumar, Anurag, et al.
Published: (2025)

A Variational Framework for Improving Naturalness in Generative Spoken Language Models
by: Chen, Li-Wei, et al.
Published: (2025)

WEE-Therapy: A Mixture of Weak Encoders Framework for Psychological Counseling Dialogue Analysis
by: Kang, Yongqi, et al.
Published: (2025)

Zero-Shot Cognitive Impairment Detection from Speech Using AudioLLM
by: Shahin, Mostafa, et al.
Published: (2025)

EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning
by: Seth, Ashish, et al.
Published: (2024)

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)

OLMoASR: Open Models and Data for Training Robust Speech Recognition Models
by: Ngo, Huong, et al.
Published: (2025)

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
by: Veluri, Bandhav, et al.
Published: (2024)

Modeling Overlapped Speech with Shuffles
by: Wiesner, Matthew, et al.
Published: (2026)

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
by: Kanda, Naoyuki, et al.
Published: (2024)

The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN
by: Yuan, Zheng, et al.
Published: (2023)