:: Library Catalog

Bibliographic Details
Main authors:	Myrgyyassov, Alisher; Wang, Bruce Xiao; Sun, Yu; Huang, Shuming; Song, Zhen; Wong, Min Ney; Zheng, Yongping
Format:	Preprint
Published:	2026
Subjects:	Quantitative Methods; Machine Learning; Sound; Audio and Speech Processing
Online access:	https://arxiv.org/abs/2603.03350

Similar items

Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example
by: Ghosh, Suhita, et al.
Published: (2024)

Temporal Feature Learning in Weakly Labelled Bioacoustic Cetacean Datasets via a Variational Autoencoder and Temporal Convolutional Network: An Interdisciplinary Approach
by: Fonollosa, Laia Garrobé, et al.
Published: (2024)

Computational bioacoustics with deep learning: a review and roadmap
by: Stowell, Dan
Published: (2021)

Adaptive Representations of Sound for Automatic Insect Recognition
by: Faiß, Marius, et al.
Published: (2023)

Learning to detect an animal sound from five examples
by: Nolasco, Inês, et al.
Published: (2023)

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Survey
by: Cui, Meng, et al.
Published: (2024)

Learning to rumble: Automated elephant call classification, detection and endpointing using deep architectures
by: Geldenhuys, Christiaan M., et al.
Published: (2024)

Cochlear Wave Propagation and Dynamics in the Human Base and Apex: Model-Based Estimates from Noninvasive Measurements
by: Alkhairy, Samiya A
Published: (2024)

Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases
by: Zhang, Pengfei, et al.
Published: (2024)

A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Voice
by: Paterson, Mary, et al.
Published: (2024)

Which Augmentation Should I Use? An Empirical Investigation of Augmentations for Self-Supervised Phonocardiogram Representation Learning
by: Ballas, Aristotelis, et al.
Published: (2023)

Automatic detection of Mild Cognitive Impairment using high-dimensional acoustic features in spontaneous speech
by: Zhang, Cong, et al.
Published: (2024)

Screening method for early dementia using sound objects as voice biomarkers
by: Pluta, Adam, et al.
Published: (2024)

Foundation Models for Bioacoustics -- a Comparative Review
by: Schwinger, Raphael, et al.
Published: (2025)

From Birdsong to Rumbles: Classifying Elephant Calls with Out-of-Species Embeddings
by: Geldenhuys, Christiaan M., et al.
Published: (2026)

Prosody of speech production in latent post-stroke aphasia
by: Zhang, Cong, et al.
Published: (2024)

All Thresholds Barred: Direct Estimation of Call Density in Bioacoustic Data
by: Navine, Amanda K., et al.
Published: (2024)

Multi Modal Information Fusion of Acoustic and Linguistic Data for Decoding Dairy Cow Vocalizations in Animal Welfare Assessment
by: Jobarteh, Bubacarr, et al.
Published: (2024)

animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
by: Schäfer-Zimmermann, Julian C., et al.
Published: (2024)

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
by: Zeineldeen, Mohammad, et al.
Published: (2023)

Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study
by: Yang, Zijian, et al.
Published: (2026)

Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)

Atrial Fibrillation Detection System via Acoustic Sensing for Mobile Phones
by: Liu, Xuanyu, et al.
Published: (2024)

WhaleVAD-BPN: Improving Baleen Whale Call Detection with Boundary Proposal Networks and Post-processing Optimisation
by: Geldenhuys, Christiaan M., et al.
Published: (2025)

Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)

Conformer-based Ultrasound-to-Speech Conversion
by: Ibrahimov, Ibrahim, et al.
Published: (2025)

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)

Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control
by: Lu, Ye-Xin, et al.
Published: (2024)

Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality
by: Raissi, Tina, et al.
Published: (2024)

Low-Latency Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses for Speech Generation Tasks
by: Ai, Yang, et al.
Published: (2024)

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)

AntiDeepFake: AI for Deep Fake Speech Recognition
by: Togootogtokh, Enkhtogtokh, et al.
Published: (2024)

All Neural Low-latency Directional Speech Extraction
by: Pandey, Ashutosh, et al.
Published: (2024)

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
by: Li, Jiaqi, et al.
Published: (2024)

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)

Universal Preference-Score-based Pairwise Speech Quality Assessment
by: Shi, Yu-Fei, et al.
Published: (2025)

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)

Regularizing Learnable Feature Extraction for Automatic Speech Recognition
by: Vieting, Peter, et al.
Published: (2025)

Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)