:: Library Catalog

Bibliographic Details
Main Authors:	Xie, Huang; Khorrami, Khazar; Räsänen, Okko; Virtanen, Tuomas
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2408.14939

Similar Items

Text-based Audio Retrieval by Learning from Similarities between Audio Captions
by: Xie, Huang, et al.
Published: (2024)

Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
by: Khorrami, Khazar, et al.
Published: (2023)

A model of early word acquisition based on realistic-scale audiovisual naming events
by: Khorrami, Khazar, et al.
Published: (2024)

Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
by: Khorrami, Khazar, et al.
Published: (2021)

Age-Dependent Analysis and Stochastic Generation of Child-Directed Speech
by: Räsänen, Okko, et al.
Published: (2024)

Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)

Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
by: Moritz, Martin, et al.
Published: (2024)

Inter-Speaker Relative Cues for Two-Stage Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2026)

Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier
by: Dumpis, Martynas, et al.
Published: (2026)

Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
by: Sudarsanam, Parthasaarathy, et al.
Published: (2025)

Impact of Microphone Array Mismatches to Learning-based Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)

Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
by: Räsänen, Okko
Published: (2026)

Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2025)

Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer
by: Neri, Michael, et al.
Published: (2025)

Speaker Distance Estimation in Enclosures from Single-Channel Audio
by: Neri, Michael, et al.
Published: (2024)

Automatic Contextual Audio Denoising
by: Luong, Diep, et al.
Published: (2026)

Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning
by: Hakala, Aapo, et al.
Published: (2025)

Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
by: Khorrami, Khazar, et al.
Published: (2021)

Neural Ambisonics encoding for compact irregular microphone arrays
by: Heikkinen, Mikko, et al.
Published: (2024)

Multi-Channel Replay Speech Detection using Acoustic Maps
by: Neri, Michael, et al.
Published: (2026)

Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
by: Dai, Wang, et al.
Published: (2024)

Beyond Omnidirectional: Neural Ambisonics Encoding for Arbitrary Microphone Directivity Patterns using Cross-Attention
by: Heikkinen, Mikko, et al.
Published: (2026)

Acoustic Simulation Framework for Multi-channel Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)

Adversarial Representation Learning for Robust Privacy Preservation in Audio
by: Gharib, Shayan, et al.
Published: (2023)

Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
by: Zhang, Shiqi, et al.
Published: (2025)

Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning
by: Luong, Diep, et al.
Published: (2023)

Multi-Utterance Speech Separation and Association Trained on Short Segments
by: Wang, Yuzhu, et al.
Published: (2025)

Moving Speaker Separation via Parallel Spectral-Spatial Processing
by: Wang, Yuzhu, et al.
Published: (2026)

Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers
by: Wang, Yuzhu, et al.
Published: (2025)

Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)

Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)

Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
by: Neri, Michael, et al.
Published: (2026)

From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning
by: Martinsson, John, et al.
Published: (2024)

Learning Perceptually Relevant Temporal Envelope Morphing
by: Dixit, Satvik, et al.
Published: (2025)

A decade of DCASE: Achievements, practices, evaluations and future challenges
by: Mesaros, Annamaria, et al.
Published: (2024)

Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays
by: Heikkinen, Mikko, et al.
Published: (2025)

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)

Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music
by: Tunturi, Eetu, et al.
Published: (2025)

BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
by: Lavechin, Marvin, et al.
Published: (2023)

AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)