| Main Authors: | Xie, Huang, Khorrami, Khazar, Räsänen, Okko, Virtanen, Tuomas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.14939 |
Similar Items
Text-based Audio Retrieval by Learning from Similarities between Audio Captions
by: Xie, Huang, et al.
Published: (2024)
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
by: Khorrami, Khazar, et al.
Published: (2023)
A model of early word acquisition based on realistic-scale audiovisual naming events
by: Khorrami, Khazar, et al.
Published: (2024)
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
by: Khorrami, Khazar, et al.
Published: (2021)
Age-Dependent Analysis and Stochastic Generation of Child-Directed Speech
by: Räsänen, Okko, et al.
Published: (2024)
Multi-label Zero-Shot Audio Classification with Temporal Attention
by: Dogan, Duygu, et al.
Published: (2024)
Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
by: Moritz, Martin, et al.
Published: (2024)
Inter-Speaker Relative Cues for Two-Stage Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2026)
Evaluating the Temporal Detection Capability of Integrated Gradients Applied on Sound Classifier
by: Dumpis, Martynas, et al.
Published: (2026)
Representation Learning for Semantic Alignment of Language, Audio, and Visual Modalities
by: Sudarsanam, Parthasaarathy, et al.
Published: (2025)
Impact of Microphone Array Mismatches to Learning-based Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)
Computational modeling of early language learning from acoustic speech and audiovisual input without linguistic priors
by: Räsänen, Okko
Published: (2026)
Inter-Speaker Relative Cues for Text-Guided Target Speech Extraction
by: Dai, Wang, et al.
Published: (2025)
Multi-channel Replay Speech Detection using an Adaptive Learnable Beamformer
by: Neri, Michael, et al.
Published: (2025)
Speaker Distance Estimation in Enclosures from Single-Channel Audio
by: Neri, Michael, et al.
Published: (2024)
Automatic Contextual Audio Denoising
by: Luong, Diep, et al.
Published: (2026)
Automatic Live Music Song Identification Using Multi-level Deep Sequence Similarity Learning
by: Hakala, Aapo, et al.
Published: (2025)
Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
by: Khorrami, Khazar, et al.
Published: (2021)
Neural Ambisonics encoding for compact irregular microphone arrays
by: Heikkinen, Mikko, et al.
Published: (2024)
Multi-Channel Replay Speech Detection using Acoustic Maps
by: Neri, Michael, et al.
Published: (2026)
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
by: Dai, Wang, et al.
Published: (2024)
Beyond Omnidirectional: Neural Ambisonics Encoding for Arbitrary Microphone Directivity Patterns using Cross-Attention
by: Heikkinen, Mikko, et al.
Published: (2026)
Acoustic Simulation Framework for Multi-channel Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)
Adversarial Representation Learning for Robust Privacy Preservation in Audio
by: Gharib, Shayan, et al.
Published: (2023)
Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
by: Zhang, Shiqi, et al.
Published: (2025)
Representation Learning for Audio Privacy Preservation using Source Separation and Robust Adversarial Learning
by: Luong, Diep, et al.
Published: (2023)
Multi-Utterance Speech Separation and Association Trained on Short Segments
by: Wang, Yuzhu, et al.
Published: (2025)
Moving Speaker Separation via Parallel Spectral-Spatial Processing
by: Wang, Yuzhu, et al.
Published: (2026)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers
by: Wang, Yuzhu, et al.
Published: (2025)
Computer Audition: From Task-Specific Machine Learning to Foundation Models
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)
Discriminating real and synthetic super-resolved audio samples using embedding-based classifiers
by: Silaev, Mikhail, et al.
Published: (2026)
Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation
by: Neri, Michael, et al.
Published: (2026)
From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning
by: Martinsson, John, et al.
Published: (2024)
Learning Perceptually Relevant Temporal Envelope Morphing
by: Dixit, Satvik, et al.
Published: (2025)
A decade of DCASE: Achievements, practices, evaluations and future challenges
by: Mesaros, Annamaria, et al.
Published: (2024)
Gen-A: Generalizing Ambisonics Neural Encoding to Unseen Microphone Arrays
by: Heikkinen, Mikko, et al.
Published: (2025)
Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance
by: Luong, Diep, et al.
Published: (2025)
Score-informed Music Source Separation: Improving Synthetic-to-real Generalization in Classical Music
by: Tunturi, Eetu, et al.
Published: (2025)
BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models
by: Lavechin, Marvin, et al.
Published: (2023)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)