:: Library Catalog

Bibliographic details
Main authors:	Kocharov, Daniil; Räsänen, Okko
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning
Online access:	https://arxiv.org/abs/2506.11747

Similar items

Age-Dependent Analysis and Stochastic Generation of Child-Directed Speech
by: Räsänen, Okko, et al.
Published: (2024)

Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
by: Khorrami, Khazar, et al.
Published: (2021)

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
by: Ferreira, Alexandre R., et al.
Published: (2023)

IsoNet: Spatially-aware audio-visual target speech extraction in complex acoustic environments
by: Padhya, Dinanath, et al.
Published: (2026)

CognoSpeak: an automatic, remote assessment of early cognitive decline in real-world conversational speech
by: Pahar, Madhurananda, et al.
Published: (2025)

Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription
by: Zehren, Mickaël, et al.
Published: (2024)

GRAM: Spatial general-purpose audio representations for real-world environments
by: Yuksel, Goksenin, et al.
Published: (2026)

Training chord recognition models on artificially generated audio
by: Majchrzak, Martyna, et al.
Published: (2025)

Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models
by: Khorrami, Khazar, et al.
Published: (2021)

Reconstructing the Charlie Parker Omnibook using an audio-to-score automatic transcription pipeline
by: Riley, Xavier, et al.
Published: (2024)

Context-aware child-directed speech detection from long-form recordings
by: Charlot, Théo, et al.
Published: (2026)

Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks
by: Pepino, Leonardo, et al.
Published: (2025)

The silence of the weights: a structural pruning strategy for attention-based audio signal architectures with second order metrics
by: Diecidue, Andrea, et al.
Published: (2025)

PFML: Self-Supervised Learning of Time-Series Data Without Representation Collapse
by: Vaaras, Einari, et al.
Published: (2024)

Transformation of audio embeddings into interpretable, concept-based representations
by: Zhang, Alice, et al.
Published: (2025)

Towards generalizing deep-audio fake detection networks
by: Gasenzer, Konstantin, et al.
Published: (2023)

Versatile audio-visual learning for emotion recognition
by: Goncalves, Lucas, et al.
Published: (2023)

Testing chatbots on the creation of encoders for audio conditioned image generation
by: León, Jorge E., et al.
Published: (2025)

Unsupervised outlier detection to improve bird audio dataset labels
by: Collins, Bruce
Published: (2025)

Combining audio control and style transfer using latent diffusion
by: Demerlé, Nils, et al.
Published: (2024)

Late fusion ensembles for speech recognition on diverse input audio representations
by: Jezidžić, Marin, et al.
Published: (2024)

Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation
by: Vaaras, Einari, et al.
Published: (2026)

EnCodecMAE: Leveraging neural codecs for universal audio representation learning
by: Pepino, Leonardo, et al.
Published: (2023)

Investigating self-supervised representations for audio-visual deepfake detection
by: Boldisor, Dragos-Alexandru, et al.
Published: (2025)

Cough activity detection for automatic tuberculosis screening
by: van Vüren, Joshua Jansen, et al.
Published: (2026)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

Sentiment analysis in non-fixed length audios using a Fully Convolutional Neural Network
by: García-Ordás, María Teresa, et al.
Published: (2024)

Decodable but not structured: linear probing enables Underwater Acoustic Target Recognition with pretrained audio embeddings
by: Hummel, Hilde I., et al.
Published: (2026)

Optimising MFCC parameters for the automatic detection of respiratory diseases
by: Yan, Yuyang, et al.
Published: (2024)

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
by: Robinson, David, et al.
Published: (2024)

DeepForestSound: a multi-species automatic detector for passive acoustic monitoring in African tropical forests, a case study in Kibale National Park
by: Dubus, Gabriel, et al.
Published: (2026)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

Audio-based automatic mating success prediction of giant pandas
by: Yan, WeiRan, et al.
Published: (2019)

Robustifying automatic speech recognition by extracting slowly varying features
by: Pizarro, Matías, et al.
Published: (2021)

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
by: Siuzdak, Hubert
Published: (2023)

Exploring bat song syllable representations in self-supervised audio encoders
by: Kloots, Marianne de Heer, et al.
Published: (2024)

Supervised contrastive learning from weakly-labeled audio segments for musical version matching
by: Serrà, Joan, et al.
Published: (2025)

Investigating Affect Mining Techniques for Annotation Sample Selection in the Creation of Finnish Affective Speech Corpus
by: Lahtinen, Kalle, et al.
Published: (2025)

Adaptive vector steering: A training-free, layer-wise intervention for hallucination mitigation in large audio and multimodal models
by: Lin, Tsung-En, et al.
Published: (2025)

Joint sentiment analysis of lyrics and audio in music
by: Schaab, Lea, et al.
Published: (2024)