Library Catalog

Bibliographic Details
Main authors:	Kawamura, Takao; Niizumi, Daisuke; Ono, Nobutaka
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing; Sound
Online access:	https://arxiv.org/abs/2602.15307

Similar Items

Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection
by: Niizumi, Daisuke, et al.
Published: (2024)

Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes
by: Yasuda, Masahiro, et al.
Published: (2025)

Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)

M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
by: Niizumi, Daisuke, et al.
Published: (2024)

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
by: Tsubaki, Shunsuke, et al.
Published: (2024)

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
by: Niizumi, Daisuke, et al.
Published: (2024)

SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)

Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers
by: Masuyama, Yoshiki, et al.
Published: (2025)

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)

Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes
by: Nguyen, Binh Thien, et al.
Published: (2025)

Towards Pre-training an Effective Respiratory Audio Foundation Model
by: Niizumi, Daisuke, et al.
Published: (2025)

Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
by: Niizumi, Daisuke, et al.
Published: (2025)

Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2026)

AND: Audio Network Dissection for Interpreting Deep Acoustic Models
by: Wu, Tung-Yu, et al.
Published: (2024)

Listenable Maps for Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

UniAudio: An Audio Foundation Model Toward Universal Audio Generation
by: Yang, Dongchao, et al.
Published: (2023)

What Are They Doing? Joint Audio-Speech Co-Reasoning
by: Wang, Yingzhi, et al.
Published: (2024)

RF-GML: Reference-Free Generative Machine Listener
by: Biswas, Arijit, et al.
Published: (2024)

Description and Discussion on DCASE 2025 Challenge Task 2: First-shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2025)

AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)

Listen, Think, and Understand
by: Gong, Yuan, et al.
Published: (2023)

Active Listener: Continuous Generation of Listener's Head Motion Response in Dyadic Interactions
by: Ghosh, Bishal, et al.
Published: (2024)

Requirements for Mass Adoption of Assistive Listening Technology by the General Public
by: Kaufmann, Thomas B., et al.
Published: (2023)

Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing
by: Xiao, Yang, et al.
Published: (2025)

Latent Watermarking of Audio Generative Models
by: Roman, Robin San, et al.
Published: (2024)

Listenable Maps for Zero-Shot Audio Classifiers
by: Paissan, Francesco, et al.
Published: (2024)

DIFFA: Large Language Diffusion Models Can Listen and Understand
by: Zhou, Jiaming, et al.
Published: (2025)

Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
by: Rouditchenko, Andrew, et al.
Published: (2025)

Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
by: Chung, Soo-Whan, et al.
Published: (2025)

Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech Generation from SSL features
by: Ohnaka, Hien, et al.
Published: (2026)

Learning How to Listen: A Temporal-Frequential Attention Model for Sound Event Detection
by: Shen, Yu-Han, et al.
Published: (2018)

SemanticAudio: Audio Generation and Editing in Semantic Space
by: Dai, Zheqi, et al.
Published: (2026)

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
by: Comunità, Marco, et al.
Published: (2024)

LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
by: Guan, Wenhao, et al.
Published: (2024)

Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
by: Plaquet, Alexis, et al.
Published: (2025)

Reproducing the Acoustic Velocity Vectors in a Circular Listening Area
by: Wang, Jiarui, et al.
Published: (2024)

Listening broadband physical model for microphones: a first step
by: Millot, Laurent, et al.
Published: (2024)

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)

When Audio-LLMs Don't Listen: A Cross-Linguistic Study of Modality Arbitration
by: Billa, Jayadev
Published: (2026)

SRC-gAudio: Sampling-Rate-Controlled Audio Generation
by: Li, Chenxing, et al.
Published: (2024)