:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Bai, Jisheng, Liu, Haohe, Wang, Mou, Shi, Dongyuan, Wang, Wenwu, Plumbley, Mark D., Gan, Woon-Seng, Chen, Jianfeng
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2411.18953
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

AudioLog: LLMs-Powered Long Audio Logging with Hybrid Token-Semantic Contrastive Learning
by: Bai, Jisheng, et al.
Published: (2023)

Efficient Audio Captioning with Encoder-Level Knowledge Distillation
by: Xu, Xuenan, et al.
Published: (2024)

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
by: Mei, Xinhao, et al.
Published: (2023)

Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music
by: Yin, Han, et al.
Published: (2024)

AudioTurbo: Fast Text-to-Audio Generation with Rectified Diffusion
by: Zhao, Junqi, et al.
Published: (2025)

DreamAudio: Customized Text-to-Audio Generation with Diffusion Models
by: Yuan, Yi, et al.
Published: (2025)

Retrieval-Augmented Text-to-Audio Generation
by: Yuan, Yi, et al.
Published: (2023)

Learning Temporal Resolution in Spectrogram for Audio Classification
by: Liu, Haohe, et al.
Published: (2022)

Towards Generating Diverse Audio Captions via Adversarial Training
by: Mei, Xinhao, et al.
Published: (2022)

Region-Specific Audio Tagging for Spatial Sound
by: Zhao, Jinzheng, et al.
Published: (2025)

Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study
by: Yuan, Yi, et al.
Published: (2023)

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift
by: Bai, Jisheng, et al.
Published: (2024)

WavCraft: Audio Editing and Generation with Large Language Models
by: Liang, Jinhua, et al.
Published: (2024)

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
by: Liu, Haohe, et al.
Published: (2024)

Zero-Shot Audio Captioning Using Soft and Hard Prompts
by: Zhang, Yiming, et al.
Published: (2024)

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
by: Liu, Haohe, et al.
Published: (2023)

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
by: Yuan, Yi, et al.
Published: (2024)

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
by: Yuan, Yi, et al.
Published: (2024)

SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
by: Liu, Haohe, et al.
Published: (2024)

Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)

EnvSDD: Benchmarking Environmental Sound Deepfake Detection
by: Yin, Han, et al.
Published: (2025)

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)

ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
by: Atito, Sara, et al.
Published: (2022)

Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows
by: Liu, Haohe, et al.
Published: (2025)

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
by: Yuan, Yi, et al.
Published: (2024)

Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
by: Su, Yi, et al.
Published: (2025)

Mixed-gradients Distributed Filtered Reference Least Mean Square Algorithm -- A Robust Distributed Multichannel Active Noise Control Algorithm
by: Ji, Junwei, et al.
Published: (2025)

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
by: Liu, Jizhong, et al.
Published: (2024)

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions
by: Zhu, Xinfa, et al.
Published: (2025)

The Sounds of Home: A Speech-Removed Residential Audio Dataset for Sound Event Detection
by: Bibbó, Gabriel, et al.
Published: (2024)

MACE: Leveraging Audio for Evaluating Audio Captioning Systems
by: Dixit, Satvik, et al.
Published: (2024)

Enhancing Situational Awareness in Wearable Audio Devices Using a Lightweight Sound Event Localization and Detection System
by: Yeow, Jun-Wei, et al.
Published: (2025)

Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)

Measuring Audio's Impact on Correctness: Audio-Contribution-Aware Post-Training of Large Audio Language Models
by: He, Haolin, et al.
Published: (2025)

MiDashengLM: Efficient Audio Understanding with General Audio Captions
by: Dinkel, Heinrich, et al.
Published: (2025)

Sound event localization and classification using WASN in Outdoor Environment
by: Zhang, Dongzhe, et al.
Published: (2024)

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
by: Kim, Jaeyeon, et al.
Published: (2024)

Self-Boosted Weight-Constrained FxLMS: A Robustness Distributed Active Noise Control Algorithm Without Internode Communication
by: Ji, Junwei, et al.
Published: (2025)

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)

Squeeze-and-Excite ResNet-Conformers for Sound Event Localization, Detection, and Distance Estimation for DCASE 2024 Challenge
by: Yeow, Jun Wei, et al.
Published: (2024)