Saved in:
| Main Authors: | Ma, Hao, Peng, Zhiyuan, Li, Xu, Shao, Mingjie, Wu, Xixin, Liu, Ju |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.17455 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Language-Queried Target Sound Extraction Without Parallel Training Data
by: Ma, Hao, et al.
Published: (2024)
by: Ma, Hao, et al.
Published: (2024)
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
by: Wu, Wenxuan, et al.
Published: (2024)
by: Wu, Wenxuan, et al.
Published: (2024)
Towards Multimodal Query-Based Spatial Audio Source Extraction
by: Yu, Chenxin, et al.
Published: (2025)
by: Yu, Chenxin, et al.
Published: (2025)
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024)
by: Saijo, Kohei, et al.
Published: (2024)
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)
by: Ma, Hao, et al.
Published: (2023)
Target Speech Extraction with Pre-trained Self-supervised Learning Models
by: Peng, Junyi, et al.
Published: (2024)
by: Peng, Junyi, et al.
Published: (2024)
Context-Aware Query Refinement for Target Sound Extraction: Handling Partially Matched Queries
by: Sato, Ryo, et al.
Published: (2025)
by: Sato, Ryo, et al.
Published: (2025)
Towards Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
by: Yang, Yifan, et al.
Published: (2026)
by: Yang, Yifan, et al.
Published: (2026)
Text-Queried Target Sound Event Localization
by: Zhao, Jinzheng, et al.
Published: (2024)
by: Zhao, Jinzheng, et al.
Published: (2024)
Training Strategies for Modality Dropout Resilient Multi-Modal Target Speaker Extraction
by: Korse, Srikanth, et al.
Published: (2025)
by: Korse, Srikanth, et al.
Published: (2025)
Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training
by: Li, Yiming, et al.
Published: (2024)
by: Li, Yiming, et al.
Published: (2024)
Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling
by: Wang, Quanxiu, et al.
Published: (2024)
by: Wang, Quanxiu, et al.
Published: (2024)
Enhancing Intelligibility for Generative Target Speech Extraction via Joint Optimization with Target Speaker ASR
by: Ma, Hao, et al.
Published: (2025)
by: Ma, Hao, et al.
Published: (2025)
Bridging the gap between training and inference in LM-based TTS models
by: Zhang, Ruonan, et al.
Published: (2025)
by: Zhang, Ruonan, et al.
Published: (2025)
ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction
by: Wu, Wenxuan, et al.
Published: (2025)
by: Wu, Wenxuan, et al.
Published: (2025)
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer
by: Wang, Helin, et al.
Published: (2024)
by: Wang, Helin, et al.
Published: (2024)
Regularized Contrastive Pre-training for Few-shot Bioacoustic Sound Detection
by: Moummad, Ilyass, et al.
Published: (2023)
by: Moummad, Ilyass, et al.
Published: (2023)
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
by: Chen, Xueyuan, et al.
Published: (2024)
by: Chen, Xueyuan, et al.
Published: (2024)
DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)
by: Chen, Weidong, et al.
Published: (2025)
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
by: Chen, Xueyuan, et al.
Published: (2024)
by: Chen, Xueyuan, et al.
Published: (2024)
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
by: Wu, Wenxuan, et al.
Published: (2025)
by: Wu, Wenxuan, et al.
Published: (2025)
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
by: Wu, Donghang, et al.
Published: (2024)
by: Wu, Donghang, et al.
Published: (2024)
Leveraging Language Information for Target Language Extraction
by: Yıldırım, Mehmet Sinan, et al.
Published: (2025)
by: Yıldırım, Mehmet Sinan, et al.
Published: (2025)
TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information
by: Wang, Yiwen, et al.
Published: (2024)
by: Wang, Yiwen, et al.
Published: (2024)
Progressive Residual Extraction based Pre-training for Speech Representation Learning
by: Wang, Tianrui, et al.
Published: (2024)
by: Wang, Tianrui, et al.
Published: (2024)
TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction
by: Huang, Ziling
Published: (2025)
by: Huang, Ziling
Published: (2025)
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries
by: Cai, Pengfei, et al.
Published: (2025)
by: Cai, Pengfei, et al.
Published: (2025)
Self-Guided Target Sound Extraction and Classification Through Universal Sound Separation Model and Multiple Clues
by: Kwon, Younghoo, et al.
Published: (2025)
by: Kwon, Younghoo, et al.
Published: (2025)
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
by: Dai, Dongyang, et al.
Published: (2025)
by: Dai, Dongyang, et al.
Published: (2025)
MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism
by: Ling, Tongtao, et al.
Published: (2025)
by: Ling, Tongtao, et al.
Published: (2025)
Leveraging Sound Source Trajectories for Universal Sound Separation
by: Wu, Donghang, et al.
Published: (2024)
by: Wu, Donghang, et al.
Published: (2024)
Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection
by: Yin, Han, et al.
Published: (2024)
by: Yin, Han, et al.
Published: (2024)
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
by: Tao, Ruijie, et al.
Published: (2024)
by: Tao, Ruijie, et al.
Published: (2024)
Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study
by: Yuan, Yi, et al.
Published: (2023)
by: Yuan, Yi, et al.
Published: (2023)
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)
Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters
by: Li, Jiatong, et al.
Published: (2026)
by: Li, Jiatong, et al.
Published: (2026)
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction
by: Fan, Cunhang, et al.
Published: (2025)
by: Fan, Cunhang, et al.
Published: (2025)
DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks
by: Jin, Xutong, et al.
Published: (2024)
by: Jin, Xutong, et al.
Published: (2024)
$C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction
by: Wu, Wenxuan, et al.
Published: (2025)
by: Wu, Wenxuan, et al.
Published: (2025)
Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis
by: Zhang, Yixiao, et al.
Published: (2025)
by: Zhang, Yixiao, et al.
Published: (2025)
Similar Items
-
Language-Queried Target Sound Extraction Without Parallel Training Data
by: Ma, Hao, et al.
Published: (2024) -
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
by: Wu, Wenxuan, et al.
Published: (2024) -
Towards Multimodal Query-Based Spatial Audio Source Extraction
by: Yu, Chenxin, et al.
Published: (2025) -
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
by: Saijo, Kohei, et al.
Published: (2024) -
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)