Library Catalog

Bibliographic Details
Main Authors:	Ru, Ganghui, Wang, Jieying, Zhao, Jiahao, Wu, Yulun, Yu, Yi, Jiang, Nannan, Wang, Wei, Li, Wei
Format:	Preprint
Published:	2025
Subjects:	Sound Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2508.09788

Similar Items

BeatFM: Improving Beat Tracking with Pre-trained Music Foundation Model
by: Ru, Ganghui, et al.
Published: (2025)

Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection
by: Wei, Weixing, et al.
Published: (2025)

Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features
by: Deng, Jiajun, et al.
Published: (2025)

BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
by: Chang, Chih-Cheng, et al.
Published: (2023)

Speech Emotion Recognition Using Fine-Tuned DWFormer: A Study on Track 1 of the IERPChallenge 2024
by: Wang, Honghong, et al.
Published: (2025)

MaskBeat: Loopable Drum Beat Generation
by: Lanzendörfer, Luca A., et al.
Published: (2025)

Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment
by: Wang, Ke, et al.
Published: (2025)

The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
by: Ahn, Jaehoon, et al.
Published: (2026)

Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking
by: Gagnere, Antonin, et al.
Published: (2025)

Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)

Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)

Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)

Unsupervised Multi-channel Speech Dereverberation via Diffusion
by: Wu, Yulun, et al.
Published: (2025)

Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
by: Huang, Zikai, et al.
Published: (2024)

HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
by: Mu, Bingshen, et al.
Published: (2024)

PhiNet: Speaker Verification with Phonetic Interpretability
by: Ma, Yi, et al.
Published: (2026)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
by: Zang, Yongyi, et al.
Published: (2024)

Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)

BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
by: Fan, Cunhang, et al.
Published: (2024)

Improving Code Switching with Supervised Fine Tuning and GELU Adapters
by: Pham, Linh
Published: (2025)

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
by: Hou, Siyuan, et al.
Published: (2024)

Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)

Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
by: Ou, Longshen, et al.
Published: (2024)

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
by: Wang, Yuanyuan, et al.
Published: (2024)

Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)

Algorithms for Collaborative Harmonization
by: Briman, Eyal, et al.
Published: (2025)

BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement
by: Ye, Zhe, et al.
Published: (2026)

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)

FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection
by: Jiang, Han, et al.
Published: (2024)

GeHirNet: A Gender-Aware Hierarchical Model for Voice Pathology Classification
by: Wu, Fan, et al.
Published: (2025)

Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)

Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
by: Ruggiero, Giuseppe, et al.
Published: (2024)

MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)

Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)

UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment
by: Wang, Yuanyuan, et al.
Published: (2026)

Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)