| Main Authors: | Ru, Ganghui, Wang, Jieying, Zhao, Jiahao, Wu, Yulun, Yu, Yi, Jiang, Nannan, Wang, Wei, Li, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.09788 |
Similar Items
BeatFM: Improving Beat Tracking with Pre-trained Music Foundation Model
by: Ru, Ganghui, et al.
Published: (2025)
Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection
by: Wei, Weixing, et al.
Published: (2025)
Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features
by: Deng, Jiajun, et al.
Published: (2025)
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
by: Chang, Chih-Cheng, et al.
Published: (2023)
Speech Emotion Recognition Using Fine-Tuned DWFormer: A Study on Track 1 of the IERPChallenge 2024
by: Wang, Honghong, et al.
Published: (2025)
MaskBeat: Loopable Drum Beat Generation
by: Lanzendörfer, Luca A., et al.
Published: (2025)
Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment
by: Wang, Ke, et al.
Published: (2025)
The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
by: Ahn, Jaehoon, et al.
Published: (2026)
Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking
by: Gagnere, Antonin, et al.
Published: (2025)
Fine-Tuning ASR for Stuttered Speech: Personalized vs. Generalized Approaches
by: Mujtaba, Dena, et al.
Published: (2025)
Efficient Emotion and Speaker Adaptation in LLM-Based TTS via Characteristic-Specific Partial Fine-Tuning
by: Wang, Tianrui, et al.
Published: (2025)
Metadata-Enhanced Speech Emotion Recognition: Augmented Residual Integration and Co-Attention in Two-Stage Fine-Tuning
by: Wan, Zixiang, et al.
Published: (2024)
Unsupervised Multi-channel Speech Dereverberation via Diffusion
by: Wu, Yulun, et al.
Published: (2025)
Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
by: Huang, Zikai, et al.
Published: (2024)
HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models
by: Mu, Bingshen, et al.
Published: (2024)
PhiNet: Speaker Verification with Phonetic Interpretability
by: Ma, Yi, et al.
Published: (2026)
SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)
Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
by: Zang, Yongyi, et al.
Published: (2024)
Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models
by: Jing, Ruihao, et al.
Published: (2025)
BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
by: Fan, Cunhang, et al.
Published: (2024)
Improving Code Switching with Supervised Fine Tuning and GELU Adapters
by: Pham, Linh
Published: (2025)
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
by: Hou, Siyuan, et al.
Published: (2024)
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
by: Xu, Chun, et al.
Published: (2024)
Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)
Unifying Symbolic Music Arrangement: Track-Aware Reconstruction and Structured Tokenization
by: Ou, Longshen, et al.
Published: (2024)
Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models
by: Zheng, Xinhu, et al.
Published: (2024)
ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
by: Wang, Yuanyuan, et al.
Published: (2024)
Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)
Algorithms for Collaborative Harmonization
by: Briman, Eyal, et al.
Published: (2025)
BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement
by: Ye, Zhe, et al.
Published: (2026)
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
by: Ding, Shaojin, et al.
Published: (2023)
FGCL: Fine-grained Contrastive Learning For Mandarin Stuttering Event Detection
by: Jiang, Han, et al.
Published: (2024)
GeHirNet: A Gender-Aware Hierarchical Model for Voice Pathology Classification
by: Wu, Fan, et al.
Published: (2025)
Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)
Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
by: Ruggiero, Giuseppe, et al.
Published: (2024)
MusFlow: Multimodal Music Generation via Conditional Flow Matching
by: Song, Jiahao, et al.
Published: (2025)
Multi-Level Speaker Representation for Target Speaker Extraction
by: Zhang, Ke, et al.
Published: (2024)
UniSRM: A Unified Speech Reward Model for Reasoning-Based Fine-grained Assessment
by: Wang, Yuanyuan, et al.
Published: (2026)
Personalized Fine-Tuning with Controllable Synthetic Speech from LLM-Generated Transcripts for Dysarthric Speech Recognition
by: Wagner, Dominik, et al.
Published: (2025)