:: Library Catalog

Bibliographic Details
Main Authors:	Chang, Chih-Cheng; Su, Li
Format:	Preprint
Published:	2023
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2312.17156

Similar Items

Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features
by: Deng, Jiajun, et al.
Published: (2025)

Beat and Downbeat Tracking in Performance MIDI Using an End-to-End Transformer Architecture
by: Murgul, Sebastian, et al.
Published: (2025)

Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)

HingeNet: A Harmonic-Aware Fine-Tuning Approach for Beat Tracking
by: Ru, Ganghui, et al.
Published: (2025)

MaskBeat: Loopable Drum Beat Generation
by: Lanzendörfer, Luca A., et al.
Published: (2025)

Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)

The SMC Blind Spot: A Failure Mode Analysis of State-of-the-Art Beat Tracking
by: Ahn, Jaehoon, et al.
Published: (2026)

Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations
by: Wachter, Maximilian, et al.
Published: (2026)

Controlling Contrastive Self-Supervised Learning with Knowledge-Driven Multiple Hypothesis: Application to Beat Tracking
by: Gagnere, Antonin, et al.
Published: (2025)

Beat-It: Beat-Synchronized Multi-Condition 3D Dance Generation
by: Huang, Zikai, et al.
Published: (2024)

Streaming Sortformer: Speaker Cache-Based Online Speaker Diarization with Arrival-Time Ordering
by: Medennikov, Ivan, et al.
Published: (2025)

LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction
by: Liang, Di, et al.
Published: (2024)

A New Perspective on Speaker Verification: Joint Modeling with DFSMN and Transformer
by: Wang, Hongyu, et al.
Published: (2023)

SmoothSync: Dual-Stream Diffusion Transformers for Jitter-Robust Beat-Synchronized Gesture Generation from Quantized Audio
by: Jiang, Yujiao, et al.
Published: (2026)

StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture
by: Qiu, Zelin, et al.
Published: (2024)

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation
by: Bai, Ye, et al.
Published: (2024)

DualStream Contextual Fusion Network: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions
by: Xue, Ke, et al.
Published: (2025)

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
by: Lei, Ke, et al.
Published: (2026)

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
by: Shakeel, Muhammad, et al.
Published: (2024)

Beat-Based Rhythm Quantization of MIDI Performances
by: Wachter, Maximilian, et al.
Published: (2025)

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)

Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency
by: Xi, Yu, et al.
Published: (2024)

Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement
by: Shi, Hao, et al.
Published: (2025)

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR
by: Zhao, Wenbo, et al.
Published: (2024)

Accelerating Diffusion Transformer-Based Text-to-Speech with Transformer Layer Caching
by: Sakpiboonchit, Siratish
Published: (2025)

PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data
by: Tan, Chih-Pin, et al.
Published: (2024)

DNN-Based Online Source Counting Based on Spatial Generalized Magnitude Squared Coherence
by: Gode, Henri, et al.
Published: (2026)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking
by: Vendrame, Katia, et al.
Published: (2025)

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition
by: Chen, Kang, et al.
Published: (2026)

ASTAR-NTU solution to AudioMOS Challenge 2025 Track1
by: Ritter-Gutierrez, Fabian, et al.
Published: (2025)

Relationships between Keywords and Strong Beats in Lyrical Music
by: Liao, Callie C., et al.
Published: (2024)

Beat this! Accurate beat tracking without DBN postprocessing
by: Foscarin, Francesco, et al.
Published: (2024)