| Main Authors: | Tan, Weiting, Chen, Yunmo, Chen, Tongfei, Qin, Guanghui, Xu, Haoran, Zhang, Heidi C., Van Durme, Benjamin, Koehn, Philipp |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01172 |
Similar Items
Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
by: Tan, Weiting, et al.
Published: (2025)
SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
by: Wan, Zixiang, et al.
Published: (2025)
StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Neural-Enhanced Dynamic Range Compression Inversion: A Hybrid Approach for Restoring Audio Dynamics
by: Sun, Haoran, et al.
Published: (2024)
DiffAnon: Diffusion-based Prosody Control for Voice Anonymization
by: Ulgen, Ismail Rasim, et al.
Published: (2026)
Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)
DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)
Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
by: Shao, Mingchen, et al.
Published: (2025)
SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
by: Chen, Wenxi, et al.
Published: (2025)
SSR: Alignment-Aware Modality Connector for Speech Language Models
by: Tan, Weiting, et al.
Published: (2024)
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)
Zero-Shot Text-to-Speech from Continuous Text Streams
by: Dang, Trung, et al.
Published: (2024)
Dynamic Range Compression and Its Effect on Music Genre Classification
by: Madsen III, Arlyn Reese
Published: (2024)
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
by: Feng, Tao, et al.
Published: (2025)
StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture
by: Qiu, Zelin, et al.
Published: (2024)
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)
Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
by: Le, Khanh, et al.
Published: (2025)
Rate-Aware Learned Speech Compression
by: Xu, Jun, et al.
Published: (2025)
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
by: Chen, Zhiyong, et al.
Published: (2024)
Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
by: Chen, Wenxi, et al.
Published: (2024)
Personalized Voice Synthesis through Human-in-the-Loop Coordinate Descent
by: Tian, Yusheng, et al.
Published: (2024)
Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization
by: Ho, Luong, et al.
Published: (2025)
CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
by: Wang, Hankun, et al.
Published: (2025)
Sparsely Shared LoRA on Whisper for Child Speech Recognition
by: Liu, Wei, et al.
Published: (2023)
Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency
by: Wang, Haoran, et al.
Published: (2025)
StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection
by: Wang, Chengyou, et al.
Published: (2026)
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)
Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)
SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)
Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)
PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
by: Shi, Jiatong, et al.
Published: (2025)
KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
by: Yu, Guochen, et al.
Published: (2024)
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
by: Du, Hui-Peng, et al.
Published: (2024)
AV-SSAN: Audio-Visual Selective DoA Estimation through Explicit Multi-Band Semantic-Spatial Alignment
by: Chen, Yu, et al.
Published: (2025)
Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
by: Li, Zhaoqing, et al.
Published: (2025)
Robust Lossy Audio Compression Identification
by: Koops, Hendrik Vincent, et al.
Published: (2024)