:: Library Catalog

Bibliographic Details
Main Authors:	Tan, Weiting; Chen, Yunmo; Chen, Tongfei; Qin, Guanghui; Xu, Haoran; Zhang, Heidi C.; Van Durme, Benjamin; Koehn, Philipp
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2402.01172

Similar Items

Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation
by: Tan, Weiting, et al.
Published: (2025)

SpecTokenizer: A Lightweight Streaming Codec in the Compressed Spectrum Domain
by: Wan, Zixiang, et al.
Published: (2025)

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)

Neural-Enhanced Dynamic Range Compression Inversion: A Hybrid Approach for Restoring Audio Dynamics
by: Sun, Haoran, et al.
Published: (2024)

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization
by: Ulgen, Ismail Rasim, et al.
Published: (2026)

Objective Evaluation of Prosody and Intelligibility in Speech Synthesis via Conditional Prediction of Discrete Tokens
by: Ulgen, Ismail Rasim, et al.
Published: (2025)

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion
by: Ning, Ziqian, et al.
Published: (2023)

Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai LLM-based ASR
by: Shao, Mingchen, et al.
Published: (2025)

SAC: Neural Speech Codec with Semantic-Acoustic Dual-Stream Quantization
by: Chen, Wenxi, et al.
Published: (2025)

SSR: Alignment-Aware Modality Connector for Speech Language Models
by: Tan, Weiting, et al.
Published: (2024)

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)

Zero-Shot Text-to-Speech from Continuous Text Streams
by: Dang, Trung, et al.
Published: (2024)

Dynamic Range Compression and Its Effect on Music Genre Classification
by: Madsen III, Arlyn Reese
Published: (2024)

STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
by: Feng, Tao, et al.
Published: (2025)

StreamAAD: Decoding Spatial Auditory Attention with a Streaming Architecture
by: Qiu, Zelin, et al.
Published: (2024)

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
by: Chen, Peikun, et al.
Published: (2024)

Improving Streaming Speech Recognition With Time-Shifted Contextual Attention And Dynamic Right Context Masking
by: Le, Khanh, et al.
Published: (2025)

Rate-Aware Learned Speech Compression
by: Xu, Jun, et al.
Published: (2025)

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)

Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample
by: Chen, Zhiyong, et al.
Published: (2024)

Phoenix-VAD: Streaming Semantic Endpoint Detection for Full-Duplex Speech Interaction
by: Wu, Weijie, et al.
Published: (2025)

SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
by: Chen, Wenxi, et al.
Published: (2024)

Personalized Voice Synthesis through Human-in-the-Loop Coordinate Descent
by: Tian, Yusheng, et al.
Published: (2024)

Dynamic Context-Aware Streaming Pretrained Language Model For Inverse Text Normalization
by: Ho, Luong, et al.
Published: (2025)

CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate
by: Wang, Hankun, et al.
Published: (2025)

Sparsely Shared LoRA on Whisper for Child Speech Recognition
by: Liu, Wei, et al.
Published: (2023)

Towards General Discrete Speech Codec for Complex Acoustic Environments: A Study of Reconstruction and Downstream Task Consistency
by: Wang, Haoran, et al.
Published: (2025)

StreamFlow: Streaming Flow Matching with Block-wise Guided Attention Mask for Speech Token Decoding
by: Guo, Dake, et al.
Published: (2025)

Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection
by: Wang, Chengyou, et al.
Published: (2026)

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis
by: Tao, Dehua, et al.
Published: (2024)

Enhancing Automatic Chord Recognition through LLM Chain-of-Thought Reasoning
by: Chang, Chih-Cheng, et al.
Published: (2025)

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing
by: Zhang, Hanlin, et al.
Published: (2026)

Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
by: Shi, Jiatong, et al.
Published: (2025)

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge
by: Yu, Guochen, et al.
Published: (2024)

APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
by: Du, Hui-Peng, et al.
Published: (2024)

AV-SSAN: Audio-Visual Selective DoA Estimation through Explicit Multi-Band Semantic-Spatial Alignment
by: Chen, Yu, et al.
Published: (2025)

Unfolding A Few Structures for The Many: Memory-Efficient Compression of Conformer and Speech Foundation Models
by: Li, Zhaoqing, et al.
Published: (2025)

Robust Lossy Audio Compression Identification
by: Koops, Hendrik Vincent, et al.
Published: (2024)