Library Catalog

Bibliographic Details
Main Authors:	Zhang, Chong; Liu, Yanqing; Zheng, Yang; Zhao, Sheng
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2406.04633

Similar Items

Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation
by: Gu, Yi, et al.
Published: (2026)

Spectrogram features for audio and speech analysis
by: McLoughlin, Ian, et al.
Published: (2026)

DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)

SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG
by: Fan, Cunhang, et al.
Published: (2025)

DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction
by: Fan, Cunhang, et al.
Published: (2025)

Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?
by: Du, Hui-Peng, et al.
Published: (2025)

A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation
by: Pei, Hanchen, et al.
Published: (2026)

ASM: Audio Spectrogram Mixer
by: Ji, Qingfeng, et al.
Published: (2024)

Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy
by: Xue, Ke, et al.
Published: (2026)

Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
by: Yuan, Ze, et al.
Published: (2024)

BFA: Real-time Multilingual Text-to-speech Forced Alignment
by: Rehman, Abdul, et al.
Published: (2025)

FAST: Fast Audio Spectrogram Transformer
by: Naman, Anugunj, et al.
Published: (2025)

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)

Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
by: Cappellazzo, Umberto, et al.
Published: (2023)

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
by: Comunità, Marco, et al.
Published: (2024)

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
by: Fucci, Dennis, et al.
Published: (2024)

A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)

DBMIF: a deep balanced multimodal iterative fusion framework for air- and bone-conduction speech enhancement
by: Wu, Yilei, et al.
Published: (2026)

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation
by: Ma, Jianbo, et al.
Published: (2026)

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
by: Dixit, Satvik, et al.
Published: (2024)

Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)

Detection of manatee vocalisations using the Audio Spectrogram Transformer
by: Schiappacasse, Stefano, et al.
Published: (2024)

SPGM: Prioritizing Local Features for enhanced speech separation performance
by: Yip, Jia Qi, et al.
Published: (2023)

Selective Classifier-free Guidance for Zero-shot Text-to-speech
by: Zheng, John, et al.
Published: (2025)

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
by: Wang, Jianzong, et al.
Published: (2023)

ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)

I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception
by: Zhang, Jiawei, et al.
Published: (2024)

Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection
by: N, Rishith Sadashiv T, et al.
Published: (2025)

Joint Spectrogram Separation and TDOA Estimation using Optimal Transport
by: Fabiani, Linda, et al.
Published: (2025)

Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs
by: Langman, Ryan, et al.
Published: (2024)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)

A Practical Guide to Spectrogram Analysis for Audio Signal Processing
by: Khodzhaev, Zulfidin
Published: (2024)

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task
by: Phan, Dang Thoai
Published: (2024)

Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers
by: Masuyama, Yoshiki, et al.
Published: (2025)

Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
by: Gogoi, Parismita, et al.
Published: (2025)

Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
by: Selvaraj, Nithish Muthuchamy, et al.
Published: (2023)

FreeCodec: A disentangled neural speech codec with fewer tokens
by: Zheng, Youqiang, et al.
Published: (2024)

Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
by: Huang, Ziling, et al.
Published: (2025)