| Main Authors: | Zhang, Chong, Liu, Yanqing, Zheng, Yang, Zhao, Sheng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.04633 |
Similar Items
Investigating Group Relative Policy Optimization for Diffusion Transformer based Text-to-Audio Generation
by: Gu, Yi, et al.
Published: (2026)
Spectrogram features for audio and speech analysis
by: McLoughlin, Ian, et al.
Published: (2026)
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
by: Zhao, Lei, et al.
Published: (2025)
SSM2Mel: State Space Model to Reconstruct Mel Spectrogram from the EEG
by: Fan, Cunhang, et al.
Published: (2025)
DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction
by: Fan, Cunhang, et al.
Published: (2025)
Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?
by: Du, Hui-Peng, et al.
Published: (2025)
A Unified Neural Codec Language Model for Selective Editable Text to Speech Generation
by: Pei, Hanchen, et al.
Published: (2026)
ASM: Audio Spectrogram Mixer
by: Ji, Qingfeng, et al.
Published: (2024)
Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy
by: Xue, Ke, et al.
Published: (2026)
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
by: Yuan, Ze, et al.
Published: (2024)
BFA: Real-time Multilingual Text-to-speech Forced Alignment
by: Rehman, Abdul, et al.
Published: (2025)
FAST: Fast Audio Spectrogram Transformer
by: Naman, Anugunj, et al.
Published: (2025)
Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)
Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers
by: Cappellazzo, Umberto, et al.
Published: (2023)
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
by: Comunità, Marco, et al.
Published: (2024)
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
by: Fucci, Dennis, et al.
Published: (2024)
A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)
DBMIF: a deep balanced multimodal iterative fusion framework for air- and bone-conduction speech enhancement
by: Wu, Yilei, et al.
Published: (2026)
Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation
by: Ma, Jianbo, et al.
Published: (2026)
Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
by: Dixit, Satvik, et al.
Published: (2024)
Boosting Large Language Model for Speech Synthesis: An Empirical Study
by: Hao, Hongkun, et al.
Published: (2023)
Detection of manatee vocalisations using the Audio Spectrogram Transformer
by: Schiappacasse, Stefano, et al.
Published: (2024)
SPGM: Prioritizing Local Features for enhanced speech separation performance
by: Yip, Jia Qi, et al.
Published: (2023)
Selective Classifier-free Guidance for Zero-shot Text-to-speech
by: Zheng, John, et al.
Published: (2025)
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
by: Chen, Sanyuan, et al.
Published: (2024)
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
by: Wang, Jianzong, et al.
Published: (2023)
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
by: Jiang, Xiao-Hang, et al.
Published: (2024)
I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception
by: Zhang, Jiawei, et al.
Published: (2024)
Fusion of Modulation Spectrogram and SSL with Multi-head Attention for Fake Speech Detection
by: N, Rishith Sadashiv T, et al.
Published: (2025)
Joint Spectrogram Separation and TDOA Estimation using Optimal Transport
by: Fabiani, Linda, et al.
Published: (2025)
Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs
by: Langman, Ryan, et al.
Published: (2024)
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)
Evaluating CNN with Stacked Feature Representations and Audio Spectrogram Transformer Models for Sound Classification
by: Dehaghania, Parinaz Binandeh, et al.
Published: (2026)
A Practical Guide to Spectrogram Analysis for Audio Signal Processing
by: Khodzhaev, Zulfidin
Published: (2024)
Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task
by: Phan, Dang Thoai
Published: (2024)
Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers
by: Masuyama, Yoshiki, et al.
Published: (2025)
Leveraging AM and FM Rhythm Spectrograms for Dementia Classification and Assessment
by: Gogoi, Parismita, et al.
Published: (2025)
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
by: Selvaraj, Nithish Muthuchamy, et al.
Published: (2023)
FreeCodec: A disentangled neural speech codec with fewer tokens
by: Zheng, Youqiang, et al.
Published: (2024)
Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
by: Huang, Ziling, et al.
Published: (2025)