| Main Authors: | Jia, Yuhang, Wang, Hui, Nie, Xin, Guo, Yujie, Gao, Lianru, Qin, Yong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.11966 |
Similar Items
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
by: Wang, Hui, et al.
Published: (2025)
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
by: Jia, Yuhang, et al.
Published: (2024)
GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR
by: Guo, Yujie, et al.
Published: (2025)
CosyEdit2: Speech-Editing-Oriented Reinforcement Learning Unlocks Better Zero-Shot TTS
by: Chen, Junyang, et al.
Published: (2026)
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
TTA-Bench: A Comprehensive Benchmark for Evaluating Text-to-Audio Models
by: Wang, Hui, et al.
Published: (2025)
MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation
by: Liu, Cheng, et al.
Published: (2025)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding
by: Zhou, Jiaming, et al.
Published: (2026)
DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation
by: Li, Baihan, et al.
Published: (2024)
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
by: Lu, Ye-Xin, et al.
Published: (2024)
AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)
SemanticAudio: Audio Generation and Editing in Semantic Space
by: Dai, Zheqi, et al.
Published: (2026)
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
by: Chen, Yang, et al.
Published: (2024)
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
by: Gao, Liting, et al.
Published: (2025)
SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation
by: Wang, Hui, et al.
Published: (2025)
Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality
by: Delgado, Pablo M., et al.
Published: (2025)
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
by: Guo, Tianyu, et al.
Published: (2025)
BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset
by: Fahad, Istiaq Ahmed, et al.
Published: (2025)
AudioChat: Unified Audio Storytelling, Editing, and Understanding with Transfusion Forcing
by: Chen, William, et al.
Published: (2026)
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding
by: Ai, Yang, et al.
Published: (2024)
Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs
by: Xue, Jun, et al.
Published: (2026)
Evaluating Objective Speech Quality Metrics for Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)
LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect
by: Naouara, Hedi, et al.
Published: (2025)
A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)
Virtual Consistency for Audio Editing
by: Cervera, Matthieu, et al.
Published: (2025)
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio
by: Zeng, Siding, et al.
Published: (2024)
GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human
by: Lin, Yihang, et al.
Published: (2026)
MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model
by: Tao, Ye, et al.
Published: (2025)
Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling
by: Wang, Quanxiu, et al.
Published: (2024)
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
by: Jin, Weifei, et al.
Published: (2024)
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
by: Xue, Hongfei, et al.
Published: (2024)
TW-Sound580K: A Regional Audio-Text Dataset with Verification-Guided Curation for Localized Audio-Language Modeling
by: Xie, Hao-Hui, et al.
Published: (2026)
PhonemeDF: A Synthetic Speech Dataset for Audio Deepfake Detection and Naturalness Evaluation
by: Nallaguntla, Vamshi, et al.
Published: (2026)
kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
by: Zhou, Jiaming, et al.
Published: (2023)
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
by: Zhou, Jiaming, et al.
Published: (2024)
Towards Explicit Acoustic Evidence Perception in Audio LLMs for Speech Deepfake Detection
by: Guo, Xiaoxuan, et al.
Published: (2026)
Steer-by-prior Editing of Symbolic Music Loops
by: Jonason, Nicolas, et al.
Published: (2024)
The AudioMOS Challenge 2025
by: Huang, Wen-Chin, et al.
Published: (2025)