| Main Authors: | Yang, Xiaoran, Yang, Jianxuan, Guo, Xinyue, Wang, Haoyu, Pan, Ningning, Huang, Gongping |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.06389 |
Similar Items
TTMBA: Towards Text To Multiple Sources Binaural Audio Generation
by: He, Yuxuan, et al.
Published: (2025)
MultiSoundGen: Video-to-Audio Generation for Multi-Event Scenarios via SlowFast Contrastive Audio-Visual Pretraining and Direct Preference Optimization
by: Yang, Jianxuan, et al.
Published: (2025)
AV-Edit: Multimodal Generative Sound Effect Editing via Audio-Visual Semantic Joint Control
by: Guo, Xinyue, et al.
Published: (2025)
MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
DSFlow: Dual Supervision and Step-Aware Architecture for One-Step Flow Matching Speech Synthesis
by: Lin, Bin, et al.
Published: (2026)
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation
by: Xie, Tianxin, et al.
Published: (2025)
AlphaFlowTSE: One-Step Generative Target Speaker Extraction via Conditional AlphaFlow
by: Li, Duojia, et al.
Published: (2026)
Discrete MeanFlow: One-Step Generation via Conditional Transition Kernels
by: Khan, Fairoz Nower, et al.
Published: (2026)
Stochastic MeanFlow Policies: One-Step Generative Control with Entropic Mirror Descent
by: Wang, Zeyuan, et al.
Published: (2026)
One-Step Generative Policies with Q-Learning: A Reformulation of MeanFlow
by: Wang, Zeyuan, et al.
Published: (2025)
MeanFlowSE: one-step generative speech enhancement via conditional mean flow
by: Li, Duojia, et al.
Published: (2025)
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
by: Yang, Yudong, et al.
Published: (2025)
MeanVoiceFlow: One-step Nonparallel Voice Conversion with Mean Flows
by: Kaneko, Takuhiro, et al.
Published: (2026)
ERF-BA-TFD+: A Multimodal Model for Audio-Visual Deepfake Detection
by: Zhang, Xin, et al.
Published: (2025)
ProAV-DiT: A Projected Latent Diffusion Transformer for Efficient Synchronized Audio-Video Generation
by: Sun, Jiahui, et al.
Published: (2025)
Schrödinger Bridge Mamba for One-Step Speech Enhancement
by: Yang, Jing, et al.
Published: (2025)
DialogGraph-LLM: Graph-Informed LLMs for End-to-End Audio Dialogue Intent Recognition
by: Liu, HongYu, et al.
Published: (2025)
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
by: Tan, Zhenxiong, et al.
Published: (2024)
Compose Yourself: Average-Velocity Flow Matching for One-Step Speech Enhancement
by: Yang, Gang, et al.
Published: (2025)
Drum Synthesis from Expressive Drum Grids via Neural Audio Codecs
by: Soiledis, Konstantinos, et al.
Published: (2026)
FlowPortrait: Reinforcement Learning for Audio-Driven Portrait Video Generation
by: Tan, Weiting, et al.
Published: (2026)
Shortcut Flow Matching for Speech Enhancement: Step-Invariant flows via single stage training
by: Zhou, Naisong, et al.
Published: (2025)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
Improvement and Implementation of a Speech Emotion Recognition Model Based on Dual-Layer LSTM
by: Yang, Xiaoran, et al.
Published: (2024)
MOSS-Audio Technical Report
by: Yang, Chen, et al.
Published: (2026)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
by: Wang, Jun, et al.
Published: (2025)
Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis
by: Cui, Shuyang, et al.
Published: (2026)
ViSAGe: Video-to-Spatial Audio Generation
by: Kim, Jaeyeon, et al.
Published: (2025)
Audio Explanation Synthesis with Generative Foundation Models
by: Akman, Alican, et al.
Published: (2024)
When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning
by: Mao, Ruixiang, et al.
Published: (2026)
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
by: Gao, Liting, et al.
Published: (2025)
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
by: Li, Maomao, et al.
Published: (2026)
AVBench: Human-Aligned and Automated Evaluation Benchmark for Audio-Video Generative Models
by: Yang, Jialiang, et al.
Published: (2026)
No Free Lunch from Audio Pretraining in Bioacoustics: A Benchmark Study of Embeddings
by: Chen, Chenggang, et al.
Published: (2025)
MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning
by: Qiang, Chunyu, et al.
Published: (2026)
Audio ControlNet for Fine-Grained Audio Generation and Editing
by: Zhu, Haina, et al.
Published: (2026)
MTAVG-Bench 2.0: Diagnosing Failure Modes of Cinematic Expressiveness in Multi-Talker Audio-Video Generation
by: Li, Haitian, et al.
Published: (2026)