| Main Authors: | Chen, Liyang; Chen, Hongkai; Cai, Yujun; Li, Sifan; Ye, Qingwen; Wang, Yiwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.08078 |
Similar Items
AudioRouter: Data Efficient Audio Understanding via RL based Dual Reasoning
by: Chen, Liyang, et al.
Published: (2026)
OptiSQL: Executable SQL Generation from Optical Tokens
by: Li, Sifan, et al.
Published: (2026)
Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector
by: Li, Sifan, et al.
Published: (2025)
Tri-Ergon: Fine-grained Video-to-Audio Generation with Multi-modal Conditions and LUFS Control
by: Li, Bingliang, et al.
Published: (2024)
Training-Free Multimodal Guidance for Video to Audio Generation
by: Grassucci, Eleonora, et al.
Published: (2025)
Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias
by: Fursule, Aishwarya, et al.
Published: (2026)
Audio Super-Resolution with Latent Bridge Models
by: Li, Chang, et al.
Published: (2025)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
by: Ishii, Masato, et al.
Published: (2025)
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)
Thinking with Sound: Audio Chain-of-Thought Enables Multimodal Reasoning in Large Audio-Language Models
by: Xiong, Zhen, et al.
Published: (2025)
Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
by: Pang, Jiacheng, et al.
Published: (2026)
Does Audio Deepfake Detection Generalize?
by: Müller, Nicolas M., et al.
Published: (2022)
STEP: Detecting Audio Backdoor Attacks via Stability-based Trigger Exposure Profiling
by: Wang, Kun, et al.
Published: (2026)
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
by: Wu, Daiqing, et al.
Published: (2026)
PACE: Pretrained Audio Continual Learning
by: Li, Chang, et al.
Published: (2026)
SoundReactor: Frame-level Online Video-to-Audio Generation
by: Saito, Koichi, et al.
Published: (2025)
An Enhanced Audio Feature Tailored for Anomalous Sound Detection Based on Pre-trained Models
by: Zhong, Guirui, et al.
Published: (2025)
SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos
by: Dellali, Amir, et al.
Published: (2025)
Transformer Based Machine Fault Detection From Audio Input
by: Holla, Kiran Voderhobli
Published: (2026)
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
by: Wang, Lu, et al.
Published: (2025)
Mitigating Sex Bias in Audio Data-driven COPD and COVID-19 Breathing Pattern Detection Models
by: Pfeifer, Rachel, et al.
Published: (2024)
ADD 2022: the First Audio Deep Synthesis Detection Challenge
by: Yi, Jiangyan, et al.
Published: (2022)
Investigating the Impact of Speech Enhancement on Audio Deepfake Detection in Noisy Environments
by: Anacin, et al.
Published: (2026)
SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model
by: Li, Yan, et al.
Published: (2024)
GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation
by: Li, Sifan, et al.
Published: (2026)
Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)
ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)
QAMRO: Quality-aware Adaptive Margin Ranking Optimization for Human-aligned Assessment of Audio Generation Systems
by: Wang, Chien-Chun, et al.
Published: (2025)
Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
by: Fedorishin, Dennis, et al.
Published: (2024)
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
by: Wu, Yanru, et al.
Published: (2026)
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
by: Tian, Jinchuan, et al.
Published: (2025)
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
by: Villa-Renteria, Ivan, et al.
Published: (2024)
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
by: Ishii, Masato, et al.
Published: (2024)
Improving Out-of-Domain Audio Deepfake Detection via Layer Selection and Fusion of SSL-Based Countermeasures
by: Serrano, Pierre, et al.
Published: (2025)
Unleashing the Power of Natural Audio Featuring Multiple Sound Sources
by: Cheng, Xize, et al.
Published: (2025)
A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation
by: Watcharasupat, Karn N., et al.
Published: (2023)
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
by: Wang, Qizhou, et al.
Published: (2025)
Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
by: Yu, Tao, et al.
Published: (2026)
Virtual Consistency for Audio Editing
by: Cervera, Matthieu, et al.
Published: (2025)