| Main Authors: | Yan, WeiRan, Tang, MaoLin, Zhao, Qijun, Chen, Peng, Qi, Dunwu, Hou, Rong, Zhang, Zhihe |
|---|---|
| Format: | Preprint |
| Published: | 2019 |
| Online Access: | https://arxiv.org/abs/1912.11333 |
Similar Items
AudioGenie-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning
by: Rong, Yan, et al.
Published: (2025)
Masked Audio Modeling with CLAP and Multi-Objective Learning
by: Xin, Yifei, et al.
Published: (2024)
An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats
by: Triantafyllopoulos, Andreas, et al.
Published: (2024)
Streaming Audio Transformers for Online Audio Tagging
by: Dinkel, Heinrich, et al.
Published: (2023)
AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
by: Lin, Jingru, et al.
Published: (2026)
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
by: Wang, Yucheng, et al.
Published: (2026)
MATS: An Audio Language Model under Text-only Supervision
by: Wang, Wen, et al.
Published: (2025)
AeroGPT: Leveraging Large-Scale Audio Model for Aero-Engine Bearing Fault Diagnosis
by: Liu, Jiale, et al.
Published: (2025)
Advancing Continual Learning for Robust Deepfake Audio Classification
by: Dong, Feiyi, et al.
Published: (2024)
Utilizing Speaker Profiles for Impersonation Audio Detection
by: Gu, Hao, et al.
Published: (2024)
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
by: Lin, Zhaofeng, et al.
Published: (2024)
Robust Audio Tagging under Class-wise Supervision Unreliability
by: Hou, Yuanbo, et al.
Published: (2026)
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
by: Huang, Kuan-Po, et al.
Published: (2025)
QuarkAudio Technical Report
by: Liu, Chengwei, et al.
Published: (2025)
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
by: Xu, Xuenan, et al.
Published: (2023)
WeDefense: A Toolkit to Defend Against Fake Audio
by: Zhang, Lin, et al.
Published: (2026)
A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining
by: Xiao, Feiyang, et al.
Published: (2024)
Continuous Learning of Transformer-based Audio Deepfake Detection
by: Le, Tuan Duy Nguyen, et al.
Published: (2024)
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
by: Yang, Dongchao, et al.
Published: (2023)
An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation
by: da Costa, Maurício do V. M., et al.
Published: (2025)
MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
by: Gong, Yitian, et al.
Published: (2026)
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
by: Jiang, Xiao-Hang, et al.
Published: (2024)
Leveraging Mamba with Full-Face Vision for Audio-Visual Speech Enhancement
by: Chao, Rong, et al.
Published: (2025)
Measuring Audio Prompt Adherence with Distribution-based Embedding Distances
by: Grachten, Maarten
Published: (2024)
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
by: Ren, Wenze, et al.
Published: (2024)
Discrete Audio Representations for Automated Audio Captioning
by: Tian, Jingguang, et al.
Published: (2025)
Pengi: An Audio Language Model for Audio Tasks
by: Deshmukh, Soham, et al.
Published: (2023)
Exploring Differences between Human Perception and Model Inference in Audio Event Recognition
by: Tan, Yizhou, et al.
Published: (2024)
The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models
by: Yamamoto, Katsuhiko, et al.
Published: (2025)
Analysis of ABC Frontend Audio Systems for the NIST-SRE24
by: Barahona, Sara, et al.
Published: (2025)
Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
by: Dixit, Satvik, et al.
Published: (2024)
Audio Entailment: Assessing Deductive Reasoning for Audio Understanding
by: Deshmukh, Soham, et al.
Published: (2024)
SemanticAudio: Audio Generation and Editing in Semantic Space
by: Dai, Zheqi, et al.
Published: (2026)
EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)
Can Large Language Models Understand Spatial Audio?
by: Tang, Changli, et al.
Published: (2024)
Audio Mamba: Pretrained Audio State Space Model For Audio Tagging
by: Lin, Jiaju, et al.
Published: (2024)
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
by: Li, Chenxing, et al.
Published: (2024)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)