| Main Authors: | Li, Chang, Zhou, Kanglei, Wang, Liyuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.03355 |
Similar Items
Audio Super-Resolution with Latent Bridge Models
by: Li, Chang, et al.
Published: (2025)
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
by: Primus, Paul, et al.
Published: (2025)
AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference
by: Shinoda, Risa, et al.
Published: (2026)
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
by: Tabassum, Afrina, et al.
Published: (2024)
Audio-Visual Continual Test-Time Adaptation without Forgetting
by: Maharana, Sarthak Kumar, et al.
Published: (2026)
T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining
by: Yuan, Yi, et al.
Published: (2024)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)
Unleashing the Power of Natural Audio Featuring Multiple Sound Sources
by: Cheng, Xize, et al.
Published: (2025)
Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning
by: Wu, Daiqing, et al.
Published: (2026)
AudioCodecBench: A Comprehensive Benchmark for Audio Codec Evaluation
by: Wang, Lu, et al.
Published: (2025)
tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models
by: Paissan, Francesco, et al.
Published: (2023)
Detecting and Mitigating Insertion Hallucination in Video-to-Audio Generation
by: Chen, Liyang, et al.
Published: (2025)
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection
by: Xiao, Yixuan, et al.
Published: (2026)
ADNAC: Audio Denoiser using Neural Audio Codec
by: Jimon, Daniel, et al.
Published: (2025)
Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
by: Sgouropoulos, Christos, et al.
Published: (2025)
Heterogeneity-Aware Dataset Scheduling for Efficient Audio Large Language Model Training
by: Wu, Yanru, et al.
Published: (2026)
LipsAM: Lipschitz-Continuous Amplitude Modifier for Audio Signal Processing and its Application to Plug-and-Play Dereverberation
by: Matsumoto, Kazuki, et al.
Published: (2026)
SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders
by: Paek, Nathan, et al.
Published: (2025)
Virtual Consistency for Audio Editing
by: Cervera, Matthieu, et al.
Published: (2025)
Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Modal Deep Search
by: Yu, Tao, et al.
Published: (2026)
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
by: Wang, Wupeng, et al.
Published: (2025)
Segmentwise Pruning in Audio-Language Models
by: Gibier, Marcel, et al.
Published: (2025)
Adapting Neural Audio Codecs to EEG
by: Kastrati, Ard, et al.
Published: (2025)
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
by: Wang, Wupeng, et al.
Published: (2024)
APEX: Audio Prototype EXplanations for Classification Tasks
by: Kawa, Piotr, et al.
Published: (2026)
Investigating Modality Contribution in Audio LLMs for Music
by: Morais, Giovana, et al.
Published: (2025)
Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits
by: Ishii, Masato, et al.
Published: (2025)
Training-Free Multimodal Guidance for Video to Audio Generation
by: Grassucci, Eleonora, et al.
Published: (2025)
Semantic-Aware Confidence Calibration for Automated Audio Captioning
by: Dunker, Lucas, et al.
Published: (2025)
Descriptor-Injected Cross-Modal Learning: A Systematic Exploration of Audio-MIDI Alignment via Spectral and Melodic Features
by: Méndez, Mariano Fernández
Published: (2026)
Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards
by: Fang, Linghan, et al.
Published: (2026)
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)
Continued Pretraining for Low-Resource Swahili ASR: Achieving State-of-the-Art Performance with Minimal Labeled Data
by: Mutisya, Hillary, et al.
Published: (2026)
FastWave: Optimized Diffusion Model for Audio Super-Resolution
by: Kuznetsov, Nikita, et al.
Published: (2026)
Transformer Based Machine Fault Detection From Audio Input
by: Holla, Kiran Voderhobli
Published: (2026)
TADA! Tuning Audio Diffusion Models through Activation Steering
by: Staniszewski, Łukasz, et al.
Published: (2026)
Characterizing Continual Learning Scenarios and Strategies for Audio Analysis
by: Bhatt, Ruchi, et al.
Published: (2024)
Improving Audio Classification by Transitioning from Zero- to Few-Shot
by: Taylor, James, et al.
Published: (2025)
High-Fidelity Music Vocoder using Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)