| Main Authors: | Makineni, Aditya, Geng, Baocheng, Tian, Qing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.21243 |
Similar Items
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
MATPAC++: Enhanced Masked Latent Prediction for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)
Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)
Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
by: Dawn, Aditya, et al.
Published: (2024)
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
by: Shi, Yanfeng, et al.
Published: (2026)
Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features
by: Amin, Lisan Al, et al.
Published: (2026)
AudioMosaic: Contrastive Masked Audio Representation Learning
by: Huang, Hanxun, et al.
Published: (2026)
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
by: Wang, Yunqiang, et al.
Published: (2026)
Learning Temporal Resolution in Spectrogram for Audio Classification
by: Liu, Haohe, et al.
Published: (2022)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer
by: Sheng, Zhengyan, et al.
Published: (2025)
Domain-Agnostic Causal-Aware Audio Transformer for Infant Cry Classification
by: Owino, Geofrey, et al.
Published: (2025)
Residual Tokens Enhance Masked Autoencoders for Speech Modeling
by: Sadok, Samir, et al.
Published: (2026)
Token Pruning in Audio Transformers: Optimizing Performance and Decoding Patch Importance
by: Lee, Taehan, et al.
Published: (2025)
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
by: Pascual, Santiago, et al.
Published: (2024)
Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models
by: Li, Yanda, et al.
Published: (2026)
Universal Sound Separation with Self-Supervised Audio Masked Autoencoder
by: Zhao, Junqi, et al.
Published: (2024)
MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification
by: Liu, HongYu, et al.
Published: (2025)
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
by: Cai, Pengfei, et al.
Published: (2024)
TFGA-Net: Temporal-Frequency Graph Attention Network for Brain-Controlled Speaker Extraction
by: Si, Youhao, et al.
Published: (2025)
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
by: Zhu, Wentao
Published: (2024)
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation
by: Wu, Junda, et al.
Published: (2024)
Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
by: Yao, Yiqun, et al.
Published: (2025)
Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)
Detect All-Type Deepfake Audio: Wavelet Prompt Tuning for Enhanced Auditory Perception
by: Xie, Yuankun, et al.
Published: (2025)
Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs
by: Xue, Jun, et al.
Published: (2026)
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization
by: Lee, Geonyoung, et al.
Published: (2025)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)
Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms
by: Li, Yupei, et al.
Published: (2026)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)
Mask2Flow-TSE: Two-Stage Target Speaker Extraction with Masking and Flow Matching
by: Moon, Junwon, et al.
Published: (2026)
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
by: Li, Yadong, et al.
Published: (2026)
MaskSR: Masked Language Model for Full-band Speech Restoration
by: Li, Xu, et al.
Published: (2024)
Stable Audio 3
by: Evans, Zach, et al.
Published: (2026)