| Main Authors: | Chowdhury, Townim Faisal, Huy, Ta Duc, Pan, Siqi, Stoddard, Jeremy, Liao, Zhibin |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.22253 |
Similar Items
Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs
by: Zhang, Linhao, et al.
Published: (2026)
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
by: Zhang, Wenyu, et al.
Published: (2025)
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
by: Zhang, Wenyu, et al.
Published: (2024)
Multi-Task Instruction Tuning via Data Scheduling for Low-Resource Arabic AudioLLMs
by: Bhatti, Hunzalah Hassan, et al.
Published: (2026)
MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs
by: Ali, Zien Sheikh, et al.
Published: (2026)
Zero-Shot Cognitive Impairment Detection from Speech Using AudioLLM
by: Shahin, Mostafa, et al.
Published: (2025)
Tadabur: A Large-Scale Quran Audio Dataset
by: Alherran, Faisal
Published: (2026)
Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval
by: Yoo, HaeJun, et al.
Published: (2026)
Interpretable All-Type Audio Deepfake Detection with Audio LLMs via Frequency-Time Reinforcement Learning
by: Xie, Yuankun, et al.
Published: (2026)
Adaptive Discovery of Interpretable Audio Attributes with Multimodal LLMs for Low-Resource Classification
by: Yoshimura, Kosuke, et al.
Published: (2026)
Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features
by: Meng, Hanyu, et al.
Published: (2024)
From Healthy Scans to Annotated Tumors: A Tumor Fabrication Framework for 3D Brain MRI Synthesis
by: Dong, Nayu, et al.
Published: (2025)
A Sensitivity Analysis of Multi-Event Audio Grounding in Audio LLMs
by: Lee, Taehan, et al.
Published: (2026)
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds
by: Ghosh, Sreyan, et al.
Published: (2024)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs
by: Quang, Trung Nguyen, et al.
Published: (2026)
Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information
by: Wang, Shih-Heng, et al.
Published: (2026)
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
by: Huang, Zhiqi, et al.
Published: (2024)
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
by: Jia, Yuhang, et al.
Published: (2025)
AudioRAG: A Challenging Benchmark for Audio Reasoning and Information Retrieval
by: Lin, Jingru, et al.
Published: (2026)
AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
by: Shao, Weichuang, et al.
Published: (2025)
Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization
by: Zhang, Xiangyu, et al.
Published: (2026)
AudioToolAgent: An Agentic Framework for Audio-Language Models
by: Wijngaard, Gijs, et al.
Published: (2025)
Continuous Learning of Transformer-based Audio Deepfake Detection
by: Le, Tuan Duy Nguyen, et al.
Published: (2024)
Scaling Audio-Text Retrieval with Multimodal Large Language Models
by: Xu, Jilan, et al.
Published: (2026)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)
MMEDIT: A Unified Framework for Multi-Type Audio Editing via Audio Language Model
by: Tao, Ye, et al.
Published: (2025)
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)
AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)
Quantum Kernels for Audio Deepfake Detection Using Spectrogram Patch Features
by: Amin, Lisan Al, et al.
Published: (2026)
BrewCLIP: A Bifurcated Representation Learning Framework for Audio-Visual Retrieval
by: Lu, Zhenyu, et al.
Published: (2024)
Codec-Robust Attacks on Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
by: Sun, Siqi, et al.
Published: (2024)
Toward a Sparse and Interpretable Audio Codec
by: Vinyard, John
Published: (2025)
UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens
by: Liu, Chengwei, et al.
Published: (2025)
Network Modulation Synthesis: New Algorithms for Generating Musical Audio Using Autoencoder Networks
by: Hyrkas, Jeremy
Published: (2021)
ConceptCaps: a Distilled Concept Dataset for Interpretability in Music Models
by: Sienkiewicz, Bruno, et al.
Published: (2026)
ATIR: Towards Audio-Text Interleaved Contextual Retrieval
by: Zhao, Tong, et al.
Published: (2026)
Investigating Modality Contribution in Audio LLMs for Music
by: Morais, Giovana, et al.
Published: (2025)
Towards Privacy-Preserving Audio Classification Systems
by: Chhaglani, Bhawana, et al.
Published: (2024)