:: Library Catalog

Bibliographic Details
Main Author:	Brosnan, Trey
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2601.02357

Similar Items

Enhanced Automatic Drum Transcription via Drum Stem Source Separation
by: Riley, Xavier, et al.
Published: (2025)

Generating High-quality Symbolic Music Using Fine-grained Discriminators
by: Zhang, Zhedong, et al.
Published: (2024)

ECOSoundSet: a finely annotated dataset for the automated acoustic identification of Orthoptera and Cicadidae in North, Central and temperate Western Europe
by: Funosas, David, et al.
Published: (2025)

Deep learning for music generation. Four approaches and their comparative evaluation
by: Paroiu, Razvan, et al.
Published: (2025)

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
by: Chen, Shunian, et al.
Published: (2025)

Where are we in audio deepfake detection? A systematic analysis over generative and detection models
by: Li, Xiang, et al.
Published: (2024)

MaskBeat: Loopable Drum Beat Generation
by: Lanzendörfer, Luca A., et al.
Published: (2025)

DOSE : Drum One-Shot Extraction from Music Mixture
by: Hwang, Suntae, et al.
Published: (2025)

Drum-to-Vocal Percussion Sound Conversion and Its Evaluation Methodology
by: Nobukawa, Rinka, et al.
Published: (2025)

The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling
by: O'Reilly, Patrick, et al.
Published: (2025)

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)

Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation
by: Wu, Junda, et al.
Published: (2024)

Recomposer: Event-roll-guided generative audio editing
by: Ellis, Daniel P. W., et al.
Published: (2025)

Linear RNNs for autoregressive generation of long music samples
by: Szewczyk, Konrad, et al.
Published: (2025)

Symbotunes: unified hub for symbolic music generative models
by: Skierś, Paweł, et al.
Published: (2024)

Toward Deep Drum Source Separation
by: Mezza, Alessandro Ilic, et al.
Published: (2023)

InstructAudio: Unified speech and music generation with natural language instruction
by: Qiang, Chunyu, et al.
Published: (2025)

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
by: Liu, Hanwen, et al.
Published: (2026)

When Scaling Fails: Mitigating Audio Perception Decay of LALMs via Multi-Step Perception-Aware Reasoning
by: Mao, Ruixiang, et al.
Published: (2026)

Mitigating Latent Mismatch in cVAE-Based Singing Voice Synthesis via Flow Matching
by: Yun, Minhyeok, et al.
Published: (2026)

LLM-Guided Reinforcement Learning for Audio-Visual Speech Enhancement
by: Chen, Chih-Ning, et al.
Published: (2026)

DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models
by: Xiong, Jiaqi, et al.
Published: (2026)

Learning Physiology-Informed Vocal Spectrotemporal Representations for Speech Emotion Recognition
by: Zhang, Xu, et al.
Published: (2026)

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
by: Lee, Ching Ho, et al.
Published: (2026)

HierCon: Hierarchical Contrastive Attention for Audio Deepfake Detection
by: Liang, Zhili Nicholas, et al.
Published: (2026)

TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection
by: Ma, Chengyuan, et al.
Published: (2026)

ES4R: Speech Encoding Based on Prepositive Affective Modeling for Empathetic Response Generation
by: Gao, Zhuoyue, et al.
Published: (2026)

CaSNet: Compress-and-Send Network Based Multi-Device Speech Enhancement Model for Distributed Microphone Arrays
by: Jiang, Chengqian, et al.
Published: (2026)

ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
by: Li, Haitao, et al.
Published: (2026)

JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
by: Zhang, Leying, et al.
Published: (2026)

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)

ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations
by: Wang, Kexue, et al.
Published: (2026)

HASS: Hierarchical Simulation of Logopenic Aphasic Speech for Scalable PPA Detection
by: Li, Harrison, et al.
Published: (2026)

Zero-Shot Parkinson's Disease Detection from Speech: Comparing Large Audio and Language Models
by: Kabir, Muhammad Ashad, et al.
Published: (2026)

Semantic visually-guided acoustic highlighting with large vision-language models
by: Huang, Junhua, et al.
Published: (2026)

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training
by: Mei, Xinhao, et al.
Published: (2026)

CORD: Bridging the Audio-Text Reasoning Gap via Weighted On-policy Cross-modal Distillation
by: Hu, Jing, et al.
Published: (2026)

MoE Adapter for Large Audio Language Models: Sparsity, Disentanglement, and Gradient-Conflict-Free
by: Lei, Yishu, et al.
Published: (2026)

Summary of The Inaugural Music Source Restoration Challenge
by: Zang, Yongyi, et al.
Published: (2026)

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)