:: Library Catalog

Bibliographic Details
Main Authors:	Owino, Geofrey; Kasamani, Bernard Shibwabo; Abdelmoniem, Ahmed M.; Wornyo, Edem
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.16271

Similar Items

Privacy-Enhancing Infant Cry Classification with Federated Transformers and Denoising Regularization
by: Owino, Geofrey, et al.
Published: (2025)

Infant Cry Detection Using Causal Temporal Representation
by: Fu, Minghao, et al.
Published: (2025)

CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds
by: Budaghyan, David, et al.
Published: (2023)

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
by: Hong, Mengze, et al.
Published: (2024)

Fairness-Aware Partial-label Domain Adaptation for Voice Classification of Parkinson's and ALS
by: Francesconi, Arianna, et al.
Published: (2026)

Audio-Guided Dynamic Modality Fusion with Stereo-Aware Attention for Audio-Visual Navigation
by: Li, Jia, et al.
Published: (2025)

HarmonicAttack: An Adaptive Cross-Domain Audio Watermark Removal
by: Li, Kexin, et al.
Published: (2025)

Geometry-Aware Optimization for Respiratory Sound Classification: Enhancing Sensitivity with SAM-Optimized Audio Spectrogram Transformers
by: Işık, Atakan, et al.
Published: (2025)

Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
by: Quelennec, Aurian, et al.
Published: (2025)

Full-Frequency Temporal Patching and Structured Masking for Enhanced Audio Classification
by: Makineni, Aditya, et al.
Published: (2025)

Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection
by: Chen, Meng, et al.
Published: (2026)

PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs
by: Dementyev, Artem, et al.
Published: (2026)

Domain Adaptation Method and Modality Gap Impact in Audio-Text Models for Prototypical Sound Classification
by: Acevedo, Emiliano, et al.
Published: (2025)

OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
by: Biswas, Subrata, et al.
Published: (2025)

Structured-Noise Masked Modeling for Video, Audio and Beyond
by: Bhowmik, Aritra, et al.
Published: (2025)

MSMT-FN: Multi-segment Multi-task Fusion Network for Marketing Audio Classification
by: Liu, HongYu, et al.
Published: (2025)

HRTFformer: A Spatially-Aware Transformer for Individual HRTF Upsampling in Immersive Audio Rendering
by: Hu, Xuyi, et al.
Published: (2025)

Representation-Regularized Convolutional Audio Transformer for Audio Understanding
by: Han, Bing, et al.
Published: (2026)

Toward Noise-Aware Audio Deepfake Detection: Survey, SNR-Benchmarks, and Practical Recipes
by: Sen, Udayon, et al.
Published: (2025)

GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
by: Wang, Yunqiang, et al.
Published: (2026)

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer
by: Liu, Yisu, et al.
Published: (2025)

Are Audio-Language Models Listening? Audio-Specialist Heads for Adaptive Audio Steering
by: Glazer, Neta, et al.
Published: (2026)

Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)

Bridging Biological Hearing and Neuromorphic Computing: End-to-End Time-Domain Audio Signal Processing with Reservoir Computing
by: Sebastian, Rinku, et al.
Published: (2026)

Mitigating Stethoscope-Induced Shortcuts in Respiratory Sound Classification under Federated Domain Generalization with Causality-Inspired Interventions
by: Koo, Heejoon, et al.
Published: (2026)

Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles
by: Toikkanen, Miika, et al.
Published: (2025)

RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification
by: Kim, June-Woo, et al.
Published: (2024)

Fundamental Survey on Neuromorphic Based Audio Classification
by: Basu, Amlan, et al.
Published: (2025)

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation
by: Feng, Bo-Han, et al.
Published: (2026)

AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)

Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
by: Zhang, Dan, et al.
Published: (2026)

Cross-Domain Audio Deepfake Detection: Dataset and Analysis
by: Li, Yuang, et al.
Published: (2024)

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)

AudioMotionBench: Evaluating Auditory Motion Perception in Audio LLMs
by: Sun, Zhe, et al.
Published: (2025)

The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
by: Zhang, Ruixing, et al.
Published: (2026)

Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)

Stable Audio 3
by: Evans, Zach, et al.
Published: (2026)

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization
by: Labrak, Yanis, et al.
Published: (2026)

Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)

AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
by: Aparin, Georgii, et al.
Published: (2026)