:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Xiran; Yan, Yujie; Wu, Xihong; Chen, Jing
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.23960

Similar Items

Using Ear-EEG to Decode Auditory Attention in Multiple-speaker Environment
by: Zhu, Haolin, et al.
Published: (2024)

The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
by: You, Yuhuan, et al.
Published: (2026)

Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
by: Wang, Bo, et al.
Published: (2024)

MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)

MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification
by: Jinghua, Liang, et al.
Published: (2026)

Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
by: Tiwari, Upasana, et al.
Published: (2025)

MindMelody: A Closed-Loop EEG-Driven System for Personalized Music Intervention
by: Zhang, Yimeng, et al.
Published: (2026)

Hyperbolic Additive Margin Softmax with Hierarchical Information for Speaker Verification
by: Fang, Zhihua, et al.
Published: (2026)

Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs
by: Xue, Jun, et al.
Published: (2026)

Switchable deep beamformer for high-quality and real-time passive acoustic mapping
by: Zeng, Yi, et al.
Published: (2024)

A DenseNet-based method for decoding auditory spatial attention with EEG
by: Xu, Xiran, et al.
Published: (2023)

Reject Threshold Adaptation for Open-Set Model Attribution of Deepfake Audio
by: Yan, Xinrui, et al.
Published: (2024)

CIPHER: Conformer-based Inference of Phonemes from High-density EEG
by: Madishetty, Varshith
Published: (2026)

SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton
by: He, Xuzheng, et al.
Published: (2026)

RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering
by: Bertolino, Gaia A., et al.
Published: (2026)

Smark: A Watermark for Text-to-Speech Diffusion Models via Discrete Wavelet Transform
by: Zhang, Yichuan, et al.
Published: (2025)

Do Models Hear Like Us? Probing the Representational Alignment of Audio LLMs and Naturalistic EEG
by: Yang, Haoyun, et al.
Published: (2026)

SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding
by: Zhang, Ziyang, et al.
Published: (2024)

Dynamic Fusion Multimodal Network for SpeechWellness Detection
by: Sun, Wenqiang, et al.
Published: (2025)

Efficient Long-Sequence Diffusion Modeling for Symbolic Music Generation
by: Xu, Jinhan, et al.
Published: (2026)

EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction
by: Jing, Chong, et al.
Published: (2026)

CompLex: Music Theory Lexicon Constructed by Autonomous Agents for Automatic Music Generation
by: Hu, Zhejing, et al.
Published: (2025)

SyncSpeech: Efficient and Low-Latency Text-to-Speech based on Temporal Masked Transformer
by: Sheng, Zhengyan, et al.
Published: (2025)

GeHirNet: A Gender-Aware Hierarchical Model for Voice Pathology Classification
by: Wu, Fan, et al.
Published: (2025)

GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
by: Zuo, Heda, et al.
Published: (2025)

RAS: a Reliability Oriented Metric for Automatic Speech Recognition
by: Huang, Wenbin, et al.
Published: (2026)

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
by: Chen, Yifu, et al.
Published: (2025)

Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals
by: Lee, Jung-Sun, et al.
Published: (2024)

Hierarchical Graph Neural Network for Compressed Speech Steganalysis
by: Hemis, Mustapha, et al.
Published: (2025)

Hear: Hierarchically Enhanced Aesthetic Representations For Multidimensional Music Evaluation
by: Liu, Shuyang, et al.
Published: (2025)

Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels
by: Weng, Yuzhe, et al.
Published: (2026)

DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching
by: Chen, Wei, et al.
Published: (2025)

NarraScore: Bridging Visual Narrative and Musical Dynamics via Hierarchical Affective Control
by: Wen, Yufan, et al.
Published: (2026)

MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation
by: Vishe, Yash, et al.
Published: (2025)

UniSRCodec: Unified and Low-Bitrate Single Codebook Codec with Sub-Band Reconstruction
by: Zhang, Zhisheng, et al.
Published: (2026)

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)

Toward Complex-Valued Neural Networks for Waveform Generation
by: Oh, Hyung-Seok, et al.
Published: (2026)

Evaluating Neural Networks Architectures for Spring Reverb Modelling
by: Papaleo, Francesco, et al.
Published: (2024)

Evaluating Semantic Fragility in Text-to-Audio Generation Systems Under Controlled Prompt Perturbations
by: Wu, Jiahui
Published: (2026)

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations
by: Liao, Huan, et al.
Published: (2025)