Library Catalog

Bibliographic Details
Main authors:	Xie, Yadong, Li, Fan, Wu, Yue, Wang, Yu
Format:	Preprint
Published:	2025
Subjects:	Sound
Online access:	https://arxiv.org/abs/2504.00435

Similar items

HearFit+: Personalized Fitness Monitoring via Audio Signals on Smart Speakers
by: Xie, Yadong, et al.
Published: (2025)

HearSmoking: Smoking Detection in Driving Environment via Acoustic Sensing on Smartphones
by: Xie, Yadong, et al.
Published: (2025)

D3-Guard: Acoustic-based Drowsy Driving Detection Using Smartphones
by: Xie, Yadong, et al.
Published: (2025)

Detecting abnormal heart sound using mobile phones and on-device IConNet
by: Vu, Linh, et al.
Published: (2024)

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
by: Li, Yadong, et al.
Published: (2026)

LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment
by: Mei, Jiahao, et al.
Published: (2025)

Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)

LLaDA-TTS: Unifying Speech Synthesis and Zero-Shot Editing via Masked Diffusion Modeling
by: Fan, Xiaoyu, et al.
Published: (2026)

Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment
by: Liu, Yunyi, et al.
Published: (2025)

Noise-Robust Sound Event Detection and Counting via Language-Queried Sound Separation
by: Chen, Yuanjian, et al.
Published: (2025)

Directional sound transmission and reception of the beluga whale ().
by: Ou, Wenzhan, et al.
Published: (2025)

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)

Neural personal sound zones with flexible bright zone control
by: Zhu, Wenye, et al.
Published: (2025)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)

Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)

Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
by: Yue, Haobo, et al.
Published: (2024)

StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak
by: Li, Hongyi, et al.
Published: (2025)

A Robust framework for sound event localization and detection on real recordings
by: Kim, Jin Sob, et al.
Published: (2025)

When Pamplona sounds different: the soundscape transformation of San Fermin through intelligent acoustic sensors and a sound repository
by: Sagasti, Amaia, et al.
Published: (2025)

StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)

Differentiable physics for sound field reconstruction
by: Verburg, Samuel A., et al.
Published: (2025)

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
by: Xie, Jingran, et al.
Published: (2025)

Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)

Frequency-aware convolution for sound event detection
by: Song, Tao, et al.
Published: (2024)

The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)

Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
by: Wang, Hualei, et al.
Published: (2025)

Robust detection of overlapping bioacoustic sound events
by: Mahon, Louis, et al.
Published: (2025)

Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)

The Neural-SRP method for positional sound source localization
by: Grinstein, Eric, et al.
Published: (2024)

Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting
by: Ai, Zhiqi, et al.
Published: (2025)

SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents
by: Xie, Zeyu, et al.
Published: (2026)

The language of sound search: Examining User Queries in Audio Search Engines
by: Weck, Benno, et al.
Published: (2024)

DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement
by: Wu, Minghui, et al.
Published: (2026)

Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
by: Li, Xiquan, et al.
Published: (2026)

Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation
by: Wang, Qi, et al.
Published: (2025)

Some clues to build a sound analysis relevant to hearing
by: Millot, Laurent
Published: (2024)

Interaural time difference loss for binaural target sound extraction
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)

Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)