| Main authors: | Xie, Yadong; Li, Fan; Wu, Yue; Wang, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2504.00435 |
Similar items
HearFit+: Personalized Fitness Monitoring via Audio Signals on Smart Speakers
by: Xie, Yadong, et al.
Published: (2025)
HearSmoking: Smoking Detection in Driving Environment via Acoustic Sensing on Smartphones
by: Xie, Yadong, et al.
Published: (2025)
D3-Guard: Acoustic-based Drowsy Driving Detection Using Smartphones
by: Xie, Yadong, et al.
Published: (2025)
Detecting abnormal heart sound using mobile phones and on-device IConNet
by: Vu, Linh, et al.
Published: (2024)
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction
by: Li, Yadong, et al.
Published: (2026)
LARA-Gen: Enabling Continuous Emotion Control for Music Generation Models via Latent Affective Representation Alignment
by: Mei, Jiahao, et al.
Published: (2025)
Efficient Speech Enhancement via Embeddings from Pre-trained Generative Audioencoders
by: Sun, Xingwei, et al.
Published: (2025)
LLaDA-TTS: Unifying Speech Synthesis and Zero-Shot Editing via Masked Diffusion Modeling
by: Fan, Xiaoyu, et al.
Published: (2026)
Text2Move: Text-to-moving sound generation via trajectory prediction and temporal alignment
by: Liu, Yunyi, et al.
Published: (2025)
Noise-Robust Sound Event Detection and Counting via Language-Queried Sound Separation
by: Chen, Yuanjian, et al.
Published: (2025)
Directional sound transmission and reception of the beluga whale ().
by: Ou, Wenzhan, et al.
Published: (2025)
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)
Neural personal sound zones with flexible bright zone control
by: Zhu, Wenye, et al.
Published: (2025)
Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
by: Li, Yue, et al.
Published: (2024)
Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
by: Yue, Haobo, et al.
Published: (2024)
StyleBreak: Revealing Alignment Vulnerabilities in Large Audio-Language Models via Style-Aware Audio Jailbreak
by: Li, Hongyi, et al.
Published: (2025)
A Robust framework for sound event localization and detection on real recordings
by: Kim, Jin Sob, et al.
Published: (2025)
When Pamplona sounds different: the soundscape transformation of San Fermin through intelligent acoustic sensors and a sound repository
by: Sagasti, Amaia, et al.
Published: (2025)
StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion
by: Li, Fengjin, et al.
Published: (2025)
Differentiable physics for sound field reconstruction
by: Verburg, Samuel A., et al.
Published: (2025)
Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
by: Xie, Jingran, et al.
Published: (2025)
Ti-Audio: The First Multi-Dialectal End-to-End Speech LLM for Tibetan
by: Wang, Jialing, et al.
Published: (2026)
Frequency-aware convolution for sound event detection
by: Song, Tao, et al.
Published: (2024)
The Interspeech 2026 Audio Encoder Capability Challenge for Large Audio Language Models
by: Dinkel, Heinrich, et al.
Published: (2026)
Vox-Evaluator: Enhancing Stability and Fidelity for Zero-shot TTS with A Multi-Level Evaluator
by: Wang, Hualei, et al.
Published: (2025)
Robust detection of overlapping bioacoustic sound events
by: Mahon, Louis, et al.
Published: (2025)
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)
The Neural-SRP method for positional sound source localization
by: Grinstein, Eric, et al.
Published: (2024)
Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting
by: Ai, Zhiqi, et al.
Published: (2025)
SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents
by: Xie, Zeyu, et al.
Published: (2026)
The language of sound search: Examining User Queries in Audio Search Engines
by: Weck, Benno, et al.
Published: (2024)
DARS: Dysarthria-Aware Rhythm-Style Synthesis for ASR Enhancement
by: Wu, Minghui, et al.
Published: (2026)
Resonate: Reinforcing Text-to-Audio Generation via Online Feedback from Large Audio Language Models
by: Li, Xiquan, et al.
Published: (2026)
Fed-PISA: Federated Voice Cloning via Personalized Identity-Style Adaptation
by: Wang, Qi, et al.
Published: (2025)
Some clues to build a sound analysis relevant to hearing
by: Millot, Laurent
Published: (2024)
Interaural time difference loss for binaural target sound extraction
by: Hernandez-Olivan, Carlos, et al.
Published: (2024)
Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)
ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding
by: Niu, Yadong, et al.
Published: (2026)