:: Library Catalog

Bibliographic Details
Main Authors:	Xu, Yiwen, Hou, Qinyang, Wan, Hongyu, Prpa, Mirjana
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence; Sound
Online Access:	https://arxiv.org/abs/2409.15623

Similar Items

SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
by: Ranjan, Rishabh, et al.
Published: (2025)

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
by: Huo, Mingyue, et al.
Published: (2025)

SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
by: Hou, Yixuan, et al.
Published: (2025)

ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages
by: Ranjan, Rishabh, et al.
Published: (2025)

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
by: Wang, Zhichao, et al.
Published: (2024)

Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
by: Mariotte, Théo, et al.
Published: (2024)

Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)

Efficient Scaling for LLM-based ASR
by: Mu, Bingshen, et al.
Published: (2025)

D3-Guard: Acoustic-based Drowsy Driving Detection Using Smartphones
by: Xie, Yadong, et al.
Published: (2025)

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
by: Wang, Helin, et al.
Published: (2024)

DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)

Springboard, Roadblock or "Crutch"?: How Transgender Users Leverage Voice Changers for Gender Presentation in Social Virtual Reality
by: Povinelli, Kassie, et al.
Published: (2024)

An Investigation Into Explainable Audio Hate Speech Detection
by: An, Jinmyeong, et al.
Published: (2024)

Amplifying Artifacts with Speech Enhancement in Voice Anti-spoofing
by: Trachu, Thanapat, et al.
Published: (2025)

Speech Synthesis along Perceptual Voice Quality Dimensions
by: Rautenberg, Frederik, et al.
Published: (2025)

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
by: Kim, David Joohun, et al.
Published: (2026)

SF-Speech: Straightened Flow for Zero-Shot Voice Clone
by: Li, Xuyuan, et al.
Published: (2024)

The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction
by: Huang, Wen-Chin, et al.
Published: (2024)

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
by: Bargum, Anders R., et al.
Published: (2024)

End-to-End Integration of Speech Emotion Recognition with Voice Activity Detection using Self-Supervised Learning Features
by: Yamashita, Natsuo, et al.
Published: (2024)

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
by: Peri, Raghuveer, et al.
Published: (2024)

Speech to Speech Synthesis for Voice Impersonation
by: Johnson, Bjorn, et al.
Published: (2026)

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)

Loudspeaker Beamforming to Enhance Speech Recognition Performance of Voice Driven Applications
by: de Groot, Dimme, et al.
Published: (2025)

NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
by: Park, Nohil, et al.
Published: (2024)

VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration
by: Kirdey, Stanislav
Published: (2025)

SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
by: Wei, Linye, et al.
Published: (2025)

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
by: Wang, Yuanyuan, et al.
Published: (2025)

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
by: Zhao, Xiaohan, et al.
Published: (2025)

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
by: Zheng, Zhisheng, et al.
Published: (2025)

Voice Conversion for Likability Control via Automated Rating of Speech Synthesis Corpora
by: Suda, Hitoshi, et al.
Published: (2025)

VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech
by: Kim, Heeseung, et al.
Published: (2024)

VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis
by: Jung, Jaemin, et al.
Published: (2024)

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment
by: Zhao, Shengkui, et al.
Published: (2025)

When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2025)

Modeling of Speech-dependent Own Voice Transfer Characteristics for Hearables with In-ear Microphones
by: Ohlenbusch, Mattes, et al.
Published: (2023)

Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables
by: Ohlenbusch, Mattes, et al.
Published: (2023)