| Main Authors: | Sun, Qinggang, Wang, Kejun |
|---|---|
| Format: | Preprint |
| Published: | 2022 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2207.11749 |
Similar Items
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment
by: Nasriddinov, Firdavs, et al.
Published: (2024)
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
by: Firc, Anton, et al.
Published: (2025)
HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)
Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)
Graph Connectionist Temporal Classification for Phoneme Recognition
by: Grafé, Henry, et al.
Published: (2025)
Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)
Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)
Matlab-based Epoch Extraction for Speaker Differentiation
by: Li, Kunlun, et al.
Published: (2024)
Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)
Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements
by: BN, Suhas, et al.
Published: (2025)
Understanding the Algorithm Behind Audio Key Detection
by: Silva, Henrique Perez G.
Published: (2025)
Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
by: Li, Jingyu, et al.
Published: (2025)
A study on audio synchronous steganography detection and distributed guide inference model based on sliding spectral features and intelligent inference drive
by: Meng, Wei
Published: (2025)
Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS
by: Ethiraj, Vignesh, et al.
Published: (2025)
STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)
M2D-CLAP: Exploring General-purpose Audio-Language Representations Beyond CLAP
by: Niizumi, Daisuke, et al.
Published: (2025)
FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
by: Xie, Zeyu, et al.
Published: (2024)
AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
by: Xie, Zeyu, et al.
Published: (2024)
CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
by: Zheng, Zihao, et al.
Published: (2026)
PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
by: Zheng, Zihao, et al.
Published: (2025)
FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)
From Black Box to Glass Box: Cross-Model ASR Disagreement to Prioritize Review in Ambient AI Scribe Documentation
by: Karbalaie, Abdolamir, et al.
Published: (2026)
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
by: Sun, Ling, et al.
Published: (2025)
Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)
Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)
AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities
by: Papyan, Narek, et al.
Published: (2024)
The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities
by: Giacomelli, Stefano, et al.
Published: (2024)
IF-D: A High-Frequency, General-Purpose Inertial Foundation Dataset for Self-Supervised Learning
by: Ferreira, Patrick, et al.
Published: (2025)
MVTamperBench: Evaluating Robustness of Vision-Language Models
by: Agarwal, Amit, et al.
Published: (2024)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
by: Zhang, Haichao, et al.
Published: (2025)
Associative Syntax and Maximal Repetitions reveal context-dependent complexity in fruit bat communication
by: Assom, Luigi
Published: (2025)