:: Library Catalog

Bibliographic Details
Main Authors:	Sun, Qinggang; Wang, Kejun
Format:	Preprint
Published:	2022
Subjects:	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing; 68T07, 94A12, 76Q05, 68U99; I.2.6; I.5.4; J.2
Online Access:	https://arxiv.org/abs/2207.11749

Similar Items

Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment
by: Nasriddinov, Firdavs, et al.
Published: (2024)

STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
by: Firc, Anton, et al.
Published: (2025)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

Graph Connectionist Temporal Classification for Phoneme Recognition
by: Grafé, Henry, et al.
Published: (2025)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

Local Diagnostics of Continuous Normalizing Flow for Out-of-Distribution Detection
by: Cao, Xinwei, et al.
Published: (2026)

Matlab-based Epoch Extraction for Speaker Differentiation
by: Li, Kunlun, et al.
Published: (2024)

Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)

Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements
by: BN, Suhas, et al.
Published: (2025)

Understanding the Algorithm Behind Audio Key Detection
by: Silva, Henrique Perez G.
Published: (2025)

Benchmarking Foundation Speech and Language Models for Alzheimer's Disease and Related Dementia Detection from Spontaneous Speech
by: Li, Jingyu, et al.
Published: (2025)

A study on audio synchronous steganography detection and distributed guide inference model based on sliding spectral features and intelligent inference drive
by: Meng, Wei
Published: (2025)

Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS
by: Ethiraj, Vignesh, et al.
Published: (2025)

STAR: Speech-to-Audio Generation via Representation Learning
by: Xie, Zeyu, et al.
Published: (2025)

M2D-CLAP: Exploring General-purpose Audio-Language Representations Beyond CLAP
by: Niizumi, Daisuke, et al.
Published: (2025)

FakeSound2: A Benchmark for Explainable and Generalizable Deepfake Sound Detection
by: Xie, Zeyu, et al.
Published: (2025)

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
by: Xie, Zeyu, et al.
Published: (2024)

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
by: Xie, Zeyu, et al.
Published: (2024)

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS
by: Zheng, Zihao, et al.
Published: (2026)

PicoAudio2: Temporal Controllable Text-to-Audio Generation with Natural Language Description
by: Zheng, Zihao, et al.
Published: (2025)

FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)

From Black Box to Glass Box: Cross-Model ASR Disagreement to Prioritize Review in Ambient AI Scribe Documentation
by: Karbalaie, Abdolamir, et al.
Published: (2026)

Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
by: Sun, Ling, et al.
Published: (2025)

Investigating Prosodic Signatures via Speech Pre-Trained Models for Audio Deepfake Source Attribution
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
by: Phukan, Orchid Chetia, et al.
Published: (2024)

Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)

SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
by: Jain, Sarthak, et al.
Published: (2024)

Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
by: Phukan, Orchid Chetia, et al.
Published: (2024)

AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities
by: Papyan, Narek, et al.
Published: (2024)

The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities
by: Giacomelli, Stefano, et al.
Published: (2024)

IF-D: A High-Frequency, General-Purpose Inertial Foundation Dataset for Self-Supervised Learning
by: Ferreira, Patrick, et al.
Published: (2025)

MVTamperBench: Evaluating Robustness of Vision-Language Models
by: Agarwal, Amit, et al.
Published: (2024)

Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)

VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
by: Zhang, Haichao, et al.
Published: (2025)

Associative Syntax and Maximal Repetitions reveal context-dependent complexity in fruit bat communication
by: Assom, Luigi
Published: (2025)