:: Library Catalog

Bibliographic Details
Main Authors:	Giordano, Marco; Giacomelli, Stefano; Rinaldi, Claudia; Graziosi, Fabio
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Audio and Speech Processing; 68T07 (Primary), 68T10 (Secondary); B.1.5; B.4.5; C.3; C.4; I.2; K.4; J.2
Online Access:	https://arxiv.org/abs/2507.01563

Similar Items

From Large-scale Audio Tagging to Real-Time Explainable Emergency Vehicle Sirens Detection
by: Giacomelli, Stefano, et al.
Published: (2025)

The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities
by: Giacomelli, Stefano, et al.
Published: (2024)

The OCON model: an old but gold solution for distributable supervised classification
by: Giacomelli, Stefano, et al.
Published: (2024)

Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations
by: Combes, Paolo, et al.
Published: (2025)

The evolution of inharmonicity and noisiness in contemporary popular music
by: Deruty, Emmanuel, et al.
Published: (2024)

Detecting and Preventing Latent Risk Accumulation in High-Performance Software Systems
by: Arafat, Jahidul, et al.
Published: (2025)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
by: Firc, Anton, et al.
Published: (2025)

Graph Connectionist Temporal Classification for Phoneme Recognition
by: Grafé, Henry, et al.
Published: (2025)

TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
by: Siddiqui, Yousuf Ahmed, et al.
Published: (2025)

Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

M2D-CLAP: Exploring General-purpose Audio-Language Representations Beyond CLAP
by: Niizumi, Daisuke, et al.
Published: (2025)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)

Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements
by: BN, Suhas, et al.
Published: (2025)

PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
by: Hou, Yang, et al.
Published: (2024)

Simultaneous source separation of unknown numbers of single-channel underwater acoustic signals based on deep neural networks with separator-decoder structure
by: Sun, Qinggang, et al.
Published: (2022)

Polarization-Based Eye Tracking with Personalized Siamese Architectures
by: Kalkanli, Beyza, et al.
Published: (2026)

Transforming faces into video stories -- VideoFace2.0
by: Brkljač, Branko, et al.
Published: (2025)

Implementation and Evaluation of Fast Raft for Hierarchical Consensus
by: Melnychuk, Anton, et al.
Published: (2025)

Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey
by: Ma, Siteng, et al.
Published: (2025)

Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines
by: Wimalasiri, Chathura
Published: (2026)

BEC: Bit-Level Static Analysis for Reliability against Soft Errors
by: Ko, Yousun, et al.
Published: (2024)

OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)

Person detection and re-identification in open-world settings of retail stores and public spaces
by: Brkljač, Branko, et al.
Published: (2025)

Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
by: Du, Wenzhang
Published: (2025)

Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)

SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)

Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)

Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis
by: Salehi, Pegah, et al.
Published: (2024)

A Cost-Effective Eye-Tracker for Early Detection of Mild Cognitive Impairment
by: Greco, Danilo, et al.
Published: (2024)

Developing an aeroponic smart experimental greenhouse for controlling irrigation and plant disease detection using deep learning and IoT
by: Narimani, Mohammadreza, et al.
Published: (2025)

AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities
by: Papyan, Narek, et al.
Published: (2024)

Depth Priors in Removal Neural Radiance Fields
by: Guo, Zhihao, et al.
Published: (2024)

Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations
by: Hauer, Christopher
Published: (2026)