| Main Authors: | Giordano, Marco; Giacomelli, Stefano; Rinaldi, Claudia; Graziosi, Fabio |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.01563 |
Similar Items
From Large-scale Audio Tagging to Real-Time Explainable Emergency Vehicle Sirens Detection
by: Giacomelli, Stefano, et al.
Published: (2025)
The OCON model: an old but green solution for distributable supervised classification for acoustic monitoring in smart cities
by: Giacomelli, Stefano, et al.
Published: (2024)
The OCON model: an old but gold solution for distributable supervised classification
by: Giacomelli, Stefano, et al.
Published: (2024)
Neural Proxies for Sound Synthesizers: Learning Perceptually Informed Preset Representations
by: Combes, Paolo, et al.
Published: (2025)
The evolution of inharmonicity and noisiness in contemporary popular music
by: Deruty, Emmanuel, et al.
Published: (2024)
Detecting and Preventing Latent Risk Accumulation in High-Performance Software Systems
by: Arafat, Jahidul, et al.
Published: (2025)
HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
by: Firc, Anton, et al.
Published: (2025)
Graph Connectionist Temporal Classification for Phoneme Recognition
by: Grafé, Henry, et al.
Published: (2025)
TRACES: Temporal Recall with Contextual Embeddings for Real-Time Video Anomaly Detection
by: Siddiqui, Yousuf Ahmed, et al.
Published: (2025)
Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)
Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)
M2D-CLAP: Exploring General-purpose Audio-Language Representations Beyond CLAP
by: Niizumi, Daisuke, et al.
Published: (2025)
Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)
Prevailing Research Areas for Music AI in the Era of Foundation Models
by: Wei, Megan, et al.
Published: (2024)
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
Quantization for OpenAI's Whisper Models: A Comparative Analysis
by: Andreyev, Allison
Published: (2025)
Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements
by: BN, Suhas, et al.
Published: (2025)
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset
by: Hou, Yang, et al.
Published: (2024)
Simultaneous source separation of unknown numbers of single-channel underwater acoustic signals based on deep neural networks with separator-decoder structure
by: Sun, Qinggang, et al.
Published: (2022)
Polarization-Based Eye Tracking with Personalized Siamese Architectures
by: Kalkanli, Beyza, et al.
Published: (2026)
Transforming faces into video stories -- VideoFace2.0
by: Brkljač, Branko, et al.
Published: (2025)
Implementation and Evaluation of Fast Raft for Hierarchical Consensus
by: Melnychuk, Anton, et al.
Published: (2025)
Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey
by: Ma, Siteng, et al.
Published: (2025)
Task-Aligned Self-Supervised Learning for Medical Image Analysis: A Systematic Review and Practical Design Guidelines
by: Wimalasiri, Chathura
Published: (2026)
BEC: Bit-Level Static Analysis for Reliability against Soft Errors
by: Ko, Yousun, et al.
Published: (2024)
OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)
Person detection and re-identification in open-world settings of retail stores and public spaces
by: Brkljač, Branko, et al.
Published: (2025)
Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
by: Du, Wenzhang
Published: (2025)
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
by: Mehta, Shivam, et al.
Published: (2025)
SemAlignVC: Enhancing zero-shot timbre conversion using semantic alignment
by: Mehta, Shivam, et al.
Published: (2025)
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
by: Wu, Junyan, et al.
Published: (2024)
Comparative Analysis of Audio Feature Extraction for Real-Time Talking Portrait Synthesis
by: Salehi, Pegah, et al.
Published: (2024)
A Cost-Effective Eye-Tracker for Early Detection of Mild Cognitive Impairment
by: Greco, Danilo, et al.
Published: (2024)
Developing an aeroponic smart experimental greenhouse for controlling irrigation and plant disease detection using deep learning and IoT
by: Narimani, Mohammadreza, et al.
Published: (2025)
AI-based Drone Assisted Human Rescue in Disaster Environments: Challenges and Opportunities
by: Papyan, Narek, et al.
Published: (2024)
Depth Priors in Removal Neural Radiance Fields
by: Guo, Zhihao, et al.
Published: (2024)
Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations
by: Hauer, Christopher
Published: (2026)