Bibliographic Details
Main Authors: Zhang, Yucong, Liu, Juan, Tian, Yao, Liu, Haifeng, Li, Ming
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2409.03610
Abstract:
  • In contrast to human speech, machine-generated sounds of the same type often exhibit consistent frequency characteristics and discernible temporal periodicity. However, leveraging both attributes for anomaly detection remains relatively under-explored. In this paper, we propose an automated dual-path framework that learns prominent frequency and temporal patterns for diverse machine types. One pathway uses a novel Frequency-and-Time Excited Network (FTE-Net), which incorporates a Frequency-and-Time Chunkwise Encoder (FTC-Encoder) and an excitation network, to learn salient features along the frequency and time axes of the spectrogram. The other pathway applies a 1D convolutional network to the utterance-level spectrum. Experimental results on the DCASE 2023 Task 2 dataset show that the proposed method achieves state-of-the-art performance. Moreover, visualizations of the intermediate feature maps in the excitation network illustrate the effectiveness of the method.
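The two inputs described in the abstract (a spectrogram viewed chunkwise along frequency and time, and a spectrum of the whole utterance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the FFT size, the hop length, the chunk sizes, and the choice of non-overlapping chunks are all assumptions made for illustration, and the learned parts (FTC-Encoder, excitation network, 1D convolutional network) are not modelled here.

```python
import numpy as np


def stft_magnitude(signal, n_fft=256, hop=128):
    """Magnitude spectrogram of shape (n_fft // 2 + 1, n_frames).

    n_fft and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft over each frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


def chunk_axis(spec, chunk, axis):
    """Split `spec` into non-overlapping chunks of length `chunk` along `axis`.

    Assumption: any remainder at the end of the axis is dropped. The paper's
    chunking scheme may differ (for example, it may use overlapping chunks).
    """
    n_chunks = spec.shape[axis] // chunk
    trimmed = np.take(spec, np.arange(n_chunks * chunk), axis=axis)
    return np.stack(np.split(trimmed, n_chunks, axis=axis))


def utterance_spectrum(signal):
    """Magnitude spectrum of the entire recording (one FFT over all samples)."""
    return np.abs(np.fft.rfft(signal))


# Toy input: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(tone)                    # (frequency bins, frames)
freq_chunks = chunk_axis(spec, 32, axis=0)     # chunks along the frequency axis
time_chunks = chunk_axis(spec, 31, axis=1)     # chunks along the time axis
spectrum = utterance_spectrum(tone)            # input to the second pathway
```

In a full system, each set of chunks would presumably be passed to a learned encoder and the utterance-level spectrum to a 1D convolutional network; those components are left out because the abstract does not specify them.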