Bibliographic Details
Main Authors: Liu, Xiaojing, Ai, Hongwei, Reiss, Joshua D.
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2404.17821
Table of Contents:
  • We propose a speech enhancement system for multitrack audio. The system minimizes auditory masking so that multiple simultaneous speakers remain audible. It can be used in several communication scenarios, e.g., teleconferencing, in-game voice chat, and live streaming. The ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model is used to evaluate the amount of masking in the audio signals. Audio effects, e.g., level balance, equalization, dynamic range compression, and spatialization, are applied via an iterative Harmony Search algorithm that aims to minimize the masking. In a subjective listening test, the designed system competes with mixes by professional sound engineers and outperforms mixes by existing auto-mixing systems.
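
The iterative search described in the abstract can be illustrated with a minimal, generic Harmony Search sketch. This is not the authors' implementation: the function names, parameter values (memory size, HMCR, PAR, bandwidth), and the three-speaker gain setup are assumptions chosen for illustration, and toy_masking_cost is a hypothetical stand-in for a perceptual masking measure, not the PEAQ model.

```python
import random


def harmony_search(cost, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Generic Harmony Search minimizer (illustrative parameter values)."""
    rng = random.Random(seed)
    # Harmony memory: random candidate parameter vectors inside the bounds.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    costs = [cost(h) for h in memory]

    for _ in range(iterations):
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment: small random nudge around that value.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection: draw a fresh value from the full range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))

        new_cost = cost(new)
        worst = max(range(memory_size), key=costs.__getitem__)
        # Replace the worst stored harmony only if the new one is better.
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost

    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best]


def toy_masking_cost(gains_db):
    """Hypothetical stand-in for a masking metric (NOT PEAQ).

    Pretends masking is lowest at one fixed set of per-track gains in dB.
    """
    target = [-3.0, 0.0, -6.0]
    return sum((g - t) ** 2 for g, t in zip(gains_db, target))


# Assumed setup: one gain parameter per speaker track, each in [-12, 12] dB.
bounds = [(-12.0, 12.0)] * 3
best_gains, best_cost = harmony_search(toy_masking_cost, bounds)
print(best_gains, best_cost)
```

In a real system the cost function would render the multitrack mix with the candidate effect parameters and score the resulting masking with a perceptual model; the quadratic toy cost only keeps the sketch runnable.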