Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Wang, Shaoying, Zhou, Hansong, Yuan, Yukun, Zhang, Xiaonan
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2512.09285
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914191376384000
author	Wang, Shaoying Zhou, Hansong Yuan, Yukun Zhang, Xiaonan
author_facet	Wang, Shaoying Zhou, Hansong Yuan, Yukun Zhang, Xiaonan
contents	Multi-participant meetings occur across various domains, such as business negotiations and medical consultations, during which sensitive information like trade secrets, business strategies, and patient conditions is often discussed. Previous research has demonstrated that attackers with mmWave radars outside the room can overhear meeting content by detecting minute speech-induced vibrations on objects. However, these eavesdropping attacks cannot differentiate which speech content comes from which person in a multi-participant meeting, leading to potential misunderstandings and poor decision-making. In this paper, we answer the question ``who speaks what''. By leveraging the spatial diversity introduced by ubiquitous objects, we propose an attack system that enables attackers to remotely eavesdrop on in-person conversations without requiring prior knowledge, such as identities, the number of participants, or seating arrangements. Since participants in in-person meetings are typically seated at different locations, their speech induces distinct vibration patterns on nearby objects. To exploit this, we design a noise-robust unsupervised approach for distinguishing participants by detecting speech-induced vibration differences in the frequency domain. Meanwhile, a deep learning-based framework is explored to combine signals from objects for speech quality enhancement. We validate the proof-of-concept attack on speech classification and signal enhancement through extensive experiments. The experimental results show that our attack can achieve the speech classification accuracy of up to $0.99$ with several participants in a meeting room. Meanwhile, our attack demonstrates consistent speech quality enhancement across all real-world scenarios, including different distances between the radar and the objects.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_09285
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Who Speaks What from Afar: Eavesdropping In-Person Conversations via mmWave Sensing Wang, Shaoying Zhou, Hansong Yuan, Yukun Zhang, Xiaonan Sound Multi-participant meetings occur across various domains, such as business negotiations and medical consultations, during which sensitive information like trade secrets, business strategies, and patient conditions is often discussed. Previous research has demonstrated that attackers with mmWave radars outside the room can overhear meeting content by detecting minute speech-induced vibrations on objects. However, these eavesdropping attacks cannot differentiate which speech content comes from which person in a multi-participant meeting, leading to potential misunderstandings and poor decision-making. In this paper, we answer the question ``who speaks what''. By leveraging the spatial diversity introduced by ubiquitous objects, we propose an attack system that enables attackers to remotely eavesdrop on in-person conversations without requiring prior knowledge, such as identities, the number of participants, or seating arrangements. Since participants in in-person meetings are typically seated at different locations, their speech induces distinct vibration patterns on nearby objects. To exploit this, we design a noise-robust unsupervised approach for distinguishing participants by detecting speech-induced vibration differences in the frequency domain. Meanwhile, a deep learning-based framework is explored to combine signals from objects for speech quality enhancement. We validate the proof-of-concept attack on speech classification and signal enhancement through extensive experiments. The experimental results show that our attack can achieve the speech classification accuracy of up to $0.99$ with several participants in a meeting room. Meanwhile, our attack demonstrates consistent speech quality enhancement across all real-world scenarios, including different distances between the radar and the objects.
title	Who Speaks What from Afar: Eavesdropping In-Person Conversations via mmWave Sensing
topic	Sound
url	https://arxiv.org/abs/2512.09285

Similar Items