Staff View: When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models

Bibliographic Details
Main Authors:	Hu, Ruihan; Shang, Yu-Ming; Luo, Wei; Tao, Ye; Zhang, Xi
Format:	Preprint
Published:	2026
Subjects:	Cryptography and Security
Online Access:	https://arxiv.org/abs/2601.13607

_version_	1866914265758171136
author	Hu, Ruihan; Shang, Yu-Ming; Luo, Wei; Tao, Ye; Zhang, Xi
author_facet	Hu, Ruihan; Shang, Yu-Ming; Luo, Wei; Tao, Ye; Zhang, Xi
contents	Large Reasoning Models (LRMs) have rapidly gained prominence for their strong performance on complex tasks. Many modern black-box LRMs (e.g., Gemini-2.5 and Claude-sonnet) expose intermediate reasoning traces through their APIs to improve transparency. Despite their benefits, we find that these traces can leak membership signals, creating a new privacy threat even without access to the token logits used in prior attacks. In this work, we present the first systematic exploration of Membership Inference Attacks (MIAs) on black-box LRMs. Our preliminary analysis shows that LRMs produce confident, recall-like reasoning traces on familiar training-member samples but more hesitant, inference-like reasoning traces on non-members. The representations of these traces are continuously distributed in the semantic latent space, spanning from familiar to unfamiliar samples. Building on this observation, we propose BlackSpectrum, the first membership inference attack framework targeting black-box LRMs. The key idea is to construct a recall-inference axis in the semantic latent space from representations of the exposed traces. By locating where a query sample falls along this axis, the attacker obtains a membership score and predicts how likely the sample is to be a member of the training data. Additionally, because existing datasets are outdated and unsuited to modern LRMs, we provide two new datasets to support future research: arXivReasoning and BookReasoning. Empirically, exposing reasoning traces significantly increases the vulnerability of LRMs to membership inference attacks, leading to large gains in attack performance. Our findings highlight the need for LRM companies to balance transparency in intermediate reasoning traces with privacy preservation.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_13607
institution	arXiv
publishDate	2026
record_format	arxiv
title	When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models
topic	Cryptography and Security
url	https://arxiv.org/abs/2601.13607

Similar Items