Staff View: Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis :: Library Catalog

Bibliographic Details
Main Authors:	Lang, Yicheng; Guo, Kehan; Huang, Yue; Zhou, Yujun; Zhuang, Haomin; Yang, Tianyu; Su, Yao; Zhang, Xiangliang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.13996

_version_	1866910835875512320
author	Lang, Yicheng; Guo, Kehan; Huang, Yue; Zhou, Yujun; Zhuang, Haomin; Yang, Tianyu; Su, Yao; Zhang, Xiangliang
author_facet	Lang, Yicheng; Guo, Kehan; Huang, Yue; Zhou, Yujun; Zhuang, Haomin; Yang, Tianyu; Su, Yao; Zhang, Xiangliang
contents	Due to the widespread use of LLMs and growing ethical and safety concerns, LLM unlearning methods have been developed to remove harmful knowledge and undesirable capabilities. In this context, evaluations are mostly based on single-value metrics such as QA accuracy. However, these metrics often fail to capture the nuanced retention of harmful knowledge components, making it difficult to assess the true effectiveness of unlearning. To address this issue, we propose UNCD (UNlearning evaluation via Cognitive Diagnosis), a novel framework that leverages Cognitive Diagnosis Modeling for fine-grained evaluation of LLM unlearning. Our dedicated benchmark, UNCD-Cyber, provides a detailed assessment of the removal of dangerous capabilities. Moreover, we introduce UNCD-Agent, which refines unlearning by diagnosing knowledge remnants and generating targeted unlearning data. Extensive experiments across eight unlearning methods and two base models demonstrate that UNCD not only enhances evaluation but also effectively facilitates the removal of harmful LLM abilities.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_13996
institution	arXiv
publishDate	2025
record_format	arxiv
title	Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
topic	Machine Learning
url	https://arxiv.org/abs/2502.13996