Staff View: Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Ji, Xu; Zhang, Jianyi; Zhou, Ziyin; Zhao, Zhangchi; Qiao, Qianqian; Han, Kaiying; Hossen, Md Imran; Hei, Xiali
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2405.00718

_version_	1866917655352442880
author	Ji, Xu; Zhang, Jianyi; Zhou, Ziyin; Zhao, Zhangchi; Qiao, Qianqian; Han, Kaiying; Hossen, Md Imran; Hei, Xiali
author_facet	Ji, Xu; Zhang, Jianyi; Zhou, Ziyin; Zhao, Zhangchi; Qiao, Qianqian; Han, Kaiying; Hossen, Md Imran; Hei, Xiali
contents	Ensuring the resilience of Large Language Models (LLMs) against malicious exploitation is paramount, with recent focus on mitigating offensive responses. Yet, the understanding of cant or dark jargon remains unexplored. This paper introduces a domain-specific Cant dataset and CantCounter evaluation framework, employing Fine-Tuning, Co-Tuning, Data-Diffusion, and Data-Analysis stages. Experiments reveal LLMs, including ChatGPT, are susceptible to cant bypassing filters, with varying recognition accuracy influenced by question types, setups, and prompt clues. Updated models exhibit higher acceptance rates for cant queries. Moreover, LLM reactions differ across domains, e.g., reluctance to engage in racism versus LGBT topics. These findings underscore LLMs' understanding of cant and reflect training data characteristics and vendor approaches to sensitive topics. Additionally, we assess LLMs' ability to demonstrate reasoning capabilities. Access to our datasets and code is available at https://github.com/cistineup/CantCounter.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_00718
institution	arXiv
publishDate	2024
record_format	arxiv
title	Can't say cant? Measuring and Reasoning of Dark Jargons in Large Language Models
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2405.00718

Similar Items