| Main Authors: | Lin, Jiongliang, Guo, Yiwen, Chen, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.13402 |
Similar Items
UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing
by: He, Yifeng, et al.
Published: (2024)
Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection
by: Chen, Jiaqi, et al.
Published: (2024)
DUP: Detection-guided Unlearning for Backdoor Purification in Language Models
by: Hu, Man, et al.
Published: (2025)
Exploring Backdoor Vulnerabilities of Chat Models
by: Hao, Yunzhuo, et al.
Published: (2024)
Layerwise Convergence Fingerprints for Runtime Misbehavior Detection in Large Language Models
by: Min, Nay Myat, et al.
Published: (2026)
Code Vulnerability Detection Across Different Programming Languages with AI Models
by: Humran, Hael Abdulhakim Ali, et al.
Published: (2025)
Large Language Model Sentinel: LLM Agent for Adversarial Purification
by: Lin, Guang, et al.
Published: (2024)
Reverse-Engineering Model Editing on Language Models
by: Sun, Zhiyu, et al.
Published: (2026)
STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
by: Wang, Xunguang, et al.
Published: (2025)
Distract Large Language Models for Automatic Jailbreak Attack
by: Xiao, Zeguan, et al.
Published: (2024)
PRISON: Unmasking the Criminal Potential of Large Language Models
by: Wu, Xinyi, et al.
Published: (2025)
Text Embedding Inversion Security for Multilingual Language Models
by: Chen, Yiyi, et al.
Published: (2024)
Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring
by: Hua, Peichun, et al.
Published: (2025)
Is the System Message Really Important to Jailbreaks in Large Language Models?
by: Zou, Xiaotian, et al.
Published: (2024)
CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering
by: Chen, Baicheng, et al.
Published: (2026)
Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey
by: Kheddar, Hamza
Published: (2024)
Citation: A Key to Building Responsible and Accountable Large Language Models
by: Huang, Jie, et al.
Published: (2023)
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
by: Cao, Yuanpu, et al.
Published: (2023)
A Survey of Recent Backdoor Attacks and Defenses in Large Language Models
by: Zhao, Shuai, et al.
Published: (2024)
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
by: Zhao, Shiji, et al.
Published: (2025)
JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
by: Luo, Weidi, et al.
Published: (2024)
DMFI: A Dual-Modality Log Analysis Framework for Insider Threat Detection with LoRA-Tuned Language Models
by: Kong, Kaichuan, et al.
Published: (2025)
REEF: Representation Encoding Fingerprints for Large Language Models
by: Zhang, Jie, et al.
Published: (2024)
Resource Consumption Threats in Large Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
Efficient Detection of Toxic Prompts in Large Language Models
by: Liu, Yi, et al.
Published: (2024)
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
by: Liu, Yanjiang, et al.
Published: (2025)
Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage
by: Shao, Hanyin, et al.
Published: (2023)
CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models
by: Zhou, Guanghao, et al.
Published: (2025)
Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models
by: Huang, Chung-ju, et al.
Published: (2026)
Prefix Guidance: A Steering Wheel for Large Language Models to Defend Against Jailbreak Attacks
by: Zhao, Jiawei, et al.
Published: (2024)
Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond
by: Wang, Xuhong, et al.
Published: (2024)
From Thinking to Output: Chain-of-Thought and Text Generation Characteristics in Reasoning Language Models
by: Liu, Junhao, et al.
Published: (2025)
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
by: Li, Hao, et al.
Published: (2024)
Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models
by: Chen, Zhuo, et al.
Published: (2024)
Window-based Membership Inference Attacks Against Fine-tuned Large Language Models
by: Chen, Yuetian, et al.
Published: (2026)
Medical MLLM is Vulnerable: Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models
by: Huang, Xijie, et al.
Published: (2024)
NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models
by: Zhao, Haodong, et al.
Published: (2024)
Watermarking Language Models for Many Adaptive Users
by: Cohen, Aloni, et al.
Published: (2024)
Imperceptible Jailbreaking against Large Language Models
by: Gao, Kuofeng, et al.
Published: (2025)
Toward Cybersecurity-Expert Small Language Models
by: Levi, Matan, et al.
Published: (2025)