| Main Authors: | García-Carrasco, Jorge; Maté, Alejandro; Trujillo, Juan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.19842 |
Similar Items
Exploring Vulnerabilities and Protections in Large Language Models: A Survey
by: Liu, Frank Weizhen, et al.
Published: (2024)
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
by: Zhang, Anqi, et al.
Published: (2024)
Publicly-Detectable Watermarking for Language Models
by: Fairoze, Jaiden, et al.
Published: (2023)
How Vulnerable Are Edge LLMs?
by: Ding, Ao, et al.
Published: (2026)
Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting
by: Liu, Fuqiang, et al.
Published: (2024)
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
by: Zhuo, Terry Yue, et al.
Published: (2025)
Detecting Pretraining Data from Large Language Models
by: Shi, Weijia, et al.
Published: (2023)
In Vino Veritas and Vulnerabilities: Examining LLM Safety via Drunk Language Inducement
by: Shetty, Anudeex, et al.
Published: (2026)
Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
by: Xu, Jiashu, et al.
Published: (2023)
What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference
by: Fan, Mingyuan, et al.
Published: (2026)
The Hidden Cost of Modeling P(X): Vulnerability to Membership Inference Attacks in Generative Text Classifiers
by: Makroo, Owais, et al.
Published: (2025)
Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
by: Sivapiromrat, Sanhanat, et al.
Published: (2025)
Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities
by: Krishna, Arjun, et al.
Published: (2025)
Interpreting the Repeated Token Phenomenon in Large Language Models
by: Yona, Itay, et al.
Published: (2025)
Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
by: Huo, Mingjia, et al.
Published: (2024)
Digger: Detecting Copyright Content Mis-usage in Large Language Model Training
by: Li, Haodong, et al.
Published: (2024)
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
by: Price, Sara, et al.
Published: (2024)
Copyright-Protected Language Generation via Adaptive Model Fusion
by: Abad, Javier, et al.
Published: (2024)
Hijacking Large Language Models via Adversarial In-Context Learning
by: Zhou, Xiangyu, et al.
Published: (2023)
Every Character Counts: From Vulnerability to Defense in Phishing Detection
by: Chiper, Maria, et al.
Published: (2025)
How does GPT-2 Predict Acronyms? Extracting and Understanding a Circuit via Mechanistic Interpretability
by: García-Carrasco, Jorge, et al.
Published: (2024)
Cross-Entropy Attacks to Language Models via Rare Event Simulation
by: Ni, Mingze, et al.
Published: (2025)
SecureNet: A Comparative Study of DeBERTa and Large Language Models for Phishing Detection
by: Mahendru, Sakshi, et al.
Published: (2024)
An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
by: Boreiko, Valentyn, et al.
Published: (2024)
WET: Overcoming Paraphrasing Vulnerabilities in Embeddings-as-a-Service with Linear Transformation Watermarks
by: Shetty, Anudeex, et al.
Published: (2024)
Watermarking Language Models through Language Models
by: Dasgupta, Agnibh, et al.
Published: (2024)
Detecting Training Data of Large Language Models via Expectation Maximization
by: Kim, Gyuwan, et al.
Published: (2024)
Conti Inc.: Understanding the Internal Discussions of a large Ransomware-as-a-Service Operator with Machine Learning
by: Ruellan, Estelle, et al.
Published: (2023)
Tracing Privacy Leakage of Language Models to Training Data via Adjusted Influence Functions
by: Liu, Jinxin, et al.
Published: (2024)
Robust and Secure Code Watermarking for Large Language Models via ML/Crypto Codesign
by: Zhang, Ruisi, et al.
Published: (2025)
Can Federated Learning Safeguard Private Data in LLM Training? Vulnerabilities, Attacks, and Defense Evaluation
by: Guo, Wenkai, et al.
Published: (2025)
Time Will Tell: Timing Side Channels via Output Token Count in Large Language Models
by: Zhang, Tianchen, et al.
Published: (2024)
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
by: Geng, Jiahui, et al.
Published: (2025)
Empirical Analysis of Large Vision-Language Models against Goal Hijacking via Visual Prompt Injection
by: Kimura, Subaru, et al.
Published: (2024)
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
by: Fu, Wenjie, et al.
Published: (2023)
Model Provenance Testing for Large Language Models
by: Nikolic, Ivica, et al.
Published: (2025)
Towards the Anonymization of the Language Modeling
by: Boutet, Antoine, et al.
Published: (2025)
On the Learnability of Watermarks for Language Models
by: Gu, Chenchen, et al.
Published: (2023)
Exposing the Systematic Vulnerability of Open-Weight Models to Prefill Attacks
by: Struppek, Lukas, et al.
Published: (2026)
Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking
by: Xu, Yijie, et al.
Published: (2025)