:: Library Catalog

Bibliographic Details
Main Authors:	Thebaud, Thomas; Lan, Gaël Le; Larcher, Anthony
Format:	Preprint
Published:	2024
Subjects:	Cryptography and Security; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.08918

Similar Items

Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
by: Shen, Huanming, et al.
Published: (2025)

A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
by: Kamel, Kamel, et al.
Published: (2025)

DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
by: An, Hyeseon, et al.
Published: (2025)

Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous Vehicles
by: Dasgupta, Sagar, et al.
Published: (2024)

Discovering Spoofing Attempts on Language Model Watermarks
by: Gloaguen, Thibaud, et al.
Published: (2024)

GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
by: Mohammadi, Ahmad, et al.
Published: (2025)

Entropy-Synchronized Neural Hashing for Unsupervised Ransomware Detection
by: Idliman, Peter, et al.
Published: (2025)

Reimagining Safety Alignment with An Image
by: Xia, Yifan, et al.
Published: (2025)

Towards Unsupervised Adversarial Document Detection in Retrieval Augmented Generation Systems
by: Levi, Patrick
Published: (2026)

Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
by: Liu, Xuechen, et al.
Published: (2024)

Biometrics in Extended Reality: A Review
by: Agarwal, Ayush, et al.
Published: (2024)

Unsupervised Threat Hunting using Continuous Bag-of-Terms-and-Time (CBoTT)
by: Kayhan, Varol, et al.
Published: (2024)

Adversarial Evasion in Non-Stationary Malware Detection: Minimizing Drift Signals through Similarity-Constrained Perturbations
by: Acharya, Pawan, et al.
Published: (2026)

The Cognitive Firewall: Securing Browser Based AI Agents Against Indirect Prompt Injection Via Hybrid Edge Cloud Defense
by: Lan, Qianlong, et al.
Published: (2026)

Agent Safety Alignment via Reinforcement Learning
by: Sha, Zeyang, et al.
Published: (2025)

EVA: Editing for Versatile Alignment against Jailbreaks
by: Wang, Yi, et al.
Published: (2026)

UK AISI Alignment Evaluation Case-Study
by: Souly, Alexandra, et al.
Published: (2026)

LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
by: Lan, Tianwei, et al.
Published: (2025)

Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs
by: Ferrand, Jean-Charles Noirot, et al.
Published: (2025)

Measuring Safety Alignment Effects in Autonomous Security Agents
by: David, Isaac, et al.
Published: (2026)

Sequential Behavioral Watermarking for LLM Agents
by: An, Hyeseon, et al.
Published: (2026)

VisuoAlign: Safety Alignment of LVLMs with Multimodal Tree Search
by: Li, MingSheng, et al.
Published: (2025)

FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment
by: Kuznetsov, Daniel, et al.
Published: (2026)

Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning
by: Filho, Sebastião Alves de Jesus, et al.
Published: (2025)

Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
by: Shen, Yaling, et al.
Published: (2025)

Ablating Safety: Mechanisms for Removing Alignment in Language Models for Security Applications
by: David, Isaac, et al.
Published: (2026)

Matching Ranks Over Probability Yields Truly Deep Safety Alignment
by: Vega, Jason, et al.
Published: (2025)

Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
by: Campbell, David, et al.
Published: (2026)

PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
by: Li, Nanxi, et al.
Published: (2025)

Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution
by: Shi, Guoxin, et al.
Published: (2026)

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents
by: Luo, Mingyu, et al.
Published: (2026)

SpoofTrackBench: Interpretable AI for Spoof-Aware UAV Tracking and Benchmarking
by: Le, Van, et al.
Published: (2025)

AgentMark: Utility-Preserving Behavioral Watermarking for Agents
by: Huang, Kaibo, et al.
Published: (2026)

Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
by: Yin, Qingyu, et al.
Published: (2025)

On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment
by: Ball, Sarah, et al.
Published: (2025)

SafeThinker: Reasoning about Risk to Deepen Safety Beyond Shallow Alignment
by: Fang, Xianya, et al.
Published: (2026)

Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
by: Yu, Jeongmin, et al.
Published: (2025)

An Investigation into the Performance of Non-Contrastive Self-Supervised Learning Methods for Network Intrusion Detection
by: Fard, Hamed, et al.
Published: (2025)

Safety Layers in Aligned Large Language Models: The Key to LLM Security
by: Li, Shen, et al.
Published: (2024)

Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace
by: Lan, Qianlong, et al.
Published: (2026)