| Main authors: | Li, Zaitang; Chen, Pin-Yu; Ho, Tsung-Yi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2412.17544 |
Similar items
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models
by: Li, Zaitang, et al.
Published: (2023)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models
by: Xiong, Chen, et al.
Published: (2026)
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
by: Hu, Xiaomeng, et al.
Published: (2024)
Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models
by: Zhu, Chaoyi, et al.
Published: (2025)
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
by: Xiong, Chen, et al.
Published: (2025)
Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)
NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks
by: Lan, Yi-Shan, et al.
Published: (2024)
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
by: Xiong, Chen, et al.
Published: (2025)
Rethinking Jailbreak Detection of Large Vision Language Models with Representational Contrastive Scoring
by: Hua, Peichun, et al.
Published: (2025)
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
by: Nian, Yi, et al.
Published: (2025)
OrgAgent: Organize Your Multi-Agent System like a Company
by: Wang, Yiru, et al.
Published: (2026)
TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models
by: Yin, Xiangyu, et al.
Published: (2025)
Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models
by: Li, Yakai, et al.
Published: (2025)
Jailbreaking Large Vision Language Models in Intelligent Transportation Systems
by: Das, Badhan Chandra, et al.
Published: (2025)
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
by: Wang, Ruofan, et al.
Published: (2024)
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
by: Zou, Lancheng, et al.
Published: (2025)
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
by: Tao, Xijia, et al.
Published: (2024)
Jailbreaking Vision-Language Models Through the Visual Modality
by: Azulay, Aharon, et al.
Published: (2026)
Differentiable Prompt Learning for Vision Language Models
by: Huang, Zhenhan, et al.
Published: (2024)
The Cost of Thinking: Increased Jailbreak Risk in Large Language Models
by: Yang, Fan
Published: (2025)
White-box Multimodal Jailbreaks Against Large Vision-Language Models
by: Wang, Ruofan, et al.
Published: (2024)
KCLNet: Electrically Equivalence-Oriented Graph Representation Learning for Analog Circuits
by: Xu, Peng, et al.
Published: (2026)
Jailbreaks on Vision Language Model via Multimodal Reasoning
by: Noheria, Aarush, et al.
Published: (2026)
Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks
by: Yan, Yu, et al.
Published: (2026)
Text is All You Need for Vision-Language Model Jailbreaking
by: Chen, Yihang, et al.
Published: (2026)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)
A Cross-Language Investigation into Jailbreak Attacks in Large Language Models
by: Li, Jie, et al.
Published: (2024)
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
by: Teng, Ma, et al.
Published: (2024)
Playing Language Game with LLMs Leads to Jailbreaking
by: Peng, Yu, et al.
Published: (2024)
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
by: Zhao, Yunhan, et al.
Published: (2024)
Knowledge-to-Jailbreak: Investigating Knowledge-driven Jailbreaking Attacks for Large Language Models
by: Tu, Shangqing, et al.
Published: (2024)
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
by: Song, Zirui, et al.
Published: (2025)
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
by: Wan, Yuxuan, et al.
Published: (2026)
Depth Charge: Jailbreak Large Language Models from Deep Safety Attention Heads
by: Wu, Jinman, et al.
Published: (2026)
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
by: Zhou, Weikang, et al.
Published: (2024)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
by: Xu, Zihao, et al.
Published: (2024)
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
by: Liang, Shuang, et al.
Published: (2025)