| Main Authors: | Chi, Jianfeng, Karn, Ujjwal, Zhan, Hongyuan, Smith, Eric, Rando, Javier, Zhang, Yiming, Plawiak, Kate, Coudert, Zacharie Delpierre, Upasani, Kartikeya, Pasupuleti, Mahesh |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.10414 |
Similar Items
Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations
by: Fedorov, Igor, et al.
Published: (2024)
Backtracking Improves Generation Safety
by: Zhang, Yiming, et al.
Published: (2024)
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
by: Zhang, Jingyu, et al.
Published: (2025)
Self-Guard: Empower the LLM to Safeguard Itself
by: Wang, Zezhong, et al.
Published: (2023)
Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles
by: Bugaud, Zacharie
Published: (2026)
ProGuard: Towards Proactive Multimodal Safeguard
by: Yu, Shaohan, et al.
Published: (2025)
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
by: Helff, Lukas, et al.
Published: (2024)
GLiGuard: Schema-Conditioned Classification for LLM Safeguard
by: Zaratiana, Urchade, et al.
Published: (2026)
SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia
by: Tasawong, Panuthep, et al.
Published: (2026)
VLM-Guard: Safeguarding Vision-Language Models via Fulfilling Safety Alignment Gap
by: Liu, Qin, et al.
Published: (2025)
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
by: Chehbouni, Khaoula, et al.
Published: (2024)
GrandGuard: Taxonomy, Benchmark, and Safeguards for Elderly-Chatbot Interaction Safety
by: Fan, Changxuan, et al.
Published: (2026)
The Llama 3 Herd of Models
by: Grattafiori, Aaron, et al.
Published: (2024)
GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors
by: Zeng, Yaopei, et al.
Published: (2025)
Large Reasoning Models Learn Better Alignment from Flawed Thinking
by: Peng, ShengYun, et al.
Published: (2025)
RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting
by: Jiang, Yilei, et al.
Published: (2024)
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
by: Wang, Yuquan, et al.
Published: (2025)
HomeGuard: VLM-based Embodied Safeguard for Identifying Contextual Risk in Household Task
by: Lu, Xiaoya, et al.
Published: (2026)
TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems
by: Wang, Kai, et al.
Published: (2026)
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
by: Li, Qinfeng, et al.
Published: (2024)
ChocoLlama: Lessons Learned From Teaching Llamas Dutch
by: Meeus, Matthieu, et al.
Published: (2024)
NegVQA: Can Vision Language Models Understand Negation?
by: Zhang, Yuhui, et al.
Published: (2025)
BayTTA: Uncertainty-aware medical image classification with optimized test-time augmentation using Bayesian model averaging
by: Sherkatghanad, Zeinab, et al.
Published: (2024)
Llama-Mob: Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction
by: Tang, Peizhi, et al.
Published: (2024)
FaceSwapGuard: Safeguarding Facial Privacy from DeepFake Threats through Identity Obfuscation
by: Wang, Li, et al.
Published: (2025)
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
by: Niu, Jingcheng, et al.
Published: (2025)
Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder
by: Yang, Xianjun, et al.
Published: (2025)
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering
by: Ha, Cuong Nhat, et al.
Published: (2024)
MGH Radiology Llama: A Llama 3 70B Model for Radiology
by: Shi, Yucheng, et al.
Published: (2024)
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
by: Thawakar, Omkar, et al.
Published: (2024)
SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
by: Shan, Wenliang, et al.
Published: (2025)
Internal Activation Revision: Safeguarding Vision Language Models Without Parameter Update
by: Li, Qing, et al.
Published: (2025)
Creating Open Source Conversation
by: Sheehan, Kate
Published: (2009)
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
by: Gade, Pranav, et al.
Published: (2023)
VisionGuard: Synergistic Framework for Helmet Violation Detection
by: Nguyen, Lam-Huy, et al.
Published: (2025)
Towards Better Health Conversations: The Benefits of Context-seeking
by: Sayres, Rory, et al.
Published: (2025)
Guarding Terrains with Guards on a Line
by: Kang, Byeonguk, et al.
Published: (2025)
Enhance Vision-Language Alignment with Noise
by: Huang, Sida, et al.
Published: (2024)
Lattice‐Based Public Auditing Schemes for Cloud Storage Security: A Comprehensive Survey
by: Renuka Cheeturi, et al.
Published: (2026)
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
by: Ngong, Ivoline, et al.
Published: (2025)