| Main Authors: | Tang, Kunsheng, Zhou, Wenbo, Zhang, Jie, Liu, Aishan, Deng, Gelei, Li, Shuai, Qi, Peigui, Zhang, Weiming, Zhang, Tianwei, Yu, Nenghai |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.12494 |
Similar Items
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
by: Qi, Peigui, et al.
Published: (2025)
PoseGuard: Pose-Guided Generation with Safety Guardrails
by: Wang, Kongxin, et al.
Published: (2025)
VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts
by: Qi, Peigui, et al.
Published: (2026)
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
by: Li, Shuai, et al.
Published: (2023)
Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures
by: Su, Yanghao, et al.
Published: (2026)
State-Dependent Safety Failures in Multi-Turn Language Model Interaction
by: Li, Pengcheng, et al.
Published: (2026)
When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
by: Wang, Cheng, et al.
Published: (2025)
Clean Image May be Dangerous: Data Poisoning Attacks Against Deep Hashing
by: Li, Shuai, et al.
Published: (2025)
InferDPT: Privacy-Preserving Inference for Closed-box Large Language Model
by: Tong, Meng, et al.
Published: (2023)
Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models
by: Wu, Jialin, et al.
Published: (2026)
FaceTracer: Unveiling Source Identities from Swapped Face Images and Videos for Fraud Prevention
by: Zhang, Zhongyi, et al.
Published: (2024)
IRCopilot: Automated Incident Response with Large Language Models
by: Lin, Xihuan, et al.
Published: (2025)
Model X-ray: Detecting Backdoored Models via Decision Boundary
by: Su, Yanghao, et al.
Published: (2024)
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
by: Feng, Weitao, et al.
Published: (2024)
BURN: Backdoor Unlearning via Adversarial Boundary Analysis
by: Su, Yanghao, et al.
Published: (2025)
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing
by: Xiao, Yisong, et al.
Published: (2024)
When Search Goes Wrong: Red-Teaming Web-Augmented Large Language Models
by: Ou, Haoran, et al.
Published: (2025)
Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges
by: Yang, Xianglin, et al.
Published: (2026)
EditMark: Watermarking Large Language Models based on Model Editing
by: Li, Shuai, et al.
Published: (2025)
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
by: Zhang, Tao, et al.
Published: (2024)
SQL Injection Jailbreak: A Structural Disaster of Large Language Models
by: Zhao, Jiawei, et al.
Published: (2024)
Oedipus: LLM-enchanced Reasoning CAPTCHA Solver
by: Deng, Gelei, et al.
Published: (2024)
Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models
by: Zhao, Jiawei, et al.
Published: (2023)
Assessing Gender and Racial Bias in Large Language Model‐Powered Virtual Reference
by: Liu, Jieli, et al.
Published: (2024)
Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
by: Xu, Rongwu, et al.
Published: (2024)
MES-RAG: Bringing Multi-modal, Entity-Storage, and Secure Enhancements to RAG
by: Wu, Pingyu, et al.
Published: (2025)
Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems
by: Zhang, Jie, et al.
Published: (2025)
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
by: Cui, Chenhang, et al.
Published: (2024)
BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models
by: Shang, Xiuwei, et al.
Published: (2025)
Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora
by: Derner, Erik, et al.
Published: (2024)
Evaluating Gender Bias in Large Language Models
by: Döll, Michael, et al.
Published: (2024)
Correction to “Assessing Gender and Racial Bias in Large Language Model‐Powered Virtual Reference”
Published: (2025)
Beyond Retrieval: Improving Evidence Quality for LLM-based Multimodal Fact-Checking
by: Ou, Haoran, et al.
Published: (2025)
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
by: Deng, Gelei, et al.
Published: (2023)
Mind Your HEARTBEAT! Claw Background Execution Inherently Enables Silent Memory Pollution
by: Zhang, Yechao, et al.
Published: (2026)
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
by: Zhang, Yunqi, et al.
Published: (2024)
Gender-Neutral Large Language Models for Medical Applications: Reducing Bias in PubMed Abstracts
by: Schaefer, Elizabeth, et al.
Published: (2025)
DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems
by: Ou, Haoran, et al.
Published: (2026)
In-Contextual Gender Bias Suppression for Large Language Models
by: Oba, Daisuke, et al.
Published: (2023)
Locating and Mitigating Gender Bias in Large Language Models
by: Cai, Yuchen, et al.
Published: (2024)