| Main Authors: | Xiong, Chen; Chen, Pin-Yu; Ho, Tsung-Yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.00781 |
Similar Items
Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)
Retention Score: Quantifying Jailbreak Risks for Vision Language Models
by: Li, Zaitang, et al.
Published: (2024)
Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models
by: Xiong, Chen, et al.
Published: (2026)
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
by: Hu, Xiaomeng, et al.
Published: (2024)
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
by: Xiong, Chen, et al.
Published: (2025)
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models
by: Li, Zaitang, et al.
Published: (2023)
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
by: Hu, Xiaomeng, et al.
Published: (2025)
Chain-of-Programming (CoP) : Empowering Large Language Models for Geospatial Code Generation
by: Hou, Shuyang, et al.
Published: (2024)
NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks
by: Lan, Yi-Shan, et al.
Published: (2024)
Curiosity-driven Red-teaming for Large Language Models
by: Hong, Zhang-Wei, et al.
Published: (2024)
Optimization-Free Universal Watermark Forgery with Regenerative Diffusion Models
by: Zhu, Chaoyi, et al.
Published: (2025)
OrgAgent: Organize Your Multi-Agent System like a Company
by: Wang, Yiru, et al.
Published: (2026)
Atoxia: Red-teaming Large Language Models with Target Toxic Answers
by: Du, Yuhao, et al.
Published: (2024)
Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance?
by: Prandi, Matteo, et al.
Published: (2025)
PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
by: Zou, Lancheng, et al.
Published: (2025)
Red-teaming Activation Probes using Prompted LLMs
by: Blandfort, Phil, et al.
Published: (2025)
Fundamental Safety-Capability Trade-offs in Fine-tuning Large Language Models
by: Chen, Pin-Yu, et al.
Published: (2025)
Duwak: Dual Watermarks in Large Language Models
by: Zhu, Chaoyi, et al.
Published: (2024)
ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users
by: Li, Guanlin, et al.
Published: (2024)
Unraveling the cognitive patterns of Large Language Models through module communities
by: Bhandari, Kushal Raj, et al.
Published: (2025)
Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
by: Lee, Kuan-Yi, et al.
Published: (2025)
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
by: Hu, Xiaomeng, et al.
Published: (2024)
Red-teaming the Multimodal Reasoning: Jailbreaking Vision-Language Models via Cross-modal Entanglement Attacks
by: Yan, Yu, et al.
Published: (2026)
Electronic Circuit Principles of Large Language Models
by: Chen, Qiguang, et al.
Published: (2025)
KCLNet: Electrically Equivalence-Oriented Graph Representation Learning for Analog Circuits
by: Xu, Peng, et al.
Published: (2026)
P-Aligner: Enabling Pre-Alignment of Language Models via Principled Instruction Synthesis
by: Song, Feifan, et al.
Published: (2025)
D2S-FLOW: Automated Parameter Extraction from Datasheets for SPICE Model Generation Using Large Language Models
by: Chen, Hong Cai, et al.
Published: (2025)
Patching LLM Like Software: A Lightweight Method for Improving Safety Policy in Large Language Models
by: Arif, Huzaifa, et al.
Published: (2025)
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
by: Ying, Yuchen, et al.
Published: (2026)
LLM Agents Should Employ Security Principles
by: Zhang, Kaiyuan, et al.
Published: (2025)
AgenticRed: Evolving Agentic Systems for Red-Teaming
by: Yuan, Jiayi, et al.
Published: (2026)
Computational Safety for Generative AI: A Signal Processing Perspective
by: Chen, Pin-Yu
Published: (2025)
Agentic Reasoning for Large Language Models
by: Wei, Tianxin, et al.
Published: (2026)
Red Teaming Large Reasoning Models
by: Chen, Jiawei, et al.
Published: (2025)
SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning
by: Jia, Furong, et al.
Published: (2026)
Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
by: Jiao, Zhengbo, et al.
Published: (2026)
Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
by: An, Shengwei, et al.
Published: (2023)
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
by: Yu, Jiahao, et al.
Published: (2023)
CoDA: Agentic Systems for Collaborative Data Visualization
by: Chen, Zichen, et al.
Published: (2025)
Active Attacks: Red-teaming LLMs via Adaptive Environments
by: Yun, Taeyoung, et al.
Published: (2025)