| Main Authors: | Kezins, Nikita; Ekka, Urbas; Berrang, Pascal; Arnaboldi, Luca |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.10901 |
Similar Items
3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models
by: Yin, Jianyao, et al.
Published: (2025)
Automatic LLM Red Teaming
by: Belaire, Roman, et al.
Published: (2025)
PAC-Bayesian Generalization Guarantees for Fairness on Stochastic and Deterministic Classifiers
by: Bastian, Julien, et al.
Published: (2026)
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
by: Arnaboldi, Luca, et al.
Published: (2024)
Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD
by: Arnaboldi, Luca, et al.
Published: (2023)
Link Stealing Attacks Against Inductive Graph Neural Networks
by: Wu, Yixin, et al.
Published: (2024)
Safe LLM-Controlled Robots with Formal Guarantees via Reachability Analysis
by: Hafez, Ahmad, et al.
Published: (2025)
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
by: Sharma, Mrinank, et al.
Published: (2025)
Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents
by: He, Pengfei, et al.
Published: (2026)
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
by: Arnaboldi, Luca, et al.
Published: (2024)
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
by: Dandi, Yatin, et al.
Published: (2024)
Beyond Benchmarks: Dynamic, Automatic And Systematic Red-Teaming Agents For Trustworthy Medical Language Models
by: Pan, Jiazhen, et al.
Published: (2025)
Guardrails in Logit Space: Safety Token Regularization for LLM Alignment
by: Bach, Thong, et al.
Published: (2026)
Adaptive Instruction Composition for Automated LLM Red-Teaming
by: Zymet, Jesse, et al.
Published: (2026)
LLM-Assisted Red Teaming of Diffusion Models through "Failures Are Fated, But Can Be Faded"
by: Sagar, Som, et al.
Published: (2024)
OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents
by: Li, Xinyu, et al.
Published: (2026)
MAD-MAX: Modular And Diverse Malicious Attack MiXtures for Automated LLM Red Teaming
by: Schoepf, Stefan, et al.
Published: (2025)
Capability-Based Scaling Trends for LLM-Based Red-Teaming
by: Panfilov, Alexander, et al.
Published: (2025)
Abstractive Red-Teaming of Language Model Character
by: Rahn, Nate, et al.
Published: (2026)
Adversarial Robustness Guarantees for Quantum Classifiers
by: Dowling, Neil, et al.
Published: (2024)
Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
by: Kwon, Minchan, et al.
Published: (2026)
Geometric Red-Teaming for Robotic Manipulation
by: Goel, Divyam, et al.
Published: (2025)
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
by: Taheri, Hossein, et al.
Published: (2024)
From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
by: Sinha, Anusha, et al.
Published: (2025)
Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning
by: Dandi, Yatin, et al.
Published: (2026)
Asymptotics of SGD in Sequence-Single Index Models and Single-Layer Attention Networks
by: Arnaboldi, Luca, et al.
Published: (2025)
RedRFT: A Light-Weight Benchmark for Reinforcement Fine-Tuning-Based Red Teaming
by: Zheng, Xiang, et al.
Published: (2025)
Explainable Clustering Beyond Worst-Case Guarantees
by: Fleissner, Maximilian, et al.
Published: (2024)
Interpretability Guarantees with Merlin-Arthur Classifiers
by: Wäldchen, Stephan, et al.
Published: (2022)
Formal Mechanistic Interpretability: Automated Circuit Discovery with Provable Guarantees
by: Hadad, Itamar, et al.
Published: (2026)
The 4/δ Bound: Designing Predictable LLM-Verifier Systems for Formal Method Guarantee
by: Dantas, Pierre, et al.
Published: (2025)
Quantifying Multimodal Capabilities: Formal Generalization Guarantees in Pairwise Metric Learning
by: Zhou, Richeng, et al.
Published: (2026)
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
by: Deng, Yihe, et al.
Published: (2025)
Red-Teaming Segment Anything Model
by: Jankowski, Krzysztof, et al.
Published: (2024)
ReactionTeam: Teaming Experts for Divergent Thinking Beyond Typical Reaction Patterns
by: Guo, Taicheng, et al.
Published: (2023)
Embodied Red Teaming for Auditing Robotic Foundation Models
by: Karnik, Sathwik, et al.
Published: (2024)
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
by: Pavlova, Maya, et al.
Published: (2024)
Red-Teaming for Inducing Societal Bias in Large Language Models
by: Luo, Chu Fei, et al.
Published: (2024)
UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning
by: Zhang, Jiawei, et al.
Published: (2025)
Efficient Evaluation of LLM Performance with Statistical Guarantees
by: Wu, Skyler, et al.
Published: (2026)