:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Xiaoxue; Lee, Jaeha; Dick, Anna-Katharina; Timm, Jasper; Xie, Fei; Cruz, Diogo
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2508.07646

Similar Items

Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
by: He, Zhida, et al.
Published: (2026)

Automating Deception: Scalable Multi-Turn LLM Jailbreaks
by: Kumarappan, Adarsh, et al.
Published: (2025)

Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
by: Li, Songze, et al.
Published: (2026)

Jailbreaking is (Mostly) Simpler Than You Think
by: Russinovich, Mark, et al.
Published: (2025)

How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
by: Feldman, Shai, et al.
Published: (2026)

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
by: Li, Nathaniel, et al.
Published: (2024)

X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
by: Rahman, Salman, et al.
Published: (2025)

Exploring the holographic entropy cone via reinforcement learning
by: He, Temple, et al.
Published: (2026)

Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
by: Huang, Zixuan, et al.
Published: (2025)

AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
by: Reddy, Aashray, et al.
Published: (2025)

Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles
by: Yang, Fan, et al.
Published: (2024)

TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards
by: Xiong, Xiqiao, et al.
Published: (2025)

Understanding the learned look-ahead behavior of chess neural networks
by: Cruz, Diogo
Published: (2025)

RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
by: Jiang, Yifan, et al.
Published: (2024)

Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO
by: Lee, Jaeha, et al.
Published: (2025)

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
by: Li, Junbo, et al.
Published: (2025)

Understanding the Emergence of Seemingly Useless Features in Next-Token Predictors
by: Rofin, Mark, et al.
Published: (2026)

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses
by: Shrestha, Robik, et al.
Published: (2022)

A Simpler Alternative to Variational Regularized Counterfactual Risk Minimization
by: Bakker, Hua Chang, et al.
Published: (2024)

Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective
by: Zhang, Jinouwen, et al.
Published: (2024)

Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives
by: Schwinn, Leo, et al.
Published: (2025)

Intent Laundering: AI Safety Datasets Are Not What They Seem
by: Golchin, Shahriar, et al.
Published: (2026)

Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design
by: Wei, Quan, et al.
Published: (2025)

From Sparsity to Simplicity: Enabling Simpler Sequential Replacements via Sparse Attention Distillation
by: Ren, Yuxin, et al.
Published: (2026)

SimpleFold: Folding Proteins is Simpler than You Think
by: Wang, Yuyang, et al.
Published: (2025)

Comparator-Adaptive $Φ$-Regret: Improved Bounds, Simpler Algorithms, and Applications to Games
by: Hait, Soumita, et al.
Published: (2025)

The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth
by: Quétu, Victor, et al.
Published: (2024)

Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection
by: Yang, Lin, et al.
Published: (2023)

OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)

BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
by: Lee, Isack, et al.
Published: (2024)

Position: Understanding LLMs Requires More Than Statistical Generalization
by: Reizinger, Patrik, et al.
Published: (2024)

MPLite: Multi-Aspect Pretraining for Mining Clinical Health Records
by: Yang, Eric, et al.
Published: (2024)

Seemingly Redundant Modules Enhance Robust Odor Learning in Fruit Flies
by: Li, Haiyang, et al.
Published: (2025)

In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration
by: Choi, Youngbin, et al.
Published: (2025)

Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing
by: Yang, Ning, et al.
Published: (2026)

Out-of-Domain Intent Detection Considering Multi-Turn Dialogue Contexts
by: Lang, Hao, et al.
Published: (2023)

Mitigating Conversational Inertia in Multi-Turn Agents
by: Wan, Yang, et al.
Published: (2026)

Discovering Hierarchy-Grounded Domains with Adaptive Granularity for Clinical Domain Generalization
by: Hu, Pengfei, et al.
Published: (2025)

Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better
by: Balmaseda, Vicente, et al.
Published: (2024)