| Main Authors: | Angell, Rico, Brinkmann, Jannik, He, He |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.12913 |
Similar Items
Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training
by: Shah, Avidan, et al.
Published: (2026)
Polynomial Precision Dependence Solutions to Alignment Research Center Matrix Completion Problems
by: Angell, Rico
Published: (2024)
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
by: Karvonen, Adam, et al.
Published: (2024)
Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching
by: Angell, Rico, et al.
Published: (2023)
Estimating Tail Risks in Language Model Output Distributions
by: Angell, Rico, et al.
Published: (2026)
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
by: Brinkmann, Jannik, et al.
Published: (2024)
In-Context Algebra
by: Todd, Eric, et al.
Published: (2025)
Representational Transfer Learning for Matrix Completion
by: He, Yong, et al.
Published: (2024)
Mechanisms of AI Protein Folding in ESMFold
by: Lu, Kevin, et al.
Published: (2026)
Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
by: Lyu, He, et al.
Published: (2026)
Understanding and Enhancing the Transferability of Jailbreaking Attacks
by: Lin, Runqi, et al.
Published: (2025)
On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation
by: Han, Andy, et al.
Published: (2026)
Transfer Learning of Multiobjective Indirect Low-Thrust Trajectories Using Diffusion Models and Markov Chain Monte Carlo
by: Graebner, Jannik, et al.
Published: (2026)
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
by: Brinkmann, Jannik, et al.
Published: (2025)
The Environmental Impact of Ensemble Techniques in Recommender Systems
by: Nitschke, Jannik
Published: (2025)
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
by: Li, Songze, et al.
Published: (2026)
Trustworthy Transfer Learning: A Survey
by: Wu, Jun, et al.
Published: (2024)
Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study
by: Li, Danya, et al.
Published: (2026)
FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
by: Lin, Runqi, et al.
Published: (2025)
SharedRep-RLHF: A Shared Representation Approach to RLHF with Diverse Preferences
by: Mukherjee, Arpan, et al.
Published: (2025)
Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks
by: Nurlanov, Zhakshylyk, et al.
Published: (2026)
Learning Shared Representations for Multi-Task Linear Bandits
by: Lin, Jiabin, et al.
Published: (2026)
Learning with Shared Representations: Statistical Rates and Efficient Algorithms
by: Niu, Xiaochun, et al.
Published: (2024)
Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)
Towards Predicting the Success of Transfer-based Attacks by Quantifying Shared Feature Representations
by: Dale, Ashley S., et al.
Published: (2024)
Robust Knowledge Transfer in Tiered Reinforcement Learning
by: Huang, Jiawei, et al.
Published: (2023)
Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
by: Huang, Zixuan, et al.
Published: (2025)
Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation
by: Rico, Jonathan Adam, et al.
Published: (2025)
Learning Relational Tabular Data without Shared Features
by: Wu, Zhaomin, et al.
Published: (2025)
TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering
by: Thornton, Scott
Published: (2026)
Learning Shared Representations from Unpaired Data
by: Yacobi, Amitai, et al.
Published: (2025)
Algorithms for the preordering problem and their application to the task of jointly clustering and ordering the accounts of a social network
by: Irmai, Jannik, et al.
Published: (2025)
Evaluating the performance-deviation of itemKNN in RecBole and LensKit
by: Schmidt, Michael, et al.
Published: (2024)
Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations
by: Li, Kevin Y., et al.
Published: (2026)
Hierarchical Successor Representation for Robust Transfer
by: Yu, Changmin, et al.
Published: (2026)
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
by: Xiang, Yongli, et al.
Published: (2025)
Transfer Learning for Kernel-based Regression
by: Wang, Chao, et al.
Published: (2023)
Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
by: He, Zhida, et al.
Published: (2026)
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
by: Yang, Junxiao, et al.
Published: (2025)