:: Library Catalog

Bibliographic Details
Main Authors:	Angell, Rico; Brinkmann, Jannik; He, He
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.12913

Similar Items

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training
by: Shah, Avidan, et al.
Published: (2026)

Polynomial Precision Dependence Solutions to Alignment Research Center Matrix Completion Problems
by: Angell, Rico
Published: (2024)

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
by: Karvonen, Adam, et al.
Published: (2024)

Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching
by: Angell, Rico, et al.
Published: (2023)

Estimating Tail Risks in Language Model Output Distributions
by: Angell, Rico, et al.
Published: (2026)

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
by: Brinkmann, Jannik, et al.
Published: (2024)

In-Context Algebra
by: Todd, Eric, et al.
Published: (2025)

Representational Transfer Learning for Matrix Completion
by: He, Yong, et al.
Published: (2024)

Mechanisms of AI Protein Folding in ESMFold
by: Lu, Kevin, et al.
Published: (2026)

Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data
by: Lyu, He, et al.
Published: (2026)

Understanding and Enhancing the Transferability of Jailbreaking Attacks
by: Lin, Runqi, et al.
Published: (2025)

On-Policy Consistency Training Improves LLM Safety with Minimal Capability Degradation
by: Han, Andy, et al.
Published: (2026)

Transfer Learning of Multiobjective Indirect Low-Thrust Trajectories Using Diffusion Models and Markov Chain Monte Carlo
by: Graebner, Jannik, et al.
Published: (2026)

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
by: Brinkmann, Jannik, et al.
Published: (2025)

The Environmental Impact of Ensemble Techniques in Recommender Systems
by: Nitschke, Jannik
Published: (2025)

Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models
by: Li, Songze, et al.
Published: (2026)

Trustworthy Transfer Learning: A Survey
by: Wu, Jun, et al.
Published: (2024)

Eye Gaze-Informed and Context-Aware Pedestrian Trajectory Prediction in Shared Spaces with Automated Shuttles: A Virtual Reality Study
by: Li, Danya, et al.
Published: (2026)

FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
by: Lin, Runqi, et al.
Published: (2025)

SharedRep-RLHF: A Shared Representation Approach to RLHF with Diverse Preferences
by: Mukherjee, Arpan, et al.
Published: (2025)

Jailbreaking LLMs Without Gradients or Priors: Effective and Transferable Attacks
by: Nurlanov, Zhakshylyk, et al.
Published: (2026)

Learning Shared Representations for Multi-Task Linear Bandits
by: Lin, Jiabin, et al.
Published: (2026)

Learning with Shared Representations: Statistical Rates and Efficient Algorithms
by: Niu, Xiaochun, et al.
Published: (2024)

Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models
by: Cui, Kaiyuan, et al.
Published: (2026)

Towards Predicting the Success of Transfer-based Attacks by Quantifying Shared Feature Representations
by: Dale, Ashley S., et al.
Published: (2024)

Robust Knowledge Transfer in Tiered Reinforcement Learning
by: Huang, Jiawei, et al.
Published: (2023)

Attention-Aware GNN-based Input Defense against Multi-Turn LLM Jailbreak
by: Huang, Zixuan, et al.
Published: (2025)

Compound Fault Diagnosis for Train Transmission Systems Using Deep Learning with Fourier-enhanced Representation
by: Rico, Jonathan Adam, et al.
Published: (2025)

Learning Relational Tabular Data without Shared Features
by: Wu, Zhaomin, et al.
Published: (2025)

TRYLOCK: Defense-in-Depth Against LLM Jailbreaks via Layered Preference and Representation Engineering
by: Thornton, Scott
Published: (2026)

Learning Shared Representations from Unpaired Data
by: Yacobi, Amitai, et al.
Published: (2025)

Algorithms for the preordering problem and their application to the task of jointly clustering and ordering the accounts of a social network
by: Irmai, Jannik, et al.
Published: (2025)

Evaluating the performance-deviation of itemKNN in RecBole and LensKit
by: Schmidt, Michael, et al.
Published: (2024)

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations
by: Li, Kevin Y., et al.
Published: (2026)

Hierarchical Successor Representation for Robust Transfer
by: Yu, Changmin, et al.
Published: (2026)

Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
by: Xiang, Yongli, et al.
Published: (2025)

Transfer Learning for Kernel-based Regression
by: Wang, Chao, et al.
Published: (2023)

Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
by: He, Zhida, et al.
Published: (2026)

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
by: Chao, Patrick, et al.
Published: (2024)

Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
by: Yang, Junxiao, et al.
Published: (2025)