| Main Authors: | Bartoszcze, Lukasz; Munshi, Sarthak; Sukidi, Bryan; Yen, Jennifer; Yang, Zejia; Williams-King, David; Le, Linh; Asuzu, Kosi; Maple, Carsten |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.17601 |
Similar Items
Representation Noising: A Defence Mechanism Against Harmful Finetuning
by: Rosati, Domenic, et al.
Published: (2024)
Immunization against harmful fine-tuning attacks
by: Rosati, Domenic, et al.
Published: (2024)
Can Safety Fine-Tuning Be More Principled? Lessons Learned from Cybersecurity
by: Williams-King, David, et al.
Published: (2025)
Individualised Counterfactual Examples Using Conformal Prediction Intervals
by: Adams, James M., et al.
Published: (2025)
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
by: Wehner, Jan, et al.
Published: (2025)
Single-Configuration Attack Success Rate Is Not Enough: Jailbreak Evaluations Should Report Distributional Attack Success
by: Maple, Carsten, et al.
Published: (2026)
Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges
by: Tapwal, Riya, et al.
Published: (2026)
PRISM: Generation-Time Detection and Mitigation of Secret Leakage in Multi-Agent LLM Pipelines
by: Tapwal, Riya, et al.
Published: (2026)
Justified Evidence Collection for Argument-based AI Fairness Assurance
by: Sabuncuoglu, Alpay, et al.
Published: (2025)
Towards Robust Federated Analytics via Differentially Private Measurements of Statistical Heterogeneity
by: Scott, Mary, et al.
Published: (2024)
Private Federated Multiclass Post-hoc Calibration
by: Maddock, Samuel, et al.
Published: (2025)
FLAIM: AIM-based Synthetic Data Generation in the Federated Setting
by: Maddock, Samuel, et al.
Published: (2023)
DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants
by: Kumar, Abhishek, et al.
Published: (2026)
Latent Personality Alignment: Improving Harmlessness Without Mentioning Harms
by: Le, Linh, et al.
Published: (2026)
Audio Computer-Assisted Self Interview Compared to Traditional Interview in an HIV-Related Behavioral Survey in Vietnam
by: Linh Cu Le
Published: (2012)
Threat, Risk and Mitigation Taxonomy for Digital Identity Systems
by: SHEIK, AL TARIQ, et al.
Published: (2024)
Towards Smart Healthcare: Challenges and Opportunities in IoT and ML
by: Saifuzzaman, Munshi, et al.
Published: (2023)
LearnedCache: An eBPF-Integrated Perceptron-Based Eviction Policy for the Linux Page Cache
by: Qi, Zejia
Published: (2026)
Differentially Private Health Tokens for Estimating COVID-19 Risk
by: Butler, David, et al.
Published: (2020)
Data-Agnostic Face Image Synthesis Detection Using Bayesian CNNs
by: Leyva, Roberto, et al.
Published: (2024)
Operationalising Artificial Intelligence Bills of Materials (AIBOMs) for Verifiable AI Provenance and Lifecycle Assurance
by: Radanliev, Petar, et al.
Published: (2026)
Distributed, communication-efficient, and differentially private estimation of KL divergence
by: Scott, Mary, et al.
Published: (2024)
SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration, and Reproducibility Evaluation
by: Radanliev, Petar, et al.
Published: (2026)
Field-Localized Forgery Detection for Digital Identity Documents
by: Kumar, Abhishek, et al.
Published: (2026)
Detecting Face Synthesis Using a Concealed Fusion Model
by: Leyva, Roberto, et al.
Published: (2024)
Manifold of Failure: Behavioral Attraction Basins in Language Models
by: Munshi, Sarthak, et al.
Published: (2026)
ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
by: Munshi, Sarthak, et al.
Published: (2025)
acad_recuperation_joueurs_exclus_fr-ca
by: Maple, Kevon
Published: (2026)
acad_self_excluded_player_recovery_en-ca
by: Maple, Kevon
Published: (2026)
acad_dispute_resolution_handbook_bilingual_fr-ca
by: Maple, Kevon
Published: (2026)
acad_rg_resource_compendium_bilingual_fr-ca
by: Maple, Kevon
Published: (2026)
acad_withdrawal_caps_high_rollers_en-ca
by: Maple, Kevon
Published: (2026)
acad_trustpilot_casinos_quebecois_fr-ca
by: Maple, Kevon
Published: (2026)
Refugee Reception in Southern Africa
by: Maple, Nicholas
Published: (2024)
SRA: Span Representation Alignment for Large Language Model Distillation
by: Dao, Quoc Phong, et al.
Published: (2026)
A Game-Theoretic Approach for PMU Deployment Against False Data Injection Attacks
by: Maleki, Sajjad, et al.
Published: (2024)
A privacy preserving querying mechanism with high utility for electric vehicles
by: Atmaca, Ugur Ilker, et al.
Published: (2022)
Large Language Models and the Rationalist Empiricist Debate
by: King, David
Published: (2024)
Spherical Steering: Geometry-Aware Activation Rotation for Language Models
by: You, Zejia, et al.
Published: (2026)
Securing Cryptographic Software via Typed Assembly Language (Extended Version)
by: Song, Shixin, et al.
Published: (2025)