| Main Authors: | Springer, Max; Lee, Chung Peng; Metevier, Blossom; Castleman, Jane; Turbal, Bohdan; Jung, Hayoung; Shen, Zeyu; Korolova, Aleksandra |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.15799 |
Similar Items
Measuring Validity in LLM-based Resume Screening
by: Castleman, Jane, et al.
Published: (2026)
Why am I Still Seeing This: Measuring the Effectiveness Of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems
by: Castleman, Jane, et al.
Published: (2024)
Adultification Bias in LLMs and Text-to-Image Models
by: Castleman, Jane, et al.
Published: (2025)
Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
by: Chittepu, Yaswanth, et al.
Published: (2025)
ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets
by: Turbal, Bohdan, et al.
Published: (2026)
When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models
by: Hossain, Ismail, et al.
Published: (2026)
External Evaluation of Discrimination Mitigation Efforts in Meta's Ad Delivery
by: Imana, Basileal, et al.
Published: (2025)
On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
by: Shen, Zeyu, et al.
Published: (2025)
Auditing for Racial Discrimination in the Delivery of Education Ads
by: Imana, Basileal, et al.
Published: (2024)
Auditing for Bias in Ad Delivery Using Inferred Demographic Attributes
by: Imana, Basileal, et al.
Published: (2024)
Multilevel Analysis of Cryptocurrency News using RAG Approach with Fine-Tuned Mistral Large Language Model
by: Pavlyshenko, Bohdan M.
Published: (2025)
Stability and Multigroup Fairness in Ranking with Uncertain Predictions
by: Devic, Siddartha, et al.
Published: (2024)
On the Use of Proxies in Political Ad Targeting
by: Sapiezynski, Piotr, et al.
Published: (2024)
Paremias of the Latvians and the Russians in Latgale: From the Holy Scripture to Modern Existence
by: Jelena Korolova
Published: (2020)
Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
by: Goel, Anmol, et al.
Published: (2026)
Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework
by: Park, Hyunjee, et al.
Published: (2026)
"I Have a Dream, Too!": The American Dream in Coretta Scott King Award-Winning Books
by: Parsons, Linda T., et al.
Published: (2011)
Before 2000: Funding Technology in New Jersey's Schools and Public Libraries by the End of the Century.
by: Peretz, Blossom A.
Published: (1997)
RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
by: Asif, Sadia, et al.
Published: (2026)
When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
by: Xiao, Yuxin, et al.
Published: (2025)
Multi-Selection for Recommendation Systems
by: Sarmasarkar, Sahasrajit, et al.
Published: (2025)
An External Fairness Evaluation of LinkedIn Talent Search
by: Behzad, Tina, et al.
Published: (2025)
Differential Privacy with Multiple Selections
by: Goel, Ashish, et al.
Published: (2024)
Exploring the Impact of Childhood Trauma Profiles on Social Competence and Self‐Stigma of Seeking Help in Early Adulthood
by: Hayoung Jung, et al.
Published: (2025)
Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)
SafeTuneBed: A Toolkit for Benchmarking LLM Safety Alignment in Fine-Tuning
by: Hossain, Saad, et al.
Published: (2025)
Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors
by: Castleman, Blake, et al.
Published: (2023)
Modulation of Cell Cycle Kinases by Kaposi's Sarcoma‐Associated Herpesvirus
by: Steven Longworth, et al.
Published: (2025)
Narrow Fine-Tuning Erodes Safety Alignment in Vision-Language Agents
by: Gulati, Idhant, et al.
Published: (2026)
Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT
by: Yu, Le, et al.
Published: (2025)
When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
by: Zhao, Haoran, et al.
Published: (2026)
Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
by: Lan, Wenhao, et al.
Published: (2026)
Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
by: Hsiung, Lei, et al.
Published: (2025)
GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning
by: Fang, Zhouxiang, et al.
Published: (2026)
PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
by: Verma, Richa, et al.
Published: (2026)
Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation
by: Liu, Guozhi, et al.
Published: (2024)
Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation Between the United States and South Africa
by: Jung, Hayoung, et al.
Published: (2024)
Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
by: Guo, Jiahe, et al.
Published: (2026)