:: Library Catalog

Bibliographic Details
Main Authors:	Springer, Max; Lee, Chung Peng; Metevier, Blossom; Castleman, Jane; Turbal, Bohdan; Jung, Hayoung; Shen, Zeyu; Korolova, Aleksandra
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.15799

Similar Items

Measuring Validity in LLM-based Resume Screening
by: Castleman, Jane, et al.
Published: (2026)

Why am I Still Seeing This: Measuring the Effectiveness Of Ad Controls and Explanations in AI-Mediated Ad Targeting Systems
by: Castleman, Jane, et al.
Published: (2024)

Adultification Bias in LLMs and Text-to-Image Models
by: Castleman, Jane, et al.
Published: (2025)

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
by: Chittepu, Yaswanth, et al.
Published: (2025)

ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets
by: Turbal, Bohdan, et al.
Published: (2026)

When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models
by: Hossain, Ismail, et al.
Published: (2026)

External Evaluation of Discrimination Mitigation Efforts in Meta's Ad Delivery
by: Imana, Basileal, et al.
Published: (2025)

On Adversarial Robustness of Language Models in Transfer Learning
by: Turbal, Bohdan, et al.
Published: (2024)

Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs
by: Roh, Jaechul, et al.
Published: (2026)

ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
by: Shen, Zeyu, et al.
Published: (2025)

Auditing for Racial Discrimination in the Delivery of Education Ads
by: Imana, Basileal, et al.
Published: (2024)

Auditing for Bias in Ad Delivery Using Inferred Demographic Attributes
by: Imana, Basileal, et al.
Published: (2024)

Multilevel Analysis of Cryptocurrency News using RAG Approach with Fine-Tuned Mistral Large Language Model
by: Pavlyshenko, Bohdan M.
Published: (2025)

Stability and Multigroup Fairness in Ranking with Uncertain Predictions
by: Devic, Siddartha, et al.
Published: (2024)

On the Use of Proxies in Political Ad Targeting
by: Sapiezynski, Piotr, et al.
Published: (2024)

Paremias of the Latvians and the Russians in Latgale: From the Holy Scripture to Modern Existence
by: Jelena Korolova
Published: (2020)

Privacy Collapse: Benign Fine-Tuning Can Break Contextual Privacy in Language Models
by: Goel, Anmol, et al.
Published: (2026)

Self-Refining Topology Optimization via an LLM-Based Multi-Agent Framework
by: Park, Hyunjee, et al.
Published: (2026)

"I Have a Dream, Too!": The American Dream in Coretta Scott King Award-Winning Books
by: Parsons, Linda T., et al.
Published: (2011)

Before 2000: Funding Technology in New Jersey's Schools and Public Libraries by the End of the Century.
by: Peretz, Blossom A.
Published: (1997)

RefusalGuard: Geometry-Preserving Fine-Tuning for Safety in LLMs
by: Asif, Sadia, et al.
Published: (2026)

When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
by: Xiao, Yuxin, et al.
Published: (2025)

Multi-Selection for Recommendation Systems
by: Sarmasarkar, Sahasrajit, et al.
Published: (2025)

An External Fairness Evaluation of LinkedIn Talent Search
by: Behzad, Tina, et al.
Published: (2025)

Differential Privacy with Multiple Selections
by: Goel, Ashish, et al.
Published: (2024)

Exploring the Impact of Childhood Trauma Profiles on Social Competence and Self‐Stigma of Seeking Help in Early Adulthood
by: Hayoung Jung, et al.
Published: (2025)

Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs
by: Sahoo, Subramanyam
Published: (2026)

SafeTuneBed: A Toolkit for Benchmarking LLM Safety Alignment in Fine-Tuning
by: Hossain, Saad, et al.
Published: (2025)

Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors
by: Castleman, Blake, et al.
Published: (2023)

Modulation of Cell Cycle Kinases by Kaposi's Sarcoma‐Associated Herpesvirus
by: Steven Longworth, et al.
Published: (2025)

Narrow Fine-Tuning Erodes Safety Alignment in Vision-Language Agents
by: Gulati, Idhant, et al.
Published: (2026)

Stealth Fine-Tuning: Efficiently Breaking Alignment in RVLMs Using Self-Generated CoT
by: Yu, Le, et al.
Published: (2025)

When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning
by: Zhao, Haoran, et al.
Published: (2026)

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry
by: Lan, Wenhao, et al.
Published: (2026)

Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets
by: Hsiung, Lei, et al.
Published: (2025)

GR-SAP: Generative Replay for Safety Alignment Preservation during Fine-Tuning
by: Fang, Zhouxiang, et al.
Published: (2026)

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment
by: Verma, Richa, et al.
Published: (2026)

Targeted Vaccine: Safety Alignment for Large Language Models against Harmful Fine-Tuning via Layer-wise Perturbation
by: Liu, Guozhi, et al.
Published: (2024)

Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation Between the United States and South Africa
by: Jung, Hayoung, et al.
Published: (2024)

Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction
by: Guo, Jiahe, et al.
Published: (2026)