:: Library Catalog

Bibliographic Details
Main Authors:	Kassem, Aly; Jiralerspong, Thomas; Rostamzadeh, Negar; Farnadi, Golnoosh
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.04426

Similar Items

Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
by: Jiralerspong, Thomas, et al.
Published: (2026)

Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
by: Kassem, Aly M., et al.
Published: (2025)

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
by: Farnadi, Golnoosh, et al.
Published: (2024)

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
by: Minder, Julian, et al.
Published: (2025)

Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
by: Bayazit, Deniz, et al.
Published: (2025)

LoRA Provides Differential Privacy by Design via Random Sketching
by: Malekmohammadi, Saber, et al.
Published: (2024)

Group Crosscoders for Mechanistic Analysis of Symmetry
by: Gorton, Liv
Published: (2024)

Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2026)

Systemizing Multiplicity: The Curious Case of Arbitrariness in Machine Learning
by: Ganesh, Prakhar, et al.
Published: (2025)

Fairness in Federated Learning: Fairness for Whom?
by: Taik, Afaf, et al.
Published: (2025)

Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
by: Moradi, Armin, et al.
Published: (2024)

Sparse Crosscoders for diffing MoEs and Dense models
by: Chaudhari, Marmik, et al.
Published: (2026)

The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2024)

Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
by: Ganesh, Prakhar, et al.
Published: (2024)

fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery
by: Demou, Andreas D., et al.
Published: (2026)

Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
by: Farashah, Alireza Dehghanpour, et al.
Published: (2026)

Balancing Profit and Fairness in Risk-Based Pricing Markets
by: Thibodeau, Jesse, et al.
Published: (2025)

Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
by: Rismani, Shalaleh, et al.
Published: (2025)

Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control
by: Jiralerspong, Thomas, et al.
Published: (2025)

Efficient Causal Graph Discovery Using Large Language Models
by: Jiralerspong, Thomas, et al.
Published: (2024)

Wasserstein Distributionally Robust Optimization Through the Lens of Structural Causal Models and Individual Fairness
by: Ehyaei, Ahmad-Reza, et al.
Published: (2025)

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)

A Complexity-Based Theory of Compositionality
by: Elmoznino, Eric, et al.
Published: (2024)

BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes
by: Huang, Baichuan, et al.
Published: (2025)

DifFaiRec: Generative Fair Recommender with Conditional Diffusion Model
by: Jiang, Zhenhao, et al.
Published: (2024)

DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
by: Jiang, Yanfeng, et al.
Published: (2024)

Designing Ambiguity Sets for Distributionally Robust Optimization Using Structural Causal Optimal Transport
by: Ehyaei, Ahmad-Reza, et al.
Published: (2025)

Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs
by: Giordani, Jeremiah
Published: (2025)

DifCluE: Generating Counterfactual Explanations with Diffusion Autoencoders and modal clustering
by: Jain, Suparshva, et al.
Published: (2025)

Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness
by: Ehyaei, Ahmad-Reza, et al.
Published: (2023)

Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation
by: Liu, Jinmei, et al.
Published: (2026)

You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
by: Roy, Shuvendu, et al.
Published: (2025)

Geometric Signatures of Compositionality Across a Language Model's Lifetime
by: Lee, Jin Hwa, et al.
Published: (2024)

What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
by: Humayun, Ahmed Imtiaz, et al.
Published: (2024)

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)

Understanding Intrinsic Socioeconomic Biases in Large Language Models
by: Arzaghi, Mina, et al.
Published: (2024)

Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2025)

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
by: Fauchard, Marylou, et al.
Published: (2025)

How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
by: He, Feng, et al.
Published: (2025)

Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
by: Lu, Ning, et al.
Published: (2025)