| Main Authors: | Kassem, Aly, Jiralerspong, Thomas, Rostamzadeh, Negar, Farnadi, Golnoosh |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.04426 |
Similar Items
Cross-Architecture Model Diffing with Crosscoders: Unsupervised Discovery of Differences Between LLMs
by: Jiralerspong, Thomas, et al.
Published: (2026)
Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
by: Kassem, Aly M., et al.
Published: (2025)
Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
by: Farnadi, Golnoosh, et al.
Published: (2024)
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
by: Minder, Julian, et al.
Published: (2025)
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
by: Bayazit, Deniz, et al.
Published: (2025)
LoRA Provides Differential Privacy by Design via Random Sketching
by: Malekmohammadi, Saber, et al.
Published: (2024)
Group Crosscoders for Mechanistic Analysis of Symmetry
by: Gorton, Liv
Published: (2024)
Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2026)
Systemizing Multiplicity: The Curious Case of Arbitrariness in Machine Learning
by: Ganesh, Prakhar, et al.
Published: (2025)
Fairness in Federated Learning: Fairness for Whom?
by: Taik, Afaf, et al.
Published: (2025)
Advancing Cultural Inclusivity: Optimizing Embedding Spaces for Balanced Music Recommendations
by: Moradi, Armin, et al.
Published: (2024)
Sparse Crosscoders for diffing MoEs and Dense models
by: Chaudhari, Marmik, et al.
Published: (2026)
The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2024)
Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
by: Ganesh, Prakhar, et al.
Published: (2024)
fmxcoders: Factorized Masked Crosscoders for Cross-Layer Feature Discovery
by: Demou, Andreas D., et al.
Published: (2026)
Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs
by: Farashah, Alireza Dehghanpour, et al.
Published: (2026)
Balancing Profit and Fairness in Risk-Based Pricing Markets
by: Thibodeau, Jesse, et al.
Published: (2025)
Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
by: Rismani, Shalaleh, et al.
Published: (2025)
Shaping Inductive Bias in Diffusion Models through Frequency-Based Noise Control
by: Jiralerspong, Thomas, et al.
Published: (2025)
Efficient Causal Graph Discovery Using Large Language Models
by: Jiralerspong, Thomas, et al.
Published: (2024)
Wasserstein Distributionally Robust Optimization Through the Lens of Structural Causal Models and Individual Fairness
by: Ehyaei, Ahmad-Reza, et al.
Published: (2025)
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)
A Complexity-Based Theory of Compositionality
by: Elmoznino, Eric, et al.
Published: (2024)
BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes
by: Huang, Baichuan, et al.
Published: (2025)
DifFaiRec: Generative Fair Recommender with Conditional Diffusion Model
by: Jiang, Zhenhao, et al.
Published: (2024)
DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization
by: Jiang, Yanfeng, et al.
Published: (2024)
Designing Ambiguity Sets for Distributionally Robust Optimization Using Structural Causal Optimal Transport
by: Ehyaei, Ahmad-Reza, et al.
Published: (2025)
Re-Emergent Misalignment: How Narrow Fine-Tuning Erodes Safety Alignment in LLMs
by: Giordani, Jeremiah
Published: (2025)
DifCluE: Generating Counterfactual Explanations with Diffusion Autoencoders and modal clustering
by: Jain, Suparshva, et al.
Published: (2025)
Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness
by: Ehyaei, Ahmad-Reza, et al.
Published: (2023)
Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation
by: Liu, Jinmei, et al.
Published: (2026)
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
by: Roy, Shuvendu, et al.
Published: (2025)
Geometric Signatures of Compositionality Across a Language Model's Lifetime
by: Lee, Jin Hwa, et al.
Published: (2024)
What Secrets Do Your Manifolds Hold? Understanding the Local Geometry of Generative Models
by: Humayun, Ahmed Imtiaz, et al.
Published: (2024)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
by: Ye, Kai, et al.
Published: (2025)
Understanding Intrinsic Socioeconomic Biases in Large Language Models
by: Arzaghi, Mina, et al.
Published: (2024)
Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2025)
Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets
by: Fauchard, Marylou, et al.
Published: (2025)
How Robust is Model Editing after Fine-Tuning? An Empirical Study on Text-to-Image Diffusion Models
by: He, Feng, et al.
Published: (2025)
Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets
by: Lu, Ning, et al.
Published: (2025)