Library Catalog

Bibliographic Details
Main authors:	Suriyakumar, Vinith M., Sekhari, Ayush, Stempfle, Lena, Wang, Robertson, Simpson, Michael, Portnoff, Rebecca, Ghassemi, Marzyeh, Wilson, Ashia C.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Computers and Society
Online access:	https://arxiv.org/abs/2604.25119

Similar Items

UCD: Unlearning in LLMs via Contrastive Decoding
by: Suriyakumar, Vinith M., et al.
Published: (2025)

When Style Breaks Safety: Defending LLMs Against Superficial Style Alignment
by: Xiao, Yuxin, et al.
Published: (2025)

Algorithmic Pluralism: A Structural Approach To Equal Opportunity
by: Jain, Shomik, et al.
Published: (2023)

Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
by: Suriyakumar, Vinith M., et al.
Published: (2024)

AI Generated Child Sexual Abuse Material -- What's the Harm?
by: Ciardha, Caoilte Ó, et al.
Published: (2025)

Layered Unlearning for Adversarial Relearning
by: Qian, Timothy, et al.
Published: (2025)

Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
by: Shaib, Chantal, et al.
Published: (2025)

An Investigation of Memorization Risk in Healthcare Foundation Models
by: Tonekaboni, Sana, et al.
Published: (2025)

Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
by: Chan, Yik Siu, et al.
Published: (2025)

Position: AI Evaluations Should be Grounded on a Theory of Capability
by: Jo, Nathanael, et al.
Published: (2025)

Bias Delayed is Bias Denied? Assessing the Effect of Reporting Delays on Disparity Assessments
by: Gosciak, Jennah, et al.
Published: (2025)

Generative AI in Medicine
by: Shanmugam, Divya, et al.
Published: (2024)

What's in a Query: Polarity-Aware Distribution-Based Fair Ranking
by: Balagopalan, Aparna, et al.
Published: (2025)

In the Name of Fairness: Assessing the Bias in Clinical Record De-identification
by: Xiao, Yuxin, et al.
Published: (2023)

Allocation Multiplicity: Evaluating the Promises of the Rashomon Set
by: Jain, Shomik, et al.
Published: (2025)

As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making
by: Jain, Shomik, et al.
Published: (2024)

The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches
by: Lev, Omri, et al.
Published: (2025)

Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
by: Jain, Shomik, et al.
Published: (2024)

Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection
by: Jain, Saachi, et al.
Published: (2024)

Identifying Implicit Social Biases in Vision-Language Models
by: Hamidieh, Kimia, et al.
Published: (2024)

DiffusionWorldViewer: Exposing and Broadening the Worldview Reflected by Generative Text-to-Image Models
by: De Simone, Zoe, et al.
Published: (2023)

Just in Plain Sight: Unveiling CSAM Distribution Campaigns on the Clear Web
by: Lykousas, Nikolaos, et al.
Published: (2025)

Automating Transparency Mechanisms in the Judicial System Using LLMs: Opportunities and Challenges
by: Shastri, Ishana, et al.
Published: (2024)

Unveiling AI's Threats to Child Protection: Regulatory efforts to Criminalize AI-Generated CSAM and Emerging Children's Rights Violations
by: Kokolaki, Emmanouela, et al.
Published: (2025)

Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks
by: Di, Jimmy Z., et al.
Published: (2022)

Evaluation eines Lehramtsmasterstudiengangs mit dem Profil Quereinstieg im Fach Physik
by: Ghassemi, Novid
Published: (2024)

Homogeneous Algorithms Can Reduce Competition in Personalized Pricing
by: Jo, Nathanael, et al.
Published: (2025)

Machine Unlearning Fails to Remove Data Poisoning Attacks
by: Pawelczyk, Martin, et al.
Published: (2024)

Task-Dependent Evaluation of LLM Output Homogenization: A Taxonomy-Guided Framework
by: Jain, Shomik, et al.
Published: (2025)

Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights
by: Javed, Rafiya, et al.
Published: (2025)

Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach
by: Ghosh, Sourojit, et al.
Published: (2024)

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
by: An, Bang, et al.
Published: (2024)

Towards a Harms Taxonomy of AI Likeness Generation
by: Bariach, Ben, et al.
Published: (2024)

The MedPerturb Dataset: What Non-Content Perturbations Reveal About Human and Clinical LLM Decision Making
by: Gourabathina, Abinitha, et al.
Published: (2025)

Safety and Security Analysis of Large Language Models: Benchmarking Risk Profile and Harm Potential
by: Akiri, Charankumar, et al.
Published: (2025)

MaskMedPaint: Masked Medical Image Inpainting with Diffusion Models for Mitigation of Spurious Correlations
by: Jin, Qixuan, et al.
Published: (2024)

Measuring Stochastic Data Complexity with Boltzmann Influence Functions
by: Ng, Nathan, et al.
Published: (2024)

The Role of Computing Resources in Publishing Foundation Model Research
by: Hao, Yuexing, et al.
Published: (2025)

Machine-arranged Interactions Improve Institutional Belonging and Cohesion
by: Ghassemi, Mohammad M., et al.
Published: (2024)

Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations
by: Salaudeen, Olawale, et al.
Published: (2025)