Library Catalog

Bibliographic Details
Main Authors:	Chehbouni, Khaoula; Carr, Jonathan Colaço; More, Yash; Cheung, Jackie CK; Farnadi, Golnoosh
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Computers and Society
Online Access:	https://arxiv.org/abs/2411.08243

Similar Items

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
by: Chehbouni, Khaoula, et al.
Published: (2024)

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
by: Chehbouni, Khaoula, et al.
Published: (2025)

Fairness in Federated Learning: Fairness for Whom?
by: Taik, Afaf, et al.
Published: (2025)

Enhancing Privacy in the Early Detection of Sexual Predators Through Federated Learning and Differential Privacy
by: Chehbouni, Khaoula, et al.
Published: (2025)

Towards More Realistic Extraction Attacks: An Adversarial Perspective
by: More, Yash, et al.
Published: (2024)

Understanding Intrinsic Socioeconomic Biases in Large Language Models
by: Arzaghi, Mina, et al.
Published: (2024)

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild
by: Mireshghallah, Niloofar, et al.
Published: (2024)

Intrinsic Meets Extrinsic Fairness: Assessing the Downstream Impact of Bias Mitigation in Large Language Models
by: Arzaghi, Mina, et al.
Published: (2025)

LoRA Provides Differential Privacy by Design via Random Sketching
by: Malekmohammadi, Saber, et al.
Published: (2024)

Multilingual Hallucination Gaps in Large Language Models
by: Chataigner, Cléa, et al.
Published: (2024)

Crossing Boundaries: Leveraging Semantic Divergences to Explore Cultural Novelty in Cooking Recipes
by: Carichon, Florian, et al.
Published: (2025)

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities
by: Farnadi, Golnoosh, et al.
Published: (2024)

Auditing Agent Harness Safety
by: Liu, Chengzhi, et al.
Published: (2026)

Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness
by: Ehyaei, Ahmad-Reza, et al.
Published: (2023)

GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
by: Darrin, Maxime, et al.
Published: (2024)

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
by: Zhu, Zhaowei, et al.
Published: (2023)

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process
by: Carichon, Florian, et al.
Published: (2025)

Say It Another Way: Auditing LLMs with a User-Grounded Automated Paraphrasing Framework
by: Chataigner, Cléa, et al.
Published: (2025)

Compromising Honesty and Harmlessness in Language Models via Deception Attacks
by: Vaugrante, Laurène, et al.
Published: (2025)

Dishonesty in Helpful and Harmless Alignment
by: Huang, Youcheng, et al.
Published: (2024)

Embedding Cultural Diversity in Prototype-based Recommender Systems
by: Moradi, Armin, et al.
Published: (2024)

Different Horses for Different Courses: Comparing Bias Mitigation Algorithms in ML
by: Ganesh, Prakhar, et al.
Published: (2024)

On the Suitability of LLM-Driven Agents for Dark Pattern Audits
by: Sun, Chen, et al.
Published: (2026)

Promoting Fair Vaccination Strategies Through Influence Maximization: A Case Study on COVID-19 Spread
by: Neophytou, Nicola, et al.
Published: (2024)

Beyond Accuracy: Diagnosing Algebraic Reasoning Failures in LLMs Across Nine Complexity Dimensions
by: Patil, Parth, et al.
Published: (2026)

Fairness Incentives in Response to Unfair Dynamic Pricing
by: Thibodeau, Jesse, et al.
Published: (2024)

The Cost of Arbitrariness for Individuals: Examining the Legal and Technical Challenges of Model Multiplicity
by: Ganesh, Prakhar, et al.
Published: (2024)

Reviving Your MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
by: Kassem, Aly M., et al.
Published: (2025)

Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
by: Nghiem, Huy, et al.
Published: (2025)

Prioritization First, Principles Second: An Adaptive Interpretation of Helpful, Honest, and Harmless Principles
by: Huang, Yue, et al.
Published: (2025)

Auditing Stance Asymmetry in Generative Explanations
by: Han, Jiarui
Published: (2026)

Too Helpful, Too Harmless, Too Honest or Just Right?
by: Kashyap, Gautam Siddharth, et al.
Published: (2025)

AuditWen: An Open-Source Large Language Model for Audit
by: Huang, Jiajia, et al.
Published: (2024)

Beyond Simulations: What 20,000 Real Conversations Reveal About Mental Health AI Safety
by: Stamatis, Caitlin A., et al.
Published: (2026)

Balancing Act: Constraining Disparate Impact in Sparse Models
by: Hashemizadeh, Meraj, et al.
Published: (2023)

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring
by: Armstrong, Lena, et al.
Published: (2024)

Audit Me If You Can: Query-Efficient Active Fairness Auditing of Black-Box LLMs
by: Hartmann, David, et al.
Published: (2026)

AuditGPT: Auditing Smart Contracts with ChatGPT
by: Xia, Shihao, et al.
Published: (2024)

Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
by: Arnaiz-Rodriguez, Adrian, et al.
Published: (2025)

Big Help or Big Brother? Auditing Tracking, Profiling, and Personalization in Generative AI Assistants
by: Vekaria, Yash, et al.
Published: (2025)