:: Library Catalog

Bibliographic Details
Main Authors:	Simbeck, Katharina; Mahran, Mariam
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence; Computers and Society
Online Access:	https://arxiv.org/abs/2509.17665

Similar Items

GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
by: Mahran, Mariam, et al.
Published: (2025)

Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
by: Mahran, Mariam, et al.
Published: (2025)

Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
by: Song, Xiangchen, et al.
Published: (2025)

The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
by: Bai, Xiaoyan, et al.
Published: (2026)

Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use
by: Li, Yiyang, et al.
Published: (2025)

Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off
by: Kenny, Eoin M., et al.
Published: (2024)

Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)

Resa: Transparent Reasoning Models via SAEs
by: Wang, Shangshang, et al.
Published: (2025)

Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)

Building Interpretable Models for Moral Decision-Making
by: Goel, Mayank, et al.
Published: (2026)

Evaluating Large Language Models for Fair and Reliable Organ Allocation
by: Kim, Brian Hyeongseok, et al.
Published: (2025)

Existential Conversations with Large Language Models: Content, Community, and Culture
by: Shanahan, Murray, et al.
Published: (2024)

Correlated Errors in Large Language Models
by: Kim, Elliot, et al.
Published: (2025)

Hypothesis Generation with Large Language Models
by: Zhou, Yangqiaoyu, et al.
Published: (2024)

Large Language Models are Geographically Biased
by: Manvi, Rohin, et al.
Published: (2024)

Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts
by: Zuin, Gianlucca, et al.
Published: (2025)

Participatory Assessment of Large Language Model Applications in an Academic Medical Center
by: Carra, Giorgia, et al.
Published: (2024)

Interpretable Recognition of Cognitive Distortions in Natural Language Texts
by: Kolonin, Anton, et al.
Published: (2025)

KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts
by: Coscia, Adam, et al.
Published: (2024)

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
by: Ensign, Danielle, et al.
Published: (2025)

Harnessing Large Language Models for Mental Health: Opportunities, Challenges, and Ethical Considerations
by: Pandey, Hari Mohan
Published: (2024)

Psychological Counseling Ability of Large Language Models
by: Peng, Fangyu, et al.
Published: (2025)

Assessing Large Language Models on Climate Information
by: Bulian, Jannis, et al.
Published: (2023)

Fairness-Aware Interpretable Modeling (FAIM) for Trustworthy Machine Learning in Healthcare
by: Liu, Mingxuan, et al.
Published: (2024)

Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data
by: Nolte, Henrik, et al.
Published: (2025)

A Taxonomy of Stereotype Content in Large Language Models
by: Nicolas, Gandalf, et al.
Published: (2024)

Bias and Fairness in Large Language Models: A Survey
by: Gallegos, Isabel O., et al.
Published: (2023)

Transforming Agency. On the mode of existence of Large Language Models
by: Barandiaran, Xabier E., et al.
Published: (2024)

LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
by: Faiz, Ahmad, et al.
Published: (2023)

PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
by: Sehwag, Udari Madhushani, et al.
Published: (2025)

Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction
by: Xu, Yiming, et al.
Published: (2025)

Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?
by: Zakazov, Ivan, et al.
Published: (2024)

Survey on Plagiarism Detection in Large Language Models: The Impact of ChatGPT and Gemini on Academic Integrity
by: Pudasaini, Shushanta, et al.
Published: (2024)

ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
by: Heakl, Ahmed, et al.
Published: (2024)

iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries
by: Coscia, Adam, et al.
Published: (2024)

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
by: Anwar, Usman, et al.
Published: (2024)

Exploring Accuracy-Fairness Trade-off in Large Language Models
by: Zhang, Qingquan, et al.
Published: (2024)

Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys
by: Geng, Mingmeng, et al.
Published: (2024)

The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)

Transparent AI: The Case for Interpretability and Explainability
by: Ramachandram, Dhanesh, et al.
Published: (2025)