| Main Authors: | Simbeck, Katharina; Mahran, Mariam |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.17665 |
Similar Items
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
by: Mahran, Mariam, et al.
Published: (2025)
Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs
by: Mahran, Mariam, et al.
Published: (2025)
Position: Mechanistic Interpretability Should Prioritize Feature Consistency in SAEs
by: Song, Xiangchen, et al.
Published: (2025)
The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
by: Bai, Xiaoyan, et al.
Published: (2026)
Interpretable Graph-Language Modeling for Detecting Youth Illicit Drug Use
by: Li, Yiyang, et al.
Published: (2025)
Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off
by: Kenny, Eoin M., et al.
Published: (2024)
Binary Autoencoder for Mechanistic Interpretability of Large Language Models
by: Cho, Hakaze, et al.
Published: (2025)
Resa: Transparent Reasoning Models via SAEs
by: Wang, Shangshang, et al.
Published: (2025)
Defining and Evaluating Physical Safety for Large Language Models
by: Tang, Yung-Chen, et al.
Published: (2024)
Building Interpretable Models for Moral Decision-Making
by: Goel, Mayank, et al.
Published: (2026)
Evaluating Large Language Models for Fair and Reliable Organ Allocation
by: Kim, Brian Hyeongseok, et al.
Published: (2025)
Existential Conversations with Large Language Models: Content, Community, and Culture
by: Shanahan, Murray, et al.
Published: (2024)
Correlated Errors in Large Language Models
by: Kim, Elliot, et al.
Published: (2025)
Hypothesis Generation with Large Language Models
by: Zhou, Yangqiaoyu, et al.
Published: (2024)
Large Language Models are Geographically Biased
by: Manvi, Rohin, et al.
Published: (2024)
Leveraging Large Language Models for Tacit Knowledge Discovery in Organizational Contexts
by: Zuin, Gianlucca, et al.
Published: (2025)
Participatory Assessment of Large Language Model Applications in an Academic Medical Center
by: Carra, Giorgia, et al.
Published: (2024)
Interpretable Recognition of Cognitive Distortions in Natural Language Texts
by: Kolonin, Anton, et al.
Published: (2025)
KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts
by: Coscia, Adam, et al.
Published: (2024)
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
by: Ensign, Danielle, et al.
Published: (2025)
Harnessing Large Language Models for Mental Health: Opportunities, Challenges, and Ethical Considerations
by: Pandey, Hari Mohan
Published: (2024)
Psychological Counseling Ability of Large Language Models
by: Peng, Fangyu, et al.
Published: (2025)
Assessing Large Language Models on Climate Information
by: Bulian, Jannis, et al.
Published: (2023)
Fairness-Aware Interpretable Modeling (FAIM) for Trustworthy Machine Learning in Healthcare
by: Liu, Mingxuan, et al.
Published: (2024)
Machine Learners Should Acknowledge the Legal Implications of Large Language Models as Personal Data
by: Nolte, Henrik, et al.
Published: (2025)
A Taxonomy of Stereotype Content in Large Language Models
by: Nicolas, Gandalf, et al.
Published: (2024)
Bias and Fairness in Large Language Models: A Survey
by: Gallegos, Isabel O., et al.
Published: (2023)
Transforming Agency. On the mode of existence of Large Language Models
by: Barandiaran, Xabier E., et al.
Published: (2024)
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
by: Faiz, Ahmad, et al.
Published: (2023)
PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach
by: Sehwag, Udari Madhushani, et al.
Published: (2025)
Evaluating Retrieval-Augmented Generation Strategies for Large Language Models in Travel Mode Choice Prediction
by: Xu, Yiming, et al.
Published: (2025)
Assessing Social Alignment: Do Personality-Prompted Large Language Models Behave Like Humans?
by: Zakazov, Ivan, et al.
Published: (2024)
Survey on Plagiarism Detection in Large Language Models: The Impact of ChatGPT and Gemini on Academic Integrity
by: Pudasaini, Shushanta, et al.
Published: (2024)
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models
by: Heakl, Ahmed, et al.
Published: (2024)
iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries
by: Coscia, Adam, et al.
Published: (2024)
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
by: Anwar, Usman, et al.
Published: (2024)
Exploring Accuracy-Fairness Trade-off in Large Language Models
by: Zhang, Qingquan, et al.
Published: (2024)
Are Large Language Models Chameleons? An Attempt to Simulate Social Surveys
by: Geng, Mingmeng, et al.
Published: (2024)
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness
by: Sahoo, Subramanyam, et al.
Published: (2026)
Transparent AI: The Case for Interpretability and Explainability
by: Ramachandram, Dhanesh, et al.
Published: (2025)