| Main Authors: | Prakash, Nirmalendu, Jie, Yeo Wei, Abdullah, Amir, Satapathy, Ranjan, Cambria, Erik, Lee, Roy Ka Wei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.09708 |
Similar Items
Understanding Refusal in Language Models with Sparse Autoencoders
by: Yeo, Wei Jie, et al.
Published: (2025)
Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
by: Yeo, Wei Jie, et al.
Published: (2024)
Mitigating Jailbreaks with Intent-Aware LLMs
by: Yeo, Wei Jie, et al.
Published: (2025)
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
by: Yeo, Wei Jie, et al.
Published: (2024)
Interpreting Bias in Large Language Models: A Feature-Based Approach
by: Prakash, Nirmalendu, et al.
Published: (2024)
Self-training Large Language Models through Knowledge Detection
by: Yeo, Wei Jie, et al.
Published: (2024)
How Interpretable are Reasoning Explanations from Prompting Large Language Models?
by: Yeo, Wei Jie, et al.
Published: (2024)
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
by: Yeo, Wei Jie, et al.
Published: (2025)
Beyond Correlation: Refutation-Validated Aspect-Based Sentiment Analysis for Explainable Energy Market Returns
by: van der Heever, Wihan, et al.
Published: (2026)
Activation Space Interventions Can Be Transferred Between Large Language Models
by: Oozeer, Narmeen, et al.
Published: (2025)
SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore
by: Ng, Ri Chi, et al.
Published: (2024)
"Can't believe I'm crying over an anime girl": Public Parasocial Grieving and Coping Towards VTuber Graduation and Termination
by: Lee, Ken Jen, et al.
Published: (2025)
“Sorry, I Cannot Fulfill That Request”: Analyzing Large Language Model Responses, Redirections, and Refusals to Polarized News Topics
by: Haley Triem, et al.
Published: (2025)
I'm OK, I'm Alive!
Published: (1982)
FinXABSA: Explainable Finance through Aspect-Based Sentiment Analysis
by: Ong, Keane, et al.
Published: (2023)
DreamReader: An Interpretability Toolkit for Text-to-Image Models
by: Prakash, Nirmalendu, et al.
Published: (2026)
Explainable Natural Language Processing for Corporate Sustainability Analysis
by: Ong, Keane, et al.
Published: (2024)
"I Can Read but I Can't Turn the Pages."
by: Page, Chris
Published: (1992)
I'm Taking Over the Internet
by: Sharks, Lee
Published: (2026)
I'm Sorry Dave: How the old world of personnel security can inform the new world of AI insider risk
by: Martin, Paul, et al.
Published: (2025)
“Sorry. I am what I am.”
by: Walton, Chris
Published: (2024)
I'm on the plane!
by: Cohen, David
Published: (2003)
I'm the same, I'm the same, I'm trying to change: Investigating the role of human information behavior in view change
by: Dana McKay, et al.
Published: (2024)
I Can't Believe It's Not Scene Flow!
by: Khatri, Ishan, et al.
Published: (2024)
Who Said I Can't Read This?
by: Keene, Leslie
Published: (2004)
Geometry-Aware CLIP Retrieval via Local Cross-Modal Alignment and Steering
by: Prakash, Nirmalendu, et al.
Published: (2026)
Oyster-I: Beyond Refusal -- Constructive Safety Alignment for Responsible Language Models
by: Duan, Ranjie, et al.
Published: (2025)
I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion
by: Li, Kun, et al.
Published: (2024)
Subjectivity Detection in Nuclear Energy Tweets
by: Ranjan Satapathy
Published: (2017)
Time Blindness: Why Video-Language Models Can't See What Humans Can?
by: Upadhyay, Ujjwal, et al.
Published: (2025)
Distribution-Aware Feature Selection for SAEs
by: Oozeer, Narmeen, et al.
Published: (2025)
ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability Analysis
by: Ong, Keane, et al.
Published: (2025)
"I Can't Keep Up": Accessibility Barriers in Video-Based Learning for Individuals with Borderline Intellectual Functioning
by: Chu, Hyehyun, et al.
Published: (2026)
Lord I'm Coming Home
by: Forrest, John
Published: (2023)
Look at the State I'm In! In the Library.
by: Hurst, Carol Otis
Published: (1995)
I'm a Media Freak
by: Immroth, John Phillip
Published: (1971)
Why Can't I Ever Find Anything in the Library?
by: Radford, Neil, et al.
Published: (1983)
Why I Can't Create a Learning Center
by: Miller, Rosalind
Published: (1975)
I'm brown and I'm bright: Using collective storying to disrupt the white‐centering of successful girlhood
by: Eunice Gaerlan, et al.
Published: (2024)
Explain Like I'm Five: Using LLMs to Improve PDE Surrogate Models with Text
by: Lorsung, Cooper, et al.
Published: (2024)