Library Catalog

Bibliographic Details
Main Authors:	Aljaafari, Nura; Carvalho, Danilo S.; Freitas, André
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2408.11827

Similar Items

TRACE: Training and Inference-Time Interpretability Analysis for Language Models
by: Aljaafari, Nura, et al.
Published: (2025)

Interpreting token compositionality in LLMs: A robustness analysis
by: Aljaafari, Nura, et al.
Published: (2024)

Emergence and Localisation of Semantic Role Circuits in LLMs
by: Aljaafari, Nura, et al.
Published: (2025)

CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment
by: Aljaafari, Nura, et al.
Published: (2025)

TRACE for Tracking the Emergence of Semantic Representations in Transformers
by: Aljaafari, Nura, et al.
Published: (2025)

Is Inference Mediated by Distinct Semantic Structures in LLMs? A Mechanistic Interpretation
by: Aljaafari, Nura, et al.
Published: (2026)

From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
by: Aljaafari, Nura, et al.
Published: (2026)

Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder
by: Zhang, Yingji, et al.
Published: (2025)

Quasi-symbolic Semantic Geometry over Transformer-based Variational AutoEncoder
by: Zhang, Yingji, et al.
Published: (2022)

Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks
by: Zhang, Yingji, et al.
Published: (2023)

Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions
by: Valentino, Marco, et al.
Published: (2023)

LangVAE and LangSpace: Building and Probing for Language Model VAEs
by: Carvalho, Danilo S., et al.
Published: (2025)

Learning to Disentangle Latent Reasoning Rules with Language VAEs: A Systematic Study
by: Zhang, Yingji, et al.
Published: (2025)

Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
by: Kim, Geonhee, et al.
Published: (2024)

SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning
by: Wysocka, Magdalena, et al.
Published: (2024)

Towards Controllable Natural Language Inference through Lexical Inference Types
by: Zhang, Yingji, et al.
Published: (2023)

Inductive Learning of Logical Theories with LLMs: An Expressivity-Graded Analysis
by: Gandarela, João Pedro, et al.
Published: (2024)

Montague semantics and modifier consistency measurement in neural language models
by: Carvalho, Danilo S., et al.
Published: (2022)

Mechanistic Interpretability of GPT-like Models on Summarization Tasks
by: Mishra, Anurag
Published: (2025)

Improving Semantic Control in Discrete Latent Spaces with Transformer Quantized Variational Autoencoders
by: Zhang, Yingji, et al.
Published: (2024)

PEIRCE: Unifying Material and Formal Reasoning via LLM-Driven Neuro-Symbolic Refinement
by: Quan, Xin, et al.
Published: (2025)

Accelerating Antibiotic Discovery with Large Language Models and Knowledge Graphs
by: Delmas, Maxime, et al.
Published: (2025)

Linearly-Interpretable Concept Embedding Models for Text Analysis
by: De Santis, Francesco, et al.
Published: (2024)

Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization
by: Bidusa, Or Raphael, et al.
Published: (2025)

Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis
by: Mofael, Abdullah Al, et al.
Published: (2026)

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
by: Wu, Zhengxuan, et al.
Published: (2023)

ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
by: Koike, Ryuto, et al.
Published: (2025)

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG
by: Vargas, Francielle, et al.
Published: (2026)

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)

Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis
by: Hatua, Amartya
Published: (2025)

Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
by: Yu, Zeping, et al.
Published: (2024)

Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)

From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
by: Mosbach, Marius, et al.
Published: (2024)

Circuit Insights: Towards Interpretability Beyond Activations
by: Golimblevskaia, Elena, et al.
Published: (2025)

Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
by: Cao, Yixin, et al.
Published: (2025)

An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery
by: Wysocki, Oskar, et al.
Published: (2024)

Using Interpretation Methods for Model Enhancement
by: Chen, Zhuo, et al.
Published: (2024)

Toward Machine Interpreting: Lessons from Human Interpreting Studies
by: Sperber, Matthias, et al.
Published: (2025)

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
by: Lv, Ang, et al.
Published: (2024)

GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text
by: Hamilton, Kyle, et al.
Published: (2024)