| Main Authors: | Aljaafari, Nura, Carvalho, Danilo S., Freitas, André |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.11827 |
Similar Items
TRACE: Training and Inference-Time Interpretability Analysis for Language Models
by: Aljaafari, Nura, et al.
Published: (2025)
Interpreting token compositionality in LLMs: A robustness analysis
by: Aljaafari, Nura, et al.
Published: (2024)
Emergence and Localisation of Semantic Role Circuits in LLMs
by: Aljaafari, Nura, et al.
Published: (2025)
CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment
by: Aljaafari, Nura, et al.
Published: (2025)
TRACE for Tracking the Emergence of Semantic Representations in Transformers
by: Aljaafari, Nura, et al.
Published: (2025)
Is Inference Mediated by Distinct Semantic Structures in LLMs? A Mechanistic Interpretation
by: Aljaafari, Nura, et al.
Published: (2026)
From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
by: Aljaafari, Nura, et al.
Published: (2026)
Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder
by: Zhang, Yingji, et al.
Published: (2025)
Quasi-symbolic Semantic Geometry over Transformer-based Variational AutoEncoder
by: Zhang, Yingji, et al.
Published: (2022)
Learning Disentangled Semantic Spaces of Explanations via Invertible Neural Networks
by: Zhang, Yingji, et al.
Published: (2023)
Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions
by: Valentino, Marco, et al.
Published: (2023)
LangVAE and LangSpace: Building and Probing for Language Model VAEs
by: Carvalho, Danilo S., et al.
Published: (2025)
Learning to Disentangle Latent Reasoning Rules with Language VAEs: A Systematic Study
by: Zhang, Yingji, et al.
Published: (2025)
Reasoning Circuits in Language Models: A Mechanistic Interpretation of Syllogistic Inference
by: Kim, Geonhee, et al.
Published: (2024)
SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning
by: Wysocka, Magdalena, et al.
Published: (2024)
Towards Controllable Natural Language Inference through Lexical Inference Types
by: Zhang, Yingji, et al.
Published: (2023)
Inductive Learning of Logical Theories with LLMs: An Expressivity-Graded Analysis
by: Gandarela, João Pedro, et al.
Published: (2024)
Montague semantics and modifier consistency measurement in neural language models
by: Carvalho, Danilo S., et al.
Published: (2022)
Mechanistic Interpretability of GPT-like Models on Summarization Tasks
by: Mishra, Anurag
Published: (2025)
Improving Semantic Control in Discrete Latent Spaces with Transformer Quantized Variational Autoencoders
by: Zhang, Yingji, et al.
Published: (2024)
PEIRCE: Unifying Material and Formal Reasoning via LLM-Driven Neuro-Symbolic Refinement
by: Quan, Xin, et al.
Published: (2025)
Accelerating Antibiotic Discovery with Large Language Models and Knowledge Graphs
by: Delmas, Maxime, et al.
Published: (2025)
Linearly-Interpretable Concept Embedding Models for Text Analysis
by: De Santis, Francesco, et al.
Published: (2024)
Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization
by: Bidusa, Or Raphael, et al.
Published: (2025)
Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis
by: Mofael, Abdullah Al, et al.
Published: (2026)
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
by: Wu, Zhengxuan, et al.
Published: (2023)
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability
by: Koike, Ryuto, et al.
Published: (2025)
Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG
by: Vargas, Francielle, et al.
Published: (2026)
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)
Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis
by: Hatua, Amartya
Published: (2025)
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
by: Yu, Zeping, et al.
Published: (2024)
Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models
by: Liu, Jiaxiang, et al.
Published: (2025)
From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP
by: Mosbach, Marius, et al.
Published: (2024)
Circuit Insights: Towards Interpretability Beyond Activations
by: Golimblevskaia, Elena, et al.
Published: (2025)
Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric
by: Cao, Yixin, et al.
Published: (2025)
An LLM-based Knowledge Synthesis and Scientific Reasoning Framework for Biomedical Discovery
by: Wysocki, Oskar, et al.
Published: (2024)
Using Interpretation Methods for Model Enhancement
by: Chen, Zhuo, et al.
Published: (2024)
Toward Machine Interpreting: Lessons from Human Interpreting Studies
by: Sperber, Matthias, et al.
Published: (2025)
Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
by: Lv, Ang, et al.
Published: (2024)
GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text
by: Hamilton, Kyle, et al.
Published: (2024)