Bibliographic Details
Main Authors: Baek, David D., Li, Yuxiao, Tegmark, Max
Format: Preprint
Published: 2024
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2410.08255
author Baek, David D.
Li, Yuxiao
Tegmark, Max
author_facet Baek, David D.
Li, Yuxiao
Tegmark, Max
contents Motivated by interpretability and reliability, we investigate whether large language models (LLMs) deploy universal geometric structures to encode discrete, graph-structured knowledge. To this end, we present two complementary lines of experimental evidence that might support the universality of graph representations. First, on an in-context genealogy Q&A task, we train a cone probe to isolate a tree-like subspace in residual stream activations and use activation patching to verify its causal effect on answering related questions. We validate our findings across five different models. Second, we conduct model stitching experiments across models of diverse architectures and parameter counts (OPT, Pythia, Mistral, and LLaMA, 410 million to 8 billion parameters), quantifying representational alignment via the relative degradation in next-token prediction loss. More generally, we conclude that the lack of ground-truth representations of graphs makes it challenging to study how LLMs represent them. Ultimately, improving our understanding of LLM representations could facilitate the development of more interpretable, robust, and controllable AI systems.
format Preprint
id arxiv_https___arxiv_org_abs_2410_08255
institution arXiv
publishDate 2024
record_format arxiv
title Investigating Representation Universality: Case Study on Genealogical Representations
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2410.08255
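note The abstract measures representational alignment in the stitching experiments by the relative degradation in next-token prediction loss. The record does not give the exact formula, so the following is only a minimal sketch under one plausible reading: the fractional increase of a stitched model's mean negative log-likelihood over that of an unstitched baseline. The function names and the example probabilities are hypothetical illustrations, not values or code from the paper.

```python
import math
from typing import Sequence


def mean_next_token_loss(target_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood, given the probability a model
    assigned to each correct next token (assumed input format)."""
    if not target_probs:
        raise ValueError("target_probs must be non-empty")
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def relative_loss_degradation(stitched_loss: float, baseline_loss: float) -> float:
    """Fractional loss increase of a stitched model over a baseline.

    Assumed definition for illustration; the paper may normalize differently.
    """
    if baseline_loss <= 0:
        raise ValueError("baseline_loss must be positive")
    return (stitched_loss - baseline_loss) / baseline_loss


# Made-up probabilities for the correct next tokens (illustrative only).
baseline = mean_next_token_loss([0.5, 0.25, 0.5])
stitched = mean_next_token_loss([0.25, 0.25, 0.5])
print(relative_loss_degradation(stitched, baseline))
```

Under this reading, a value near zero would mean the stitched model predicts almost as well as the baseline, while larger values would indicate weaker alignment between the two models' representations.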