Staff View: Investigating Representation Universality: Case Study on Genealogical Representations :: Library Catalog

Bibliographic Details
Main Authors:	Baek, David D.; Li, Yuxiao; Tegmark, Max
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.08255

_version_	1866914167616700416
author	Baek, David D.; Li, Yuxiao; Tegmark, Max
author_facet	Baek, David D.; Li, Yuxiao; Tegmark, Max
contents	Motivated by interpretability and reliability, we investigate whether large language models (LLMs) deploy universal geometric structures to encode discrete, graph-structured knowledge. To this end, we present two complementary pieces of experimental evidence that might support universality of graph representations. First, on an in-context genealogy Q&A task, we train a cone probe to isolate a tree-like subspace in residual stream activations and use activation patching to verify its causal effect in answering related questions. We validate our findings across five different models. Second, we conduct model stitching experiments across models of diverse architectures and parameter counts (OPT, Pythia, Mistral, and LLaMA, 410 million to 8 billion parameters), quantifying representational alignment via relative degradation in the next-token prediction loss. Generally, we conclude that the lack of ground truth representations of graphs makes it challenging to study how LLMs represent them. Ultimately, improving our understanding of LLM representations could facilitate the development of more interpretable, robust, and controllable AI systems.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_08255
institution	arXiv
publishDate	2024
record_format	arxiv
title	Investigating Representation Universality: Case Study on Genealogical Representations
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2410.08255

Similar Items