Bibliographic Details
Main Authors: Hewitt, John, Geirhos, Robert, Kim, Been
Format: Preprint
Published: 2025
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2502.07586
_version_ 1866915146180329472
author Hewitt, John
Geirhos, Robert
Kim, Been
contents This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.
format Preprint
id arxiv_https___arxiv_org_abs_2502_07586
institution arXiv
publishDate 2025
record_format arxiv
title We Can't Understand AI Using our Existing Vocabulary
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2502.07586