Staff View: We Can't Understand AI Using our Existing Vocabulary :: Library Catalog

Bibliographic Details
Main Authors:	Hewitt, John; Geirhos, Robert; Kim, Been
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2502.07586

_version_	1866915146180329472
author	Hewitt, John; Geirhos, Robert; Kim, Been
author_facet	Hewitt, John; Geirhos, Robert; Kim, Been
contents	This position paper argues that, in order to understand AI, we cannot rely on our existing vocabulary of human words. Instead, we should strive to develop neologisms: new words that represent precise human concepts that we want to teach machines, or machine concepts that we need to learn. We start from the premise that humans and machines have differing concepts. This means interpretability can be framed as a communication problem: humans must be able to reference and control machine concepts, and communicate human concepts to machines. Creating a shared human-machine language through developing neologisms, we believe, could solve this communication problem. Successful neologisms achieve a useful amount of abstraction: not too detailed, so they're reusable in many contexts, and not too high-level, so they convey precise information. As a proof of concept, we demonstrate how a "length neologism" enables controlling LLM response length, while a "diversity neologism" allows sampling more variable responses. Taken together, we argue that we cannot understand AI using our existing vocabulary, and expanding it through neologisms creates opportunities for both controlling and understanding machines better.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_07586
institution	arXiv
publishDate	2025
record_format	arxiv
title	We Can't Understand AI Using our Existing Vocabulary
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2502.07586

Similar Items