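Staff View: Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method :: Library Catalog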

Bibliographic Details
Main Authors:	Forrester, Chris; Sulea, Octavia
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.08058

_version_	1866915287739138048
author	Forrester, Chris; Sulea, Octavia
author_facet	Forrester, Chris; Sulea, Octavia
contents	Compute optimization using token reduction of LLM prompts is an emerging task in the fields of NLP and next generation, agentic AI. In this white paper, we introduce a novel (patent pending) text representation scheme and a first-of-its-kind word-level semantic compression of paragraphs that can lead to over 90% token reduction, while retaining high semantic similarity to the source text. We explain how this novel compression technique can be lossless and how the detail granularity is controllable. We discuss benchmark results over open source data (i.e. Bram Stoker's Dracula available through Project Gutenberg) and show how our results hold at the paragraph level, across multiple genres and models.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_08058
institution	arXiv
publishDate	2025
record_format	arxiv
title	Hypernym Mercury: Token Optimization Through Semantic Field Constriction And Reconstruction From Hypernyms. A New Text Compression Method
topic	Computation and Language
url	https://arxiv.org/abs/2505.08058