| Main Authors: | Forrester, Chris; Sulea, Octavia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.08058 |
Similar Items
Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language Models
by: Tikhomirov, Mikhail, et al.
Published: (2024)
SHADE: Semantic Hypernym Annotator for Domain-specific Entities -- DnD Domain Use Case
by: Peiris, Akila, et al.
Published: (2024)
Inferring Adjective Hypernyms with Language Models to Increase the Connectivity of Open English Wordnet
by: Augello, Lorenzo, et al.
Published: (2025)
HyperBox: A Supervised Approach for Hypernym Discovery using Box Embeddings
by: Parmar, Maulik, et al.
Published: (2022)
Hypernym Bias: Unraveling Deep Classifier Training Dynamics through the Lens of Class Hierarchy
by: Malashin, Roman, et al.
Published: (2025)
On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction
by: Bondarenko, Ivan, et al.
Published: (2026)
Beyond Text Compression: Evaluating Tokenizers Across Scales
by: Lotz, Jonas F., et al.
Published: (2025)
Breaking Token Into Concepts: Exploring Extreme Compression in Token Representation Via Compositional Shared Semantics
by: R V, Kavin, et al.
Published: (2025)
Text-Preserving Lossy Text Compression: A Study of Strategic Deletion and LLM Reconstruction
by: Zou, Yuchun, et al.
Published: (2026)
KVReviver: Reversible KV Cache Compression with Sketch-Based Token Reconstruction
by: Yuan, Aomufei, et al.
Published: (2025)
Tokenization Is More Than Compression
by: Schmidt, Craig W., et al.
Published: (2024)
See the Text: From Tokenization to Visual Reading
by: Xing, Ling, et al.
Published: (2025)
Greed is All You Need: An Evaluation of Tokenizer Inference Methods
by: Uzan, Omri, et al.
Published: (2024)
ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine Translation
by: Dai, Shaojie, et al.
Published: (2024)
Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP
by: Kim, Eunji, et al.
Published: (2024)
From Token to Token Pair: Efficient Prompt Compression for Large Language Models in Clinical Prediction
by: Zhu, Mingcheng, et al.
Published: (2026)
Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance
by: Goldman, Omer, et al.
Published: (2024)
Frequency-Ordered Tokenization for Better Text Compression
by: Kalcher, Maximilian
Published: (2026)
CompressKV: Semantic Retrieval Heads Know What Tokens are Not Important Before Generation
by: Lin, Xiaolin, et al.
Published: (2025)
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction
by: Mao, Yu, et al.
Published: (2025)
zip2zip: Inference-Time Adaptive Tokenization via Online Compression
by: Geng, Saibo, et al.
Published: (2025)
Learning to Compress Prompts with Gist Tokens
by: Mu, Jesse, et al.
Published: (2023)
LLM-Augmented Semantic Steering of Text Embedding Projection Spaces
by: Liu, Wei, et al.
Published: (2026)
Brain-CLIPLM: Decoding Compressed Semantic Representations in EEG for Language Reconstruction
by: Yang, Xiaoli, et al.
Published: (2026)
SemanticZip: A Pilot Framework for Lossy Text Compression with LLMs as Semantic Decompressors
by: Trukhina, Natalia, et al.
Published: (2026)
Faster Superword Tokenization
by: Schmidt, Craig W., et al.
Published: (2026)
Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
by: Zhao, Yize, et al.
Published: (2025)
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
by: Shani, Chen, et al.
Published: (2025)
Multi-word Tokenization for Sequence Compression
by: Gee, Leonidas, et al.
Published: (2024)
Morphologically-Informed Tokenizers for Languages with Non-Concatenative Morphology: A case study of Yoloxóchtil Mixtec ASR
by: Crawford, Chris
Published: (2025)
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
by: Moroni, Luca, et al.
Published: (2025)
Text2Token: Unsupervised Text Representation Learning with Token Target Prediction
by: An, Ruize, et al.
Published: (2025)
Lossless Token Sequence Compression via Meta-Tokens
by: Harvill, John, et al.
Published: (2025)
From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution
by: Chizhov, Pavel, et al.
Published: (2026)
StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs
by: Song, Yuhan, et al.
Published: (2025)
Scaling Open Discrete Audio Foundation Models with Interleaved Semantic, Acoustic, and Text Tokens
by: Manakul, Potsawee, et al.
Published: (2026)
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens
by: Nie, Zhijie, et al.
Published: (2024)
More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression
by: Zhang, Jiebin, et al.
Published: (2024)
Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation
by: Belikova, Julia, et al.
Published: (2026)
Text Compression for Efficient Language Generation
by: Gu, David, et al.
Published: (2025)