| Main Authors: | Mehra, Somesh, Garcia, Javier Alonso, Mauch, Lukas |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.09419 |
Similar Items
The pitfalls of next-token prediction
by: Bachmann, Gregor, et al.
Published: (2024)
Language models are better than humans at next-token prediction
by: Shlegeris, Buck, et al.
Published: (2022)
GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters
by: Choudhary, Anand, et al.
Published: (2025)
Where is the signal in tokenization space?
by: Geh, Renato Lui, et al.
Published: (2024)
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
by: Kim, Taesu, et al.
Published: (2024)
Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece
by: Kumar, Anshul
Published: (2026)
Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)
Visualizing token importance for black-box language models
by: Rauba, Paulius, et al.
Published: (2025)
Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs
by: Singh, Aaditya K., et al.
Published: (2024)
Do language models plan ahead for future tokens?
by: Wu, Wilson, et al.
Published: (2024)
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
by: Nagarajan, Vaishnavh, et al.
Published: (2025)
Statistical multi-metric evaluation and visualization of LLM system predictive performance
by: Ackerman, Samuel, et al.
Published: (2025)
Global-Order GFlowNets
by: Pastor-Pérez, Lluís, et al.
Published: (2025)
Byte-token Enhanced Language Models for Temporal Point Processes Analysis
by: Kong, Quyu, et al.
Published: (2025)
Shaping capabilities with token-level data filtering
by: Rathi, Neil, et al.
Published: (2026)
Scaling Transformer to 1M tokens and beyond with RMT
by: Bulatov, Aydar, et al.
Published: (2023)
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)
Interpretable Next-token Prediction via the Generalized Induction Head
by: Kim, Eunji, et al.
Published: (2024)
Sample-efficient LLM Optimization with Reset Replay
by: Liu, Zichuan, et al.
Published: (2025)
AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection
by: Shingi, Geet, et al.
Published: (2021)
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
by: Kwek, Eugene, et al.
Published: (2025)
PreFT: Prefill-only finetuning for efficient inference
by: Lanpouthakoun, Andrew, et al.
Published: (2026)
Essential-Web v1.0: 24T tokens of organized web data
by: AI, Essential, et al.
Published: (2025)
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
by: Marconato, Emanuele, et al.
Published: (2024)
Assessing LLM Text Detection in Educational Contexts: Does Human Contribution Affect Detection?
by: Gehring, Lukas, et al.
Published: (2025)
AtP*: An efficient and scalable method for localizing LLM behaviour to components
by: Kramár, János, et al.
Published: (2024)
You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs
by: Xu, Yijie, et al.
Published: (2025)
Exploring space efficiency in a tree-based linear model for extreme multi-label classification
by: Lin, He-Zhe, et al.
Published: (2024)
Transformers for molecular property prediction: Domain adaptation efficiently improves performance
by: Sultan, Afnan, et al.
Published: (2025)
Publicly-Detectable Watermarking for Language Models
by: Fairoze, Jaiden, et al.
Published: (2023)
LLM-based feature generation from text for interpretable machine learning
by: Balek, Vojtěch, et al.
Published: (2024)
Are LLM-based methods good enough for detecting unfair terms of service?
by: Frasheri, Mirgita, et al.
Published: (2024)
OrthoRank: Token Selection via Sink Token Orthogonality for Efficient LLM inference
by: Shin, Seungjun, et al.
Published: (2025)
Amortizing intractable inference in large language models
by: Hu, Edward J., et al.
Published: (2023)
POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation
by: Qiu, Zeju, et al.
Published: (2026)
Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
by: Zhao, Xinghao
Published: (2026)
Order-Preserving GFlowNets
by: Chen, Yihang, et al.
Published: (2023)
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance
by: Choi, Jihye, et al.
Published: (2024)
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
by: Tan, Qitao, et al.
Published: (2025)
Kanana: Compute-efficient Bilingual Language Models
by: Kanana LLM Team, et al.
Published: (2025)