| Main Authors: | Nielsen, Beatrix M. G., Macocco, Iuri, Baroni, Marco |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.10201 |
Similar Items
Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models
by: Macocco, Iuri, et al.
Published: (2025)
Tracing Computation Density in LLMs
by: Kervadec, Corentin, et al.
Published: (2026)
Emergence of a High-Dimensional Abstraction Phase in Language Transformers
by: Cheng, Emily, et al.
Published: (2024)
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
by: Gemini Team, et al.
Published: (2024)
AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions
by: Witold, Waligóra
Published: (2024)
Jacobian Scopes: token-level causal attributions in LLMs
by: Liu, Toni J. B., et al.
Published: (2026)
MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2024)
Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer
by: Shrestha, Adarsha, et al.
Published: (2025)
Long-context LLMs Struggle with Long In-context Learning
by: Li, Tianle, et al.
Published: (2024)
Interpretable Next-token Prediction via the Generalized Induction Head
by: Kim, Eunji, et al.
Published: (2024)
Unused information in token probability distribution of generative LLM: improving LLM reading comprehension through calculation of expected values
by: Zawistowski, Krystian
Published: (2024)
You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs
by: Xu, Yijie, et al.
Published: (2025)
Looking beyond the next token
by: Thankaraj, Abitha, et al.
Published: (2025)
The pitfalls of next-token prediction
by: Bachmann, Gregor, et al.
Published: (2024)
Evil twins are not that evil: Qualitative insights into machine-generated prompts
by: Rakotonirina, Nathanaël Carraz, et al.
Published: (2024)
ThoughtSource: A central hub for large language model reasoning data
by: Ott, Simon, et al.
Published: (2023)
On the token distance modeling ability of higher RoPE attention dimension
by: Hong, Xiangyu, et al.
Published: (2024)
Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives
by: Baker, Mohammed Abu, et al.
Published: (2026)
Shaping capabilities with token-level data filtering
by: Rathi, Neil, et al.
Published: (2026)
Scaled and Inter-token Relation Enhanced Transformer for Sample-restricted Residential NILM
by: Rahman, Minhajur, et al.
Published: (2024)
Factors affecting the in-context learning abilities of LLMs for dialogue state tracking
by: Hegde, Pradyoth, et al.
Published: (2025)
No Need for Explanations: LLMs can implicitly learn from mistakes in-context
by: Alazraki, Lisa, et al.
Published: (2025)
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
by: Jiang, Ziyan, et al.
Published: (2024)
Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks
by: Wang, Chonghua, et al.
Published: (2024)
Scaling Transformer to 1M tokens and beyond with RMT
by: Bulatov, Aydar, et al.
Published: (2023)
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
by: Georgiou, Efthymios, et al.
Published: (2025)
SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding
by: Hou, Shuyang, et al.
Published: (2026)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
by: Miralles-González, Pablo, et al.
Published: (2025)
Language models are better than humans at next-token prediction
by: Shlegeris, Buck, et al.
Published: (2022)
A Decomposition Perspective to Long-context Reasoning for LLMs
by: Xiao, Yanling, et al.
Published: (2026)
Meaning Is Not A Metric: Using LLMs to make cultural context legible at scale
by: Kommers, Cody, et al.
Published: (2025)
Do LLMs Dream of Ontologies?
by: Bombieri, Marco, et al.
Published: (2024)
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
by: Russell, Jenna, et al.
Published: (2025)
COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens
by: Kwek, Eugene, et al.
Published: (2025)
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
by: Chen, Ruxiao, et al.
Published: (2025)
Essential-Web v1.0: 24T tokens of organized web data
by: AI, Essential, et al.
Published: (2025)
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
by: Marconato, Emanuele, et al.
Published: (2024)
Redefining "Hallucination" in LLMs: Towards a psychology-informed framework for mitigating misinformation
by: Berberette, Elijah, et al.
Published: (2024)
Harnessing LLMs for Educational Content-Driven Italian Crossword Generation
by: Zeinalipour, Kamyar, et al.
Published: (2024)
TECP: Token-Entropy Conformal Prediction for LLMs
by: Xu, Beining, et al.
Published: (2025)