| Main Authors: | Zhao, Yize, Thrampoulidis, Christos |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.08348 |
Similar Items
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)
Implicit Optimization Bias of Next-Token Prediction in Linear Models
by: Thrampoulidis, Christos
Published: (2024)
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)
Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
by: Zhao, Yize, et al.
Published: (2026)
In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)
NITP: Next Implicit Token Prediction for LLM Pre-training
by: Zhang, Xiangdong, et al.
Published: (2026)
Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
by: Vakilian, Vala, et al.
Published: (2025)
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)
Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)
On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
by: Deng, Wenlong, et al.
Published: (2025)
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
by: Deng, Wenlong, et al.
Published: (2025)
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)
Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
by: Viswanathan, Karthik, et al.
Published: (2025)
Scale Determines Whether Language Models Organize Representation Geometry for Prediction
by: Xu, Weilun
Published: (2026)
LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)
Cautious Next Token Prediction
by: Wang, Yizhou, et al.
Published: (2025)
How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
by: Vasudeva, Bhavya, et al.
Published: (2025)
Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)
On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms
by: Trauger, Jacob, et al.
Published: (2025)
Reasoning Bias of Next Token Prediction Training
by: Lin, Pengxiao, et al.
Published: (2025)
ENTP: Encoder-only Next Token Prediction
by: Ewer, Ethan, et al.
Published: (2024)
Token Prediction as Implicit Classification to Identify LLM-Generated Text
by: Chen, Yutian, et al.
Published: (2023)
Diversity or Precision? A Deep Dive into Next Token Prediction
by: Wu, Haoyuan, et al.
Published: (2025)
How Language Directions Align with Token Geometry in Multilingual LLMs
by: Kim, JaeSeong, et al.
Published: (2025)
The Geometry of Tokens in Internal Representations of Large Language Models
by: Viswanathan, Karthik, et al.
Published: (2025)
Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension
by: Qi, Mengnan, et al.
Published: (2024)
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
by: He, Yinhan, et al.
Published: (2025)
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)
Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content
by: Münker, Simon, et al.
Published: (2026)
Alternatives To Next Token Prediction In Text Generation -- A Survey
by: Wyatt, Charlie, et al.
Published: (2025)
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)
Fractal Patterns May Illuminate the Success of Next-Token Prediction
by: Alabdulmohsin, Ibrahim, et al.
Published: (2024)
Breaking Token Into Concepts: Exploring Extreme Compression in Token Representation Via Compositional Shared Semantics
by: R V, Kavin, et al.
Published: (2025)
Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication
by: Tarau, Paul
Published: (2026)
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
by: Deng, Wenlong, et al.
Published: (2025)
Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
by: Qian, Junlang, et al.
Published: (2025)
YNTP-100: A Benchmark for Your Next Token Prediction with 100 People
by: Ding, Shiyao, et al.
Published: (2025)
One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations
by: Oda, Kohei, et al.
Published: (2025)
Using Model-Theoretic Approaches to Uncover Linguistic Organization
by: Griffin, Olivia, et al.
Published: (2024)