:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Yize; Thrampoulidis, Christos
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2505.08348

Similar Items

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)

Implicit Optimization Bias of Next-Token Prediction in Linear Models
by: Thrampoulidis, Christos
Published: (2024)

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)

Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
by: Zhao, Yize, et al.
Published: (2026)

In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)

NITP: Next Implicit Token Prediction for LLM Pre-training
by: Zhang, Xiangdong, et al.
Published: (2026)

Short-Context Dominance: How Much Local Context Natural Language Actually Needs?
by: Vakilian, Vala, et al.
Published: (2025)

Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)

On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
by: Deng, Wenlong, et al.
Published: (2025)

Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
by: Deng, Wenlong, et al.
Published: (2025)

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)

Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
by: Viswanathan, Karthik, et al.
Published: (2025)

Scale Determines Whether Language Models Organize Representation Geometry for Prediction
by: Xu, Weilun
Published: (2026)

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)

Cautious Next Token Prediction
by: Wang, Yizhou, et al.
Published: (2025)

How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
by: Vasudeva, Bhavya, et al.
Published: (2025)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

On Next-Token Prediction in LLMs: How End Goals Determine the Consistency of Decoding Algorithms
by: Trauger, Jacob, et al.
Published: (2025)

Reasoning Bias of Next Token Prediction Training
by: Lin, Pengxiao, et al.
Published: (2025)

ENTP: Encoder-only Next Token Prediction
by: Ewer, Ethan, et al.
Published: (2024)

Token Prediction as Implicit Classification to Identify LLM-Generated Text
by: Chen, Yutian, et al.
Published: (2023)

Diversity or Precision? A Deep Dive into Next Token Prediction
by: Wu, Haoyuan, et al.
Published: (2025)

How Language Directions Align with Token Geometry in Multilingual LLMs
by: Kim, JaeSeong, et al.
Published: (2025)

The Geometry of Tokens in Internal Representations of Large Language Models
by: Viswanathan, Karthik, et al.
Published: (2025)

Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension
by: Qi, Mengnan, et al.
Published: (2024)

SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
by: He, Yinhan, et al.
Published: (2025)

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)

Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content
by: Münker, Simon, et al.
Published: (2026)

Alternatives To Next Token Prediction In Text Generation -- A Survey
by: Wyatt, Charlie, et al.
Published: (2025)

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)

Fractal Patterns May Illuminate the Success of Next-Token Prediction
by: Alabdulmohsin, Ibrahim, et al.
Published: (2024)

Breaking Token Into Concepts: Exploring Extreme Compression in Token Representation Via Compositional Shared Semantics
by: R V, Kavin, et al.
Published: (2025)

Modeling Next-Token Prediction as Left-Nested Intuitionistic Implication
by: Tarau, Paul
Published: (2026)

For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
by: Deng, Wenlong, et al.
Published: (2025)

Mechanics of Next Token Prediction with Self-Attention
by: Li, Yingcong, et al.
Published: (2024)

Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
by: Qian, Junlang, et al.
Published: (2025)

YNTP-100: A Benchmark for Your Next Token Prediction with 100 People
by: Ding, Shiyao, et al.
Published: (2025)

One Sentence, Two Embeddings: Contrastive Learning of Explicit and Implicit Semantic Representations
by: Oda, Kohei, et al.
Published: (2025)

Using Model-Theoretic Approaches to Uncover Linguistic Organization
by: Griffin, Olivia, et al.
Published: (2024)