| Main Author: | Rozental, Alon |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.21768 |
Similar Items
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
by: Levy, Mosh, et al.
Published: (2024)
Probabilistic Token Alignment for Large Language Model Fusion
by: Zeng, Runjia, et al.
Published: (2025)
Certain but not Probable? Differentiating Certainty from Probability in LLM Token Outputs for Probabilistic Scenarios
by: Toney-Wails, Autumn, et al.
Published: (2025)
DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
by: Huang, Yuxiang, et al.
Published: (2026)
One Token Is Enough: Improving Diffusion Language Models with a Sink Token
by: Zhang, Zihou, et al.
Published: (2026)
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
by: Wang, Zekun, et al.
Published: (2023)
Attention Sinks in Diffusion Language Models
by: Rulli, Maximo Eduardo, et al.
Published: (2025)
Impact of Preference Noise on the Alignment Performance of Generative Language Models
by: Gao, Yang, et al.
Published: (2024)
Probabilistic Lexical Manifold Construction in Large Language Models via Hierarchical Vector Field Interpolation
by: Pendleton, Clive, et al.
Published: (2025)
HieraVid: Hierarchical Token Pruning for Fast Video Large Language Models
by: Guo, Yansong, et al.
Published: (2026)
Multi-Token Attention
by: Golovneva, Olga, et al.
Published: (2025)
DiffuMask: Diffusion Language Model for Token-level Prompt Pruning
by: Zheng, Caleb, et al.
Published: (2026)
SparseD: Sparse Attention for Diffusion Language Models
by: Wang, Zeqing, et al.
Published: (2025)
Multimodal Latent Language Modeling with Next-Token Diffusion
by: Sun, Yutao, et al.
Published: (2024)
Differentially Private Next-Token Prediction of Large Language Models
by: Flemings, James, et al.
Published: (2024)
H-Net++: Hierarchical Dynamic Chunking for Tokenizer-Free Language Modelling in Morphologically-Rich Languages
by: Zakershahrak, Mehrdad, et al.
Published: (2025)
Attention-Based Sampler for Diffusion Language Models
by: Zhou, Yuyan, et al.
Published: (2026)
Rethinking Token Prediction: Tree-Structured Diffusion Language Model
by: Wu, Zihao, et al.
Published: (2026)
Targeted Remasking: Replacing Token Editing with Token-to-Mask Refinement in Discrete Diffusion Language Models
by: Yao, Lin
Published: (2026)
Remask, Don't Replace: Token-to-Mask Refinement in Diffusion Large Language Models
by: Yao, Lin
Published: (2026)
MBTSAD: Mitigating Backdoors in Language Models Based on Token Splitting and Attention Distillation
by: Ding, Yidong, et al.
Published: (2025)
Adaptive Targeted Dynamic Chunking for Tokenization-Free Hierarchical Model
by: Dang, Thang, et al.
Published: (2026)
CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling
by: Singaravadivelan, Karthik, et al.
Published: (2026)
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model
by: Arif, Kazi Hasan Ibn, et al.
Published: (2025)
Problematic Tokens: Tokenizer Bias in Large Language Models
by: Yang, Jin, et al.
Published: (2024)
Token Perturbation Guidance for Diffusion Models
by: Rajabi, Javad, et al.
Published: (2025)
Just on Time: Token-Level Early Stopping for Diffusion Language Models
by: Kohut, Zahar, et al.
Published: (2026)
Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
by: Zhong, Linhao, et al.
Published: (2026)
HIGHT: Hierarchical Graph Tokenization for Molecule-Language Alignment
by: Chen, Yongqiang, et al.
Published: (2024)
EntropyCache: Decoded Token Entropy Guided KV Caching for Diffusion Language Models
by: Cheong, Minsoo, et al.
Published: (2026)
Scaling Optimal LR Across Token Horizons
by: Bjorck, Johan, et al.
Published: (2024)
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models
by: Yang, Jinbiao
Published: (2024)
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
by: Thareja, Rushil, et al.
Published: (2025)
The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models
by: Sun, Bohang, et al.
Published: (2026)
Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs
by: Wagner, Eitan, et al.
Published: (2025)
Blockwise SFT for Diffusion Language Models: Reconciling Bidirectional Attention and Autoregressive Decoding
by: Sun, Bowen, et al.
Published: (2025)
Unifying Linear-Time Attention via Latent Probabilistic Modelling
by: Dolga, Rares, et al.
Published: (2024)
Improving Self Consistency in LLMs through Probabilistic Tokenization
by: Sathe, Ashutosh, et al.
Published: (2024)
A Hierarchical Probabilistic Framework for Incremental Knowledge Tracing in Classroom Settings
by: Gao, Xinyi, et al.
Published: (2025)
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning
by: Ye, Ziang, et al.
Published: (2024)