| Main Authors: | Sullivan, Michael; Koller, Alexander |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.29986 |
Similar Items
GRPO is Secretly a Process Reward Model
by: Sullivan, Michael, et al.
Published: (2025)
AuthorMix: Modular Authorship Style Transfer via Layer-wise Adapter Mixing
by: Thillainathan, Sarubi, et al.
Published: (2026)
MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration
by: Liu, Xinyu, et al.
Published: (2026)
Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
by: Hao, Yongchang, et al.
Published: (2026)
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
by: Nguyen, Tan Dat, et al.
Published: (2024)
Direct Multi-Token Decoding
by: Luo, Xuan, et al.
Published: (2025)
Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models
by: Xu, Zihao, et al.
Published: (2026)
A*-Decoding: Token-Efficient Inference Scaling
by: Chatziveroglou, Giannis
Published: (2025)
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
by: Zeng, Ziqian, et al.
Published: (2024)
SparAMX: Accelerating Compressed LLMs Token Generation on AMX-powered CPUs
by: AbouElhamayed, Ahmed F., et al.
Published: (2025)
Constrained Decoding with Speculative Lookaheads
by: Nakshatri, Nishanth, et al.
Published: (2024)
Tokenized Bandit for LLM Decoding and Alignment
by: Shin, Suho, et al.
Published: (2025)
Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning
by: Stein, Katharina, et al.
Published: (2023)
InternVL-X: Advancing and Accelerating InternVL Series with Efficient Visual Token Compression
by: Lu, Dongchen, et al.
Published: (2025)
Flexible and Efficient Grammar-Constrained Decoding
by: Park, Kanghee, et al.
Published: (2025)
Edit-Constrained Decoding for Sentence Simplification
by: Zetsu, Tatsuya, et al.
Published: (2024)
Primal-Dual Guided Decoding for Constrained Discrete Diffusion
by: Tomasi, Federico, et al.
Published: (2026)
Lossless Token Sequence Compression via Meta-Tokens
by: Harvill, John, et al.
Published: (2025)
Improved Generalized Planning with LLMs through Strategy Refinement and Reflection
by: Stein, Katharina, et al.
Published: (2025)
SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
by: Asim, Mohammad, et al.
Published: (2026)
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
by: Hong, Junyuan, et al.
Published: (2024)
DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space
by: He, Wenkun, et al.
Published: (2025)
Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
by: Matveev, Ivan
Published: (2026)
Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
by: Zhu, Yifan, et al.
Published: (2026)
OneLatent: Single-Token Compression for Visual Latent Reasoning
by: Lv, Bo, et al.
Published: (2026)
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
by: Hu, Shijing, et al.
Published: (2025)
WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding
by: Wang, Ran, et al.
Published: (2025)
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
by: Liu, Peijie, et al.
Published: (2025)
HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
by: Zhang, Jusheng, et al.
Published: (2025)
Exploring Token-Space Manipulation in Latent Audio Tokenizers
by: Paissan, Francesco, et al.
Published: (2026)
TokenSkip: Controllable Chain-of-Thought Compression in LLMs
by: Xia, Heming, et al.
Published: (2025)
TokenSqueeze: Performance-Preserving Compression for Reasoning LLMs
by: Zhang, Yuxiang, et al.
Published: (2025)
On the Ability of Transformers to Verify Plans
by: Sarrof, Yash, et al.
Published: (2026)
The First Token Knows: Single-Decode Confidence for Hallucination Detection
by: Gabriel, Mina
Published: (2026)
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
by: Wu, Zhaoxuan, et al.
Published: (2025)
Progressive Refinement Regulation for Accelerating Diffusion Language Model Decoding
by: Wan, Lipeng, et al.
Published: (2026)
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
by: Zhang, Kexun, et al.
Published: (2023)
Tokenization Is More Than Compression
by: Schmidt, Craig W., et al.
Published: (2024)
CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit
by: Wang, Kangyu, et al.
Published: (2025)
Policy-Space Search: Equivalences, Improvements, and Compression
by: Messa, Frederico, et al.
Published: (2024)