| Main Author: | Schelpe, Sietse |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.09990 |
Similar Items
Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks
by: Schelpe, Sietse
Published: (2026)
LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model
by: Shao, Wei, et al.
Published: (2025)
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
by: Zeng, Ziqian, et al.
Published: (2024)
Merlin's Whisper: Enabling Efficient Reasoning in Large Language Models via Black-box Persuasive Prompting
by: Xia, Heming, et al.
Published: (2025)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
by: Phan, Buu, et al.
Published: (2024)
Generative Deduplication For Socia Media Data Selection
by: Li, Xianming, et al.
Published: (2024)
SEDD: Scalable and Efficient Dataset Deduplication with GPUs
by: Son, Youngjun, et al.
Published: (2025)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
by: Wu, Pengfei, et al.
Published: (2024)
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
by: Timor, Nadav, et al.
Published: (2024)
Deduplicating and Ranking Solution Programs for Suggesting Reference Solutions
by: Shirafuji, Atsushi, et al.
Published: (2023)
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
by: Zhang, Jun, et al.
Published: (2023)
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline
by: Dong, Guosheng, et al.
Published: (2024)
Membership Inference Attack against Long-Context Large Language Models
by: Wang, Zixiong, et al.
Published: (2024)
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
by: Cho, Sukmin, et al.
Published: (2025)
FuzzCoder: Byte-level Fuzzing Test via Large Language Model
by: Yang, Liqun, et al.
Published: (2024)
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction
by: Mao, Yu, et al.
Published: (2025)
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
by: Ou, Jie, et al.
Published: (2024)
Privacy-Preserving Data Deduplication for Enhancing Federated Learning of Language Models (Extended Version)
by: Abadi, Aydin, et al.
Published: (2024)
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
by: Lin, Feng, et al.
Published: (2024)
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)
Deterministic Differentiable Structured Pruning for Large Language Models
by: Huang, Weiyu, et al.
Published: (2026)
Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models
by: Li, Jiaqian, et al.
Published: (2026)
Towards Efficient Exact Optimization of Language Model Alignment
by: Ji, Haozhe, et al.
Published: (2024)
Classical and quantum Merlin-Arthur automata
by: Yakaryılmaz, Abuzer
Published: (2022)
Inference Optimizations for Large Language Models: Effects, Challenges, and Practical Considerations
by: Donisch, Leo, et al.
Published: (2024)
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs
by: Chari, Vivek, et al.
Published: (2025)
Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models
by: Xu, Ningyu, et al.
Published: (2026)
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
by: Deng, Chunyuan, et al.
Published: (2026)
LongRoPE2: Near-Lossless LLM Context Window Scaling
by: Shang, Ning, et al.
Published: (2025)
FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression
by: Mittu, Fazal, et al.
Published: (2024)
Energy Considerations of Large Language Model Inference and Efficiency Optimizations
by: Fernandez, Jared, et al.
Published: (2025)
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
by: Slagle, Kevin
Published: (2024)
Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
by: Ramos, Miguel Moura, et al.
Published: (2026)
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
by: Park, Sihyeong, et al.
Published: (2025)
Lossless Vocabulary Reduction for Auto-Regressive Language Models
by: Chijiwa, Daiki, et al.
Published: (2025)
Language Models over Canonical Byte-Pair Encodings
by: Vieira, Tim, et al.
Published: (2025)
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration
by: Zhang, Hanzhi, et al.
Published: (2025)
Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding
by: Tacconelli, Roberto
Published: (2026)
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
by: Wan, Zhongwei, et al.
Published: (2024)
In-Context Watermarks for Large Language Models
by: Liu, Yepeng, et al.
Published: (2025)