| Main Authors: | Zheng, Lin, Bashlovkina, Vasilisa, Dozat, Timothy, Garrette, Dan, Rimell, Laura, Maynez, Joshua |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.09630 |
Similar Items
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
by: Shao, Chenze, et al.
Published: (2024)
CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
by: Chen, Yongqiang, et al.
Published: (2026)
Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models
by: Jan, Aviv, et al.
Published: (2025)
Attractor Patch Networks: Reducing Catastrophic Forgetting with Routed Low-Rank Patch Experts
by: Shashank
Published: (2026)
Dissecting Persona-Driven Reasoning in Language Models via Activation Patching
by: Poonia, Ansh, et al.
Published: (2025)
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
by: Zhang, Fred, et al.
Published: (2023)
Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models
by: Liu, Xinyang, et al.
Published: (2023)
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
by: Phan, Buu, et al.
Published: (2024)
Language Models over Canonical Byte-Pair Encodings
by: Vieira, Tim, et al.
Published: (2025)
Neuron Patching: Semantic-based Neuron-level Language Model Repair for Code Generation
by: Gu, Jian, et al.
Published: (2023)
ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer
by: Deng, Chunyuan, et al.
Published: (2026)
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
by: Khan, Rana Muhammad Shahroz, et al.
Published: (2024)
Don't Ignore the Tail: Decoupling top-K Probabilities for Efficient Language Model Distillation
by: Dasgupta, Sayantan, et al.
Published: (2026)
Discovering Decoupled Functional Modules in Large Language Models
by: Yu, Yanke, et al.
Published: (2026)
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
by: Neitemeier, Pit, et al.
Published: (2025)
Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction Prediction
by: Pang, Jiayun, et al.
Published: (2024)
Measuring the Depth of LLM Unlearning via Activation Patching
by: Lee, Jaeung, et al.
Published: (2026)
Patch the Distribution Mismatch: RL Rewriting Agent for Stable Off-Policy SFT
by: Wang, Jiacheng, et al.
Published: (2026)
Byte-token Enhanced Language Models for Temporal Point Processes Analysis
by: Kong, Quyu, et al.
Published: (2025)
Forecasting Time Series with LLMs via Patch-Based Prompting and Decomposition
by: Bumb, Mayank, et al.
Published: (2025)
Sampling from Your Language Model One Byte at a Time
by: Hayase, Jonathan, et al.
Published: (2025)
DEPT: Decoupled Embeddings for Pre-training Language Models
by: Iacob, Alex, et al.
Published: (2024)
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
by: Vendrell, Victor Conchello, et al.
Published: (2026)
Kronecker Embeddings: Byte-Level Structured Token Representations for Parameter-Efficient Language Models
by: Shravan, Rohan
Published: (2026)
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
by: Slagle, Kevin
Published: (2024)
GPUTOK: GPU Accelerated Byte Level BPE Tokenization
by: Kadamba, Venu Gopal, et al.
Published: (2026)
MambaByte: Token-free Selective State Space Model
by: Wang, Junxiong, et al.
Published: (2024)
HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model
by: Guo, Haiyang, et al.
Published: (2025)
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
by: Kallini, Julie, et al.
Published: (2024)
MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling
by: Limisiewicz, Tomasz, et al.
Published: (2024)
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling
by: Egli, Eric, et al.
Published: (2025)
Critical Data Size of Language Models from a Grokking Perspective
by: Zhu, Xuekai, et al.
Published: (2024)
Patch Ranking: Efficient CLIP by Learning to Rank Local Patches
by: Wu, Cheng-En, et al.
Published: (2024)
Leviathan: Decoupling Input and Output Representations in Language Models
by: Batley, Reza T., et al.
Published: (2026)
Scaling Law for Language Models Training Considering Batch Size
by: Shuai, Xian, et al.
Published: (2024)
Byte Latent Transformer: Patches Scale Better Than Tokens
by: Pagnoni, Artidoro, et al.
Published: (2024)
Accelerating Vision Transformers with Adaptive Patch Sizes
by: Choudhury, Rohan, et al.
Published: (2025)
Proxy Compression for Language Modeling
by: Zheng, Lin, et al.
Published: (2026)
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
by: Guo, Yiran, et al.
Published: (2025)
Fast Byte Latent Transformer
by: Kallini, Julie, et al.
Published: (2026)