| Main Authors: | Duan, Shaoxiong, Shi, Yining, Xu, Wei |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2310.11984 |
Similar Items
Large Language Models as Interpolated and Extrapolated Event Predictors
by: Zhang, Libo, et al.
Published: (2024)
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
by: Cho, Hanseul, et al.
Published: (2024)
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
by: Sabbaghi, Mahdi, et al.
Published: (2024)
Position as Probability: Self-Supervised Transformers that Think Past Their Training for Length Extrapolation
by: Lee, Philip Heejun
Published: (2025)
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
by: Le, Hung, et al.
Published: (2024)
CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation
by: Inoshita, Keito, et al.
Published: (2025)
Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)
Learning Extrapolative Sequence Transformations from Markov Chains
by: Hager, Sophia, et al.
Published: (2025)
Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
by: Gao, Bo, et al.
Published: (2025)
The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)
Transformers Can Achieve Length Generalization But Not Robustly
by: Zhou, Yongchao, et al.
Published: (2024)
TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)
Length Generalization Bounds for Transformers
by: Yang, Andy, et al.
Published: (2026)
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
by: Mészáros, Anna, et al.
Published: (2024)
On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)
RLPR: Extrapolating RLVR to General Domains without Verifiers
by: Yu, Tianyu, et al.
Published: (2025)
Generalized Interpolating Discrete Diffusion
by: von Rütte, Dimitri, et al.
Published: (2025)
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
by: Wei, Zhepei, et al.
Published: (2026)
Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
by: Kangaslahti, Sara, et al.
Published: (2024)
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)
Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
by: Bianchessi, Arthur S., et al.
Published: (2025)
Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers
by: Hallam, Mohamed Amine, et al.
Published: (2026)
Model Extrapolation Expedites Alignment
by: Zheng, Chujie, et al.
Published: (2024)
Extrapolation by Association: Length Generalization Transfer in Transformers
by: Cai, Ziyang, et al.
Published: (2025)
Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
by: Cui, Guanyu, et al.
Published: (2026)
Intrinsic Entropy of Context Length Scaling in LLMs
by: Shi, Jingzhe, et al.
Published: (2025)
Uncovering Cross-Objective Interference in Multi-Objective Alignment
by: Lu, Yining, et al.
Published: (2026)
The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs
by: Li, Xin, et al.
Published: (2026)
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
by: Chen, Yan-Lun, et al.
Published: (2025)
Language Models are Symbolic Learners in Arithmetic
by: Deng, Chunyuan, et al.
Published: (2024)
Steering Language Models with Weight Arithmetic
by: Fierro, Constanza, et al.
Published: (2025)
Rethinking Regularization Methods for Knowledge Graph Completion
by: Li, Linyu, et al.
Published: (2025)
Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)
Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
by: Leng, Jiaqi, et al.
Published: (2025)
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
by: Wei, Linye, et al.
Published: (2025)
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
by: Setlur, Amrith, et al.
Published: (2025)
A Long Way to Go: Investigating Length Correlations in RLHF
by: Singhal, Prasann, et al.
Published: (2023)