Bibliographic Details
Main Authors:	Duan, Shaoxiong; Shi, Yining; Xu, Wei
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2310.11984

Similar Items

Large Language Models as Interpolated and Extrapolated Event Predictors
by: Zhang, Libo, et al.
Published: (2024)

Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
by: Cho, Hanseul, et al.
Published: (2024)

Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
by: Sabbaghi, Mahdi, et al.
Published: (2024)

Position as Probability: Self-Supervised Transformers that Think Past Their Training for Length Extrapolation
by: Lee, Philip Heejun
Published: (2025)

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
by: Le, Hung, et al.
Published: (2024)

CIEGAD: Cluster-Conditioned Interpolative and Extrapolative Framework for Geometry-Aware and Domain-Aligned Data Augmentation
by: Inoshita, Keito, et al.
Published: (2025)

Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
by: He, Zhenyu, et al.
Published: (2024)

Learning Extrapolative Sequence Transformations from Markov Chains
by: Hager, Sophia, et al.
Published: (2025)

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models
by: Gao, Bo, et al.
Published: (2025)

The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)

Transformers Can Achieve Length Generalization But Not Robustly
by: Zhou, Yongchao, et al.
Published: (2024)

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)

Length Generalization Bounds for Transformers
by: Yang, Andy, et al.
Published: (2026)

Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
by: Mészáros, Anna, et al.
Published: (2024)

On Provable Length and Compositional Generalization
by: Ahuja, Kartik, et al.
Published: (2024)

RLPR: Extrapolating RLVR to General Domains without Verifiers
by: Yu, Tianyu, et al.
Published: (2025)

Generalized Interpolating Discrete Diffusion
by: von Rütte, Dimitri, et al.
Published: (2025)

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
by: Wei, Zhepei, et al.
Published: (2026)

Length Desensitization in Direct Preference Optimization
by: Liu, Wei, et al.
Published: (2024)

Parallelizing Linear Transformers with the Delta Rule over Sequence Length
by: Yang, Songlin, et al.
Published: (2024)

Continuous Language Model Interpolation for Dynamic and Controllable Text Generation
by: Kangaslahti, Sara, et al.
Published: (2024)

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
by: Yang, Wenkai, et al.
Published: (2026)

Bayesian Attention Mechanism: A Probabilistic Framework for Positional Encoding and Context Length Extrapolation
by: Bianchessi, Arthur S., et al.
Published: (2025)

Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers
by: Hallam, Mohamed Amine, et al.
Published: (2026)

Model Extrapolation Expedites Alignment
by: Zheng, Chujie, et al.
Published: (2024)

Extrapolation by Association: Length Generalization Transfer in Transformers
by: Cai, Ziyang, et al.
Published: (2025)

Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management
by: Cui, Guanyu, et al.
Published: (2026)

Intrinsic Entropy of Context Length Scaling in LLMs
by: Shi, Jingzhe, et al.
Published: (2025)

Uncovering Cross-Objective Interference in Multi-Objective Alignment
by: Lu, Yining, et al.
Published: (2026)

The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs
by: Li, Xin, et al.
Published: (2026)

Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge
by: Chen, Yan-Lun, et al.
Published: (2025)

Language Models are Symbolic Learners in Arithmetic
by: Deng, Chunyuan, et al.
Published: (2024)

Steering Language Models with Weight Arithmetic
by: Fierro, Constanza, et al.
Published: (2025)

Rethinking Regularization Methods for Knowledge Graph Completion
by: Li, Linyu, et al.
Published: (2025)

Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
by: Battenberg, Eric, et al.
Published: (2024)

Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
by: Leng, Jiaqi, et al.
Published: (2025)

Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
by: Wei, Linye, et al.
Published: (2025)

e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
by: Setlur, Amrith, et al.
Published: (2025)

A Long Way to Go: Investigating Length Correlations in RLHF
by: Singhal, Prasann, et al.
Published: (2023)