Guardado en:
| Autores principales: | , , , |
|---|---|
| Formato: | Preprint |
| Publicado: |
2025
|
| Materias: | |
| Acceso en línea: | https://arxiv.org/abs/2502.02659 |
| Etiquetas: |
Agregar Etiqueta
Sin Etiquetas, Sea el primero en etiquetar este registro!
|
| _version_ | 1866915312871407616 |
|---|---|
| author | Li, Yan Zhang, Tianyi Li, Zechuan Han, Soyeon Caren |
| author_facet | Li, Yan Zhang, Tianyi Li, Zechuan Han, Soyeon Caren |
| contents | Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges like inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logit to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/adlnlp/Gali. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2502_02659 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| spellingShingle | A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) Li, Yan Zhang, Tianyi Li, Zechuan Han, Soyeon Caren Computation and Language Artificial Intelligence Transformer-based Large Language Models (LLMs) struggle with inputs exceeding their training context window due to positional out-of-distribution (O.O.D.) issues that disrupt attention. Existing solutions, including fine-tuning and training-free methods, face challenges like inefficiency, redundant interpolation, logit outliers, or loss of local positional information. We propose Greedy Attention Logit Interpolation (GALI), a training-free method that improves length extrapolation by greedily reusing pretrained positional intervals and interpolating attention logit to eliminate outliers. GALI achieves stable and superior performance across a wide range of long-context tasks without requiring input-length-specific tuning. Our analysis further reveals that LLMs interpret positional intervals unevenly and that restricting interpolation to narrower ranges improves performance, even on short-context tasks. GALI represents a step toward more robust and generalizable long-text processing in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/adlnlp/Gali. |
| title | A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI) |
| topic | Computation and Language Artificial Intelligence |
| url | https://arxiv.org/abs/2502.02659 |