| Main Authors: | Hou, Haowen; Huang, Zhiyi; Tan, Kaifeng; Lu, Rongchang; Yu, Fei Richard |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2504.21463 |
| Field | Value |
|---|---|
| author | Hou, Haowen; Huang, Zhiyi; Tan, Kaifeng; Lu, Rongchang; Yu, Fei Richard |
| contents | In this paper, we introduce RWKV-X, a novel hybrid architecture that combines the efficiency of RWKV for short-range modeling with a sparse attention mechanism designed to capture long-range context. Unlike previous hybrid approaches that rely on full attention layers and retain quadratic complexity, RWKV-X achieves linear-time complexity in training and constant-time complexity in inference decoding. We demonstrate that RWKV-X, when continually pretrained on 64K-token sequences, achieves near-perfect accuracy on the 64K passkey retrieval benchmark. It consistently outperforms prior RWKV-7 models on long-context benchmarks, while maintaining strong performance on short-context tasks. These results highlight RWKV-X as a scalable and efficient backbone for general-purpose language modeling, capable of decoding sequences up to 1 million tokens with stable speed and memory usage. To facilitate further research and analysis, we have made the checkpoints and the associated code publicly accessible at: https://github.com/howard-hou/RWKV-X. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_21463 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | RWKV-X: A Linear Complexity Hybrid Language Model |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2504.21463 |