Bibliographic Details
Main Authors: Gan, Bingzheng, Zhang, Tianyi, Li, Yusu, Huang, Jing, Shi, Wei, Ding, Yangkai, Yu, Tao
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.00292
Table of Contents:
  • The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these issues, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, which inherently addresses both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive behavior via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models that rely on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability and eliminates common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
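The abstract's idea of causal sequence mixing through padding and truncation can be illustrated with a generic FFT convolution. The sketch below is not the paper's MHF module: it assumes a single channel, a hypothetical per-position filter `kernel`, zero padding on the right only, and truncation to the first L outputs. The function names `fft` and `causal_fft_mix` are invented for this illustration, and a real model would likely call a library FFT rather than the small pure-Python one used here to keep the example dependency-free.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized). len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def causal_fft_mix(x, kernel):
    """Compute y[t] = sum_{s <= t} kernel[s] * x[t - s] in O(L log L).

    Assumption (illustrative, not taken from the paper): both inputs have
    length L; they are zero-padded on the right to a power of two >= 2L so
    the circular convolution cannot wrap around, and the result is truncated
    to the first L positions so that y[t] depends only on x[0..t].
    """
    L = len(x)
    if len(kernel) != L:
        raise ValueError("x and kernel must have the same length")
    n = 1
    while n < 2 * L:
        n *= 2
    x_pad = list(x) + [0.0] * (n - L)       # right-side zero padding
    k_pad = list(kernel) + [0.0] * (n - L)
    X = fft(x_pad)
    K = fft(k_pad)
    Y = [xf * kf for xf, kf in zip(X, K)]   # pointwise product in frequency
    y = fft(Y, invert=True)
    return [v.real / n for v in y[:L]]      # normalize, truncate to L


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.5, 0.25, 0.125]
    print(causal_fft_mix(signal, taps))
```

Padding to at least 2L is what prevents later positions from leaking into earlier ones through circular wrap-around; without it, the frequency-domain product would mix the end of the sequence back into the start and break autoregressive generation.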