Table of Contents: :: Library Catalog

Bibliographic Details
Main Authors:	Gan, Bingzheng; Zhang, Tianyi; Li, Yusu; Huang, Jing; Shi, Wei; Ding, Yangkai; Yu, Tao
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2605.00292

Table of Contents:

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address both, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive behavior via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models that rely on hardware-specific implementations (e.g., Mamba), we use only standard library operators. This ensures robust portability and eliminates common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
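
The idea of causal sequence mixing through padding and truncation can be illustrated with a minimal sketch. This is not the Caracal MHF module, whose details this record does not give; it is the textbook zero-padding trick for computing a causal (linear) convolution with an FFT, under the assumption that mixing is a convolution with a length-L kernel. The names `fft`, `causal_fft_mix`, and `causal_direct` are hypothetical and chosen for illustration only.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def causal_fft_mix(x, kernel):
    """Mix sequence x with kernel so that output t depends only on x[0..t].

    Assumption: both inputs have length L. Zeros are appended on the right
    only (one-sided padding) up to a power of two n >= 2L, which prevents
    circular wrap-around; the result is then truncated to the first L steps.
    """
    L = len(x)
    n = 1
    while n < 2 * L:
        n *= 2
    xp = list(x) + [0.0] * (n - L)
    kp = list(kernel) + [0.0] * (n - len(kernel))
    # Pointwise product in the frequency domain equals convolution in time.
    spectrum = [a * b for a, b in zip(fft(xp), fft(kp))]
    y = fft(spectrum, invert=True)
    return [v.real / n for v in y[:L]]  # normalize the inverse, truncate


def causal_direct(x, kernel):
    """O(L^2) reference: y[t] = sum over s <= t of kernel[t - s] * x[s]."""
    return [
        sum(kernel[t - s] * x[s] for s in range(t + 1))
        for t in range(len(x))
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    kernel = [0.5, 0.25, 0.125, 0.0625]
    print(causal_fft_mix(x, kernel))
    print(causal_direct(x, kernel))
```

Without the padding, the FFT product would compute a circular convolution, and late tokens would leak into early positions. Padding to at least 2L and keeping only the first L outputs removes that leakage at O(L log(L)) cost, using nothing beyond a generic FFT routine.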

Similar Items