Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Steel, Thijs, Langou, Julien
Format:	Preprint
Published:	2024
Subjects:	Performance Data Structures and Algorithms 65F15, 65Y05 G.4
Online Access:	https://arxiv.org/abs/2412.01852
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866929611361746944
author	Steel, Thijs Langou, Julien
author_facet	Steel, Thijs Langou, Julien
contents	We present an efficient algorithm for the application of sequences of planar rotations to a matrix. Applying such sequences efficiently is important in many numerical linear algebra algorithms for eigenvalues. Our algorithm is novel in three main ways. First, we introduce a new kernel that is optimized for register reuse in a novel way. Second, we introduce a blocking and packing scheme that improves the cache efficiency of the algorithm. Finally, we thoroughly analyze the memory operations of the algorithm which leads to important theoretical insights and makes it easier to select good parameters. Numerical experiments show that our algorithm outperforms the state-of-the-art and achieves a flop rate close to the theoretical peak on modern hardware.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_01852
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Communication efficient application of sequences of planar rotations to a matrix Steel, Thijs Langou, Julien Performance Data Structures and Algorithms 65F15, 65Y05 G.4 We present an efficient algorithm for the application of sequences of planar rotations to a matrix. Applying such sequences efficiently is important in many numerical linear algebra algorithms for eigenvalues. Our algorithm is novel in three main ways. First, we introduce a new kernel that is optimized for register reuse in a novel way. Second, we introduce a blocking and packing scheme that improves the cache efficiency of the algorithm. Finally, we thoroughly analyze the memory operations of the algorithm which leads to important theoretical insights and makes it easier to select good parameters. Numerical experiments show that our algorithm outperforms the state-of-the-art and achieves a flop rate close to the theoretical peak on modern hardware.
title	Communication efficient application of sequences of planar rotations to a matrix
topic	Performance Data Structures and Algorithms 65F15, 65Y05 G.4
url	https://arxiv.org/abs/2412.01852

Similar Items