Staff View: Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket :: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Zhaokun; Che, Kaiwei; Fang, Wei; Tian, Keyu; Zhu, Yuesheng; Yan, Shuicheng; Tian, Yonghong; Yuan, Li
Format:	Preprint
Published:	2024
Subjects:	Neural and Evolutionary Computing; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2401.02020

_version_	1866916080709009408
author	Zhou, Zhaokun; Che, Kaiwei; Fang, Wei; Tian, Keyu; Zhu, Yuesheng; Yan, Shuicheng; Tian, Yonghong; Yuan, Li
author_facet	Zhou, Zhaokun; Che, Kaiwei; Fang, Wei; Tian, Keyu; Zhu, Yuesheng; Yan, Shuicheng; Tian, Yonghong; Yuan, Li
contents	Spiking Neural Networks (SNNs), known for their biologically plausible architecture, face the challenge of limited performance. The self-attention mechanism, which is the cornerstone of the high-performance Transformer and also a biologically inspired structure, is absent in existing SNNs. To this end, we explore the potential of leveraging both the self-attention capability and the biological properties of SNNs, and propose a novel Spiking Self-Attention (SSA) and Spiking Transformer (Spikformer). The SSA mechanism eliminates the need for softmax and captures sparse visual features using spike-based Query, Key, and Value. This sparse, multiplication-free computation makes SSA efficient and energy-saving. Further, we develop a Spiking Convolutional Stem (SCS) with supplementary convolutional layers to enhance the architecture of Spikformer. The Spikformer enhanced with the SCS is referred to as Spikformer V2. To train larger and deeper Spikformer V2 models, we introduce a pioneering exploration of Self-Supervised Learning (SSL) within SNNs. Specifically, we pre-train Spikformer V2 in a masking-and-reconstruction style inspired by mainstream self-supervised Transformers, and then fine-tune it for image classification on ImageNet. Extensive experiments show that Spikformer V2 outperforms previous surrogate training and ANN2SNN methods. An 8-layer Spikformer V2 achieves an accuracy of 80.38% using 4 time steps, and after SSL, a 172M 16-layer Spikformer V2 reaches an accuracy of 81.10% with just 1 time step. To the best of our knowledge, this is the first time that an SNN has achieved 80+% accuracy on ImageNet. The code will be available at Spikformer V2.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_02020
institution	arXiv
publishDate	2024
record_format	arxiv
title	Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
topic	Neural and Evolutionary Computing; Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2401.02020