:: Library Catalog

Bibliographic Details
Main Authors:	Xia, Tianhua, Zhang, Sai Qian
Format:	Preprint
Published:	2023
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2311.13290

Similar Items

HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models
by: Peng, Tianfan, et al.
Published: (2025)

Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)

P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats
by: Chen, Yuzong, et al.
Published: (2025)

Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)

A Flexible Template for Edge Generative AI with High-Accuracy Accelerated Softmax & GELU
by: Belano, Andrea, et al.
Published: (2024)

FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture
by: Li, Tenglong, et al.
Published: (2024)

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
by: İslamoğlu, Gamze, et al.
Published: (2023)

RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)

VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support
by: Ou, Wenhui, et al.
Published: (2026)

A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
by: Yunusoglu, Aybars, et al.
Published: (2026)

Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
by: Danopoulos, Dimitrios, et al.
Published: (2026)

An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
by: Shao, Haikuo, et al.
Published: (2024)

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
by: Oztas, Ali Emre, et al.
Published: (2026)

MINISA: Minimal Instruction Set Architecture for Next-gen Reconfigurable Inference Accelerator
by: Tong, Jianming, et al.
Published: (2026)

VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
by: Wang, Run, et al.
Published: (2025)

STAR: An Efficient Softmax Engine for Attention Model with RRAM Crossbar
by: Zhai, Yifeng, et al.
Published: (2024)

FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration
by: Chen, Xingzhen, et al.
Published: (2026)

HEANA: A Hybrid Time-Amplitude Analog Optical Accelerator with Flexible Dataflows for Energy-Efficient CNN Inference
by: Vatsavai, Sairam Sri, et al.
Published: (2024)

SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
by: Wang, Wenxun, et al.
Published: (2025)

HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality
by: Shin, Hery, et al.
Published: (2024)

Hardware Efficient Accelerator for Spiking Transformer With Reconfigurable Parallel Time Step Computing
by: Chen, Bo-Yu, et al.
Published: (2025)

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
by: Tong, Jianming, et al.
Published: (2024)

A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)

VersaQ-3D: A Reconfigurable Accelerator Enabling Feed-Forward and Generalizable 3D Reconstruction via Versatile Quantization
by: Zhang, Yipu, et al.
Published: (2026)

Demystifying the 7-D Convolution Loop Nest for Data and Instruction Streaming in Reconfigurable AI Accelerators
by: Chowdhury, Md Rownak Hossain, et al.
Published: (2025)

Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels
by: Wu, Meng, et al.
Published: (2024)

A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
by: Jiang, Aojie, et al.
Published: (2026)

PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration
by: Zhang, Yifan, et al.
Published: (2025)

Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)

SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV
by: Zhang, Jingyao, et al.
Published: (2025)

Hybrid Photonic-digital Accelerator for Attention Mechanism
by: Li, Huize, et al.
Published: (2025)

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
by: Li, Xinyi, et al.
Published: (2024)

SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)

Reconfigurable Stream Network Architecture
by: Wang, Chengyue, et al.
Published: (2024)

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)

DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference
by: Sonnino, Lorenzo, et al.
Published: (2023)

SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)

PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)

Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads
by: Li, Boyu, et al.
Published: (2025)

VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration
by: Wipfli, Max, et al.
Published: (2026)