| Main Authors: | Xia, Tianhua, Zhang, Sai Qian |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.13290 |
Similar Items
HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models
by: Peng, Tianfan, et al.
Published: (2025)
Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing
by: Xia, Tianhua, et al.
Published: (2025)
P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats
by: Chen, Yuzong, et al.
Published: (2025)
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)
A Flexible Template for Edge Generative AI with High-Accuracy Accelerated Softmax & GELU
by: Belano, Andrea, et al.
Published: (2024)
FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture
by: Li, Tenglong, et al.
Published: (2024)
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
by: İslamoğlu, Gamze, et al.
Published: (2023)
RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
by: Zhang, Jie, et al.
Published: (2024)
VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support
by: Ou, Wenhui, et al.
Published: (2026)
A Reconfigurable Framework for AI-FPGA Agent Integration and Acceleration
by: Yunusoglu, Aybars, et al.
Published: (2026)
Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference
by: Danopoulos, Dimitrios, et al.
Published: (2026)
An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT
by: Shao, Haikuo, et al.
Published: (2024)
DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
by: Oztas, Ali Emre, et al.
Published: (2026)
MINISA: Minimal Instruction Set Architecture for Next-gen Reconfigurable Inference Accelerator
by: Tong, Jianming, et al.
Published: (2026)
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
by: Wang, Run, et al.
Published: (2025)
STAR: An Efficient Softmax Engine for Attention Model with RRAM Crossbar
by: Zhai, Yifeng, et al.
Published: (2024)
FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration
by: Chen, Xingzhen, et al.
Published: (2026)
HEANA: A Hybrid Time-Amplitude Analog Optical Accelerator with Flexible Dataflows for Energy-Efficient CNN Inference
by: Vatsavai, Sairam Sri, et al.
Published: (2024)
SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
by: Wang, Wenxun, et al.
Published: (2025)
HURRY: Highly Utilized, Reconfigurable ReRAM-based In-situ Accelerator with Multifunctionality
by: Shin, Hery, et al.
Published: (2024)
Hardware Efficient Accelerator for Spiking Transformer With Reconfigurable Parallel Time Step Computing
by: Chen, Bo-Yu, et al.
Published: (2025)
FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
by: Tong, Jianming, et al.
Published: (2024)
A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)
VersaQ-3D: A Reconfigurable Accelerator Enabling Feed-Forward and Generalizable 3D Reconstruction via Versatile Quantization
by: Zhang, Yipu, et al.
Published: (2026)
Demystifying the 7-D Convolution Loop Nest for Data and Instruction Streaming in Reconfigurable AI Accelerators
by: Chowdhury, Md Rownak Hossain, et al.
Published: (2025)
Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels
by: Wu, Meng, et al.
Published: (2024)
A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
by: Jiang, Aojie, et al.
Published: (2026)
PD-Swap: Prefill-Decode Logic Swapping for End-to-End LLM Inference on Edge FPGAs via Dynamic Partial Reconfiguration
by: Zhang, Yifan, et al.
Published: (2025)
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)
SAIL: SRAM-Accelerated LLM Inference System with Lookup-Table-based GEMV
by: Zhang, Jingyao, et al.
Published: (2025)
Hybrid Photonic-digital Accelerator for Attention Mechanism
by: Li, Huize, et al.
Published: (2025)
FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators
by: Li, Xinyi, et al.
Published: (2024)
SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)
Reconfigurable Stream Network Architecture
by: Wang, Chengyue, et al.
Published: (2024)
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)
DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference
by: Sonnino, Lorenzo, et al.
Published: (2023)
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)
PIM-GPT: A Hybrid Process-in-Memory Accelerator for Autoregressive Transformers
by: Wu, Yuting, et al.
Published: (2023)
Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads
by: Li, Boyu, et al.
Published: (2025)
VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration
by: Wipfli, Max, et al.
Published: (2026)