| Main Authors: | He, Mingxuan, Thottethodi, Mithuna, Vijaykumar, T. N. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2404.04708 |
Similar Items
QED: Scalable Verification of Hardware Memory Consistency
by: Ravi, Gokulan, et al.
Published: (2024)
NetSmith: An Optimization Framework for Machine-Discovered Network Topologies
by: Green, Conor, et al.
Published: (2024)
PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2024)
Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)
A Composable Dynamic Sparse Dataflow Architecture for Efficient Event-based Vision Processing on FPGA
by: Gao, Yizhao, et al.
Published: (2024)
APACHE: A Processing-Near-Memory Architecture for Multi-Scheme Fully Homomorphic Encryption
by: Ding, Lin, et al.
Published: (2024)
In-Memory Computing Architecture for Efficient Hardware Security
by: Ajmi, Hala, et al.
Published: (2024)
XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA
by: Yu, Feng, et al.
Published: (2026)
HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)
LP-Spec: Leveraging LPDDR PIM for Efficient LLM Mobile Speculative Inference with Architecture-Dataflow Co-Optimization
by: He, Siyuan, et al.
Published: (2025)
Harmonia: Algorithm-Hardware Co-Design for Memory- and Compute-Efficient BFP-based LLM Inference
by: Wang, Xinyu, et al.
Published: (2026)
FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture
by: Xuan, Zihao, et al.
Published: (2026)
PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)
A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
by: Jiang, Aojie, et al.
Published: (2026)
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
by: Li, Huize, et al.
Published: (2026)
DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures
by: Oliveira, Geraldo F., et al.
Published: (2023)
PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
by: Chong, Yue Jiet, et al.
Published: (2026)
LoopLynx: A Scalable Dataflow Architecture for Efficient LLM Inference
by: Zheng, Jianing, et al.
Published: (2025)
SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs
by: Zhang, Jintao, et al.
Published: (2026)
CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)
Enabling Efficient Transaction Processing on CXL-Based Memory Sharing
by: Wang, Zhao, et al.
Published: (2025)
IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion
by: Ko, Younghoon, et al.
Published: (2026)
Modeling and Simulation Frameworks for Processing-in-Memory Architectures
by: Aghaei, Mahdi, et al.
Published: (2025)
GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing
by: Pei, Minnan, et al.
Published: (2025)
Tasa: Thermal-aware 3D-Stacked Architecture Design with Bandwidth Sharing for LLM Inference
by: He, Siyuan, et al.
Published: (2025)
A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents
by: Liao, Zhipeng, et al.
Published: (2025)
ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing
by: You, Kang, et al.
Published: (2026)
TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge
by: Huang, Zhirui, et al.
Published: (2025)
Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)
SmaRTLy: RTL Optimization with Logic Inferencing and Structural Rebuilding
by: Li, Chengxi, et al.
Published: (2025)
3D-TrIM: A Memory-Efficient Spatial Computing Architecture for Convolution Workloads
by: Sestito, Cristian, et al.
Published: (2025)
The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference
by: Li, Fang
Published: (2025)
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution
by: Song, Chang Eun, et al.
Published: (2025)
LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs
by: He, Zifan, et al.
Published: (2025)
Empowering Malware Detection Efficiency within Processing-in-Memory Architecture
by: Kasarapu, Sreenitha, et al.
Published: (2024)
End-to-End Transformer Acceleration Through Processing-in-Memory Architectures
by: Yang, Xiaoxuan, et al.
Published: (2025)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
by: Wolters, Christopher, et al.
Published: (2024)
Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design
by: Xie, Rui, et al.
Published: (2025)
Octopus: Enhancing CXL Memory Pods via Sparse Topology
by: Zhong, Yuhong, et al.
Published: (2025)
SpANNS: Optimizing Approximate Nearest Neighbor Search for Sparse Vectors Using Near Memory Processing
by: Zhang, Tianqi, et al.
Published: (2026)