| Main Authors: | Zhang, Yuanpeng, Hu, Xing, Chen, Xi, Yuan, Zhihang, Li, Cong, Zhu, Jingchen, Wang, Zhao, Zhang, Chenguang, Si, Xin, Gao, Wei, Wu, Qiang, Wang, Runsheng, Sun, Guangyu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.04321 |
Similar Items
Inclusive-PIM: Hardware-Software Co-design for Broad Acceleration on Commercial PIM Architectures
by: Alsop, Johnathan, et al.
Published: (2023)
METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators
by: Wang, Zhao, et al.
Published: (2021)
Algorithm-hardware co-design for Energy-Efficient A/D conversion in ReRAM-based accelerators
by: Zhang, Chenguang, et al.
Published: (2024)
Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator
by: Li, Cong, et al.
Published: (2026)
PIM-FW: Hardware-Software Co-Design of All-pairs Shortest Paths in DRAM
by: Lu, Tsung-Han, et al.
Published: (2025)
NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
by: Zhou, Zhe, et al.
Published: (2024)
MERE: Hardware-Software Co-Design for Masking Cache Miss Latency in Embedded Processors
by: You, Dean, et al.
Published: (2025)
Efficient SRAM-PIM Co-design by Joint Exploration of Value-Level and Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2025)
ANCoEF: Asynchronous Neuromorphic Algorithm/Hardware Co-Exploration Framework with a Fully Asynchronous Simulator
by: Zhang, Jian, et al.
Published: (2024)
Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology
by: Hyun, Bongjoon, et al.
Published: (2023)
SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
by: Wang, Wenxun, et al.
Published: (2025)
PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures
by: Lee, Dongjae, et al.
Published: (2025)
Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity
by: Duan, Cenlin, et al.
Published: (2024)
HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip
by: Tsai, Pei-Huan, et al.
Published: (2026)
PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs
by: Malekar, Jinendra, et al.
Published: (2025)
SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors
by: Rakka, Mariam, et al.
Published: (2024)
LP-Spec: Leveraging LPDDR PIM for Efficient LLM Mobile Speculative Inference with Architecture-Dataflow Co-Optimization
by: He, Siyuan, et al.
Published: (2025)
Theseus: Exploring Efficient Wafer-Scale Chip Design for Large Language Models
by: Zhu, Jingchen, et al.
Published: (2024)
LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism
by: Wang, Yimin, et al.
Published: (2025)
AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)
CellE: Automated Standard Cell Library Extension via Equality Saturation
by: Ren, Yi, et al.
Published: (2026)
AutoPDR: Circuit-Aware Solver Configuration Prediction for Hardware Model Checking
by: Hu, Guangyu, et al.
Published: (2026)
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference
by: Zhang, Yu, et al.
Published: (2024)
RePart: Efficient Hypergraph Partitioning with Logic Replication Optimization for Multi-FPGA System
by: Fu, Zizhuo, et al.
Published: (2026)
GenDRAM: Hardware-Software Co-Design of General Platform in DRAM
by: Lu, Tsung-Han, et al.
Published: (2026)
Annotated PIM Bibliography
by: Kogge, Peter M.
Published: (2026)
SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design
by: Zhang, Haoyang, et al.
Published: (2025)
Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures
by: Xu, Bin, et al.
Published: (2025)
The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization
by: Li, Meng, et al.
Published: (2026)
LeGend: A Data-Driven Framework for Lemma Generation in Hardware Model Checking
by: Miao, Mingkai, et al.
Published: (2026)
UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture
by: Chen, Sitian, et al.
Published: (2024)
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory
by: Kim, Dong Eun, et al.
Published: (2025)
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning
by: Hu, Guangyu, et al.
Published: (2026)
TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
by: Li, Zhuoran, et al.
Published: (2026)
PIM-MMU: A Memory Management Unit for Accelerating Data Transfers in Commercial PIM Systems
by: Lee, Dongjae, et al.
Published: (2024)
Reconfigurable Stream Network Architecture
by: Wang, Chengyue, et al.
Published: (2024)
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM
by: Hong, Junguk, et al.
Published: (2026)
MCMComm: Hardware-Software Co-Optimization for End-to-End Communication in Multi-Chip-Modules
by: Raj, Ritik, et al.
Published: (2025)
L3: DIMM-PIM Integrated Architecture and Coordination for Scalable Long-Context LLM Inference
by: Liu, Qingyuan, et al.
Published: (2025)
CIM-Tuner: Balancing the Compute and Storage Capacity of SRAM-CIM Accelerator via Hardware-mapping Co-exploration
by: Chen, Jinwu, et al.
Published: (2026)