Library Catalog

Bibliographic Details
Main Authors:	Zhao, Liang; Shao, Kunming; Liao, Zhipeng; Huang, Xijie; Cheng, Tim Kwang-Ting; Tsui, Chi-Ying; Zou, Yi
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2602.05743

Similar Items

DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models
by: Shao, Kunming, et al.
Published: (2026)

DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation
by: Shao, Kunming, et al.
Published: (2025)

A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents
by: Liao, Zhipeng, et al.
Published: (2025)

A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination
by: Zhao, Liang, et al.
Published: (2025)

SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis
by: Shao, Kunming, et al.
Published: (2024)

RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write
by: Guo, Yan-Cheng, et al.
Published: (2026)

LLM-FP4: 4-Bit Floating-Point Quantized Transformers
by: Liu, Shih-yang, et al.
Published: (2023)

CIM-Tuner: Balancing the Compute and Storage Capacity of SRAM-CIM Accelerator via Hardware-mapping Co-exploration
by: Chen, Jinwu, et al.
Published: (2026)

Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations
by: Zou, Jiaxiang, et al.
Published: (2026)

3DGauCIM: Accelerating Static/Dynamic 3D Gaussian Splatting via Digital CIM for High Frame Rate Real-Time Edge Rendering
by: Huang, Wei-Hsing, et al.
Published: (2025)

31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
by: Dong, Pingcheng, et al.
Published: (2026)

Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration
by: Numan, Omar, et al.
Published: (2025)

SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support
by: Diao, Haikang, et al.
Published: (2025)

CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)

MX-SAFE: Versatile Inference- and Training-Proof Microscaling Format with On-the-Fly Exponent and Mantissa Bit Allocation
by: Park, Dahoon, et al.
Published: (2026)

StreamDCIM: A Tile-based Streaming Digital CIM Accelerator with Mixed-stationary Cross-forwarding Dataflow for Multimodal Transformer
by: Qin, Shantian, et al.
Published: (2025)

MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation
by: Natesh, Vikas, et al.
Published: (2025)

AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models
by: Bazzi, Jinane, et al.
Published: (2026)

Unicorn-CIM: Uncovering the Vulnerability and Improving the Resilience of High-Precision Compute-in-Memory
by: Li, Qiufeng, et al.
Published: (2025)

High-Level Surface Code Decoding via Parallel FFNNs on CIM Platforms
by: Wang, Hao, et al.
Published: (2024)

FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture
by: Xuan, Zihao, et al.
Published: (2026)

CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)

CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)

Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks
by: Lin, Xipeng, et al.
Published: (2024)

Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)

CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device
by: Guo, Yan-Cheng, et al.
Published: (2025)

Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning
by: Li, Zhaoying, et al.
Published: (2024)

Faster Inference of LLMs using FP8 on the Intel Gaudi
by: Lee, Joonhyung, et al.
Published: (2025)

A 28nm 1.80Mb/mm2 Digital/Analog Hybrid SRAM-CIM Macro Using 2D-Weighted Capacitor Array for Complex Number Mac Operations
by: Konno, Shota, et al.
Published: (2025)

Hardware-Efficient CNNs: Interleaved Approximate FP32 Multipliers for Kernel Computation
by: Gowda, Bindu G, et al.
Published: (2025)

DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme
by: Mukunoki, Daichi
Published: (2025)

APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
by: Tan, Yonghao, et al.
Published: (2025)

NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference
by: Xu, Weikai, et al.
Published: (2026)

UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference
by: Xu, Weikai, et al.
Published: (2025)

FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables
by: Park, Gunho, et al.
Published: (2025)

GEM3D CIM General Purpose Matrix Computation Using 3D Integrated SRAM eDRAM Hybrid Compute In Memory on Memory Architecture
by: Chakraborty, Subhradip, et al.
Published: (2026)

TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
by: Su, Zhongling, et al.
Published: (2025)

Shift-Left Techniques in Electronic Design Automation: A Survey
by: Wu, Xinyue, et al.
Published: (2025)