| Main Authors: | Zhao, Liang; Shao, Kunming; Liao, Zhipeng; Huang, Xijie; Cheng, Tim Kwang-Ting; Tsui, Chi-Ying; Zou, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.05743 |
Similar Items
DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models
by: Shao, Kunming, et al.
Published: (2026)
DIRC-RAG: Accelerating Edge RAG with Robust High-Density and High-Loading-Bandwidth Digital In-ReRAM Computation
by: Shao, Kunming, et al.
Published: (2025)
A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents
by: Liao, Zhipeng, et al.
Published: (2025)
A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination
by: Zhao, Liang, et al.
Published: (2025)
SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis
by: Shao, Kunming, et al.
Published: (2024)
RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write
by: Guo, Yan-Cheng, et al.
Published: (2026)
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
by: Liu, Shih-yang, et al.
Published: (2023)
CIM-Tuner: Balancing the Compute and Storage Capacity of SRAM-CIM Accelerator via Hardware-mapping Co-exploration
by: Chen, Jinwu, et al.
Published: (2026)
Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)
MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations
by: Zou, Jiaxiang, et al.
Published: (2026)
3DGauCIM: Accelerating Static/Dynamic 3D Gaussian Splatting via Digital CIM for High Frame Rate Real-Time Edge Rendering
by: Huang, Wei-Hsing, et al.
Published: (2025)
31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding
by: Dong, Pingcheng, et al.
Published: (2026)
Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration
by: Numan, Omar, et al.
Published: (2025)
SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support
by: Diao, Haikang, et al.
Published: (2025)
CIMFlow: An Integrated Framework for Systematic Design and Evaluation of Digital CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)
MX-SAFE: Versatile Inference- and Training-Proof Microscaling Format with On-the-Fly Exponent and Mantissa Bit Allocation
by: Park, Dahoon, et al.
Published: (2026)
StreamDCIM: A Tile-based Streaming Digital CIM Accelerator with Mixed-stationary Cross-forwarding Dataflow for Multimodal Transformer
by: Qin, Shantian, et al.
Published: (2025)
MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation
by: Natesh, Vikas, et al.
Published: (2025)
AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
by: Xue, Chenhao, et al.
Published: (2026)
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models
by: Bazzi, Jinane, et al.
Published: (2026)
Unicorn-CIM: Uncovering the Vulnerability and Improving the Resilience of High-Precision Compute-in-Memory
by: Li, Qiufeng, et al.
Published: (2025)
High-Level Surface Code Decoding via Parallel FFNNs on CIM Platforms
by: Wang, Hao, et al.
Published: (2024)
FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture
by: Xuan, Zihao, et al.
Published: (2026)
CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures
by: Qi, Yingjie, et al.
Published: (2025)
CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
by: Ahn, Bas, et al.
Published: (2026)
Voxel-CIM: An Efficient Compute-in-Memory Accelerator for Voxel-based Point Cloud Neural Networks
by: Lin, Xipeng, et al.
Published: (2024)
Ouroboros: Wafer-Scale SRAM CIM with Token-Grained Pipelining for Large Language Model Inference
by: Liu, Yiqi, et al.
Published: (2026)
CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device
by: Guo, Yan-Cheng, et al.
Published: (2025)
Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning
by: Li, Zhaoying, et al.
Published: (2024)
Faster Inference of LLMs using FP8 on the Intel Gaudi
by: Lee, Joonhyung, et al.
Published: (2025)
A 28nm 1.80Mb/mm2 Digital/Analog Hybrid SRAM-CIM Macro Using 2D-Weighted Capacitor Array for Complex Number Mac Operations
by: Konno, Shota, et al.
Published: (2025)
Hardware-Efficient CNNs: Interleaved Approximate FP32 Multipliers for Kernel Computation
by: Gowda, Bindu G, et al.
Published: (2025)
DGEMM without FP64 Arithmetic - Using FP64 Emulation and FP8 Tensor Cores with Ozaki Scheme
by: Mukunoki, Daichi
Published: (2025)
APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design
by: Tan, Yonghao, et al.
Published: (2025)
NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference
by: Xu, Weikai, et al.
Published: (2026)
UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference
by: Xu, Weikai, et al.
Published: (2025)
FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables
by: Park, Gunho, et al.
Published: (2025)
GEM3D CIM General Purpose Matrix Computation Using 3D Integrated SRAM eDRAM Hybrid Compute In Memory on Memory Architecture
by: Chakraborty, Subhradip, et al.
Published: (2026)
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper
by: Su, Zhongling, et al.
Published: (2025)
Shift-Left Techniques in Electronic Design Automation: A Survey
by: Wu, Xinyue, et al.
Published: (2025)