:: Library Catalog

Bibliographic Details
Main Author:	Singh, Shubham Kumar
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2603.10032

Similar Items

Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
by: Yadav, Divakar Kumar, et al.
Published: (2026)

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
by: Kang, Do Yeong, et al.
Published: (2025)

Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
by: Gao, Mingzhe, et al.
Published: (2024)

AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
by: Matsushima, Kosuke, et al.
Published: (2026)

Efficient Calibration for RRAM-based In-Memory Computing using DoRA
by: Dong, Weirong, et al.
Published: (2025)

SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
by: Gogineni, Kailash, et al.
Published: (2024)

MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs
by: Moitra, Abhishek, et al.
Published: (2025)

Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators
by: Kim, Jiyoon, et al.
Published: (2025)

NeFT: Negative Feedback Training to Improve Robustness of Compute-In-Memory DNN Accelerators
by: Qin, Yifan, et al.
Published: (2023)

Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling
by: Liu, Mingju, et al.
Published: (2026)

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs
by: Yadav, Divakar Kumar, et al.
Published: (2026)

FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design
by: Zhang, Jiahao, et al.
Published: (2026)

Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
by: Ramachandran, Akshat, et al.
Published: (2025)

HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)

AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
by: Cong, Rongqing, et al.
Published: (2024)

NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities
by: Read, James, et al.
Published: (2025)

A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference
by: Woodward, Kieran, et al.
Published: (2025)

HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
by: Negi, Shubham, et al.
Published: (2025)

VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation
by: Calzada, Paul E., et al.
Published: (2025)

PGR-DRC: Pre-Global Routing DRC Violation Prediction Using Unsupervised Learning
by: Islam, Riadul, et al.
Published: (2025)

A Unified Memory Perspective for Probabilistic Trustworthy AI
by: Zhao, Xueji, et al.
Published: (2026)

In-Memory Learning Automata Architecture using Y-Flash Cell
by: Ghazal, Omar, et al.
Published: (2024)

Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
by: Farias, Matheus, et al.
Published: (2024)

Causal AI For AMS Circuit Design: Interpretable Parameter Effects Analysis
by: Hussain, Mohyeu, et al.
Published: (2026)

Design Rules for Extreme-Edge Scientific Computing on AI Engines
by: Ma, Zhenghua, et al.
Published: (2026)

Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?
by: Bhandwaldar, Abhishek, et al.
Published: (2026)

TRAM: Training Approximate Multiplier Structures for Low-Power AI Accelerators
by: Meng, Chang, et al.
Published: (2026)

Position Paper: From Edge AI to Adaptive Edge AI
by: Pittorino, Fabrizio, et al.
Published: (2026)

Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation
by: Kim, Hyunmog
Published: (2026)

CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement
by: Mhapsekar, Kaushal, et al.
Published: (2026)

From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic
by: Ren, Hansheng
Published: (2026)

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators
by: Baldi, T., et al.
Published: (2026)

Dynamic Sparse Attention: Access Patterns and Architecture
by: Levy, Noam
Published: (2026)

Challenges and Research Directions for Large Language Model Inference Hardware
by: Ma, Xiaoyu, et al.
Published: (2026)

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
by: Brandoit, Julien, et al.
Published: (2026)

Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
by: Shashidhar, Vishal, et al.
Published: (2026)

FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
by: Qiao, Ye, et al.
Published: (2026)

SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI
by: Patne, Parth, et al.
Published: (2026)

RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks
by: Mohammadi, Ali Soltan, et al.
Published: (2026)

Continuous-Flow Data-Rate-Aware CNN Inference on FPGA
by: Habermann, Tobias, et al.
Published: (2026)