| Main Author: | Singh, Shubham Kumar |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.10032 |
Similar Items
Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference
by: Yadav, Divakar Kumar, et al.
Published: (2026)
MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
by: Kang, Do Yeong, et al.
Published: (2025)
Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
by: Gao, Mingzhe, et al.
Published: (2024)
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
by: Matsushima, Kosuke, et al.
Published: (2026)
Efficient Calibration for RRAM-based In-Memory Computing using DoRA
by: Dong, Weirong, et al.
Published: (2025)
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems
by: Gogineni, Kailash, et al.
Published: (2024)
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs
by: Moitra, Abhishek, et al.
Published: (2025)
Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators
by: Kim, Jiyoon, et al.
Published: (2025)
NeFT: Negative Feedback Training to Improve Robustness of Compute-In-Memory DNN Accelerators
by: Qin, Yifan, et al.
Published: (2023)
Differentiable Initialization-Accelerated CPU-GPU Hybrid Combinatorial Scheduling
by: Liu, Mingju, et al.
Published: (2026)
Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs
by: Yadav, Divakar Kumar, et al.
Published: (2026)
FlexLLM: Composable HLS Library for Flexible Hybrid LLM Accelerator Design
by: Zhang, Jiahao, et al.
Published: (2026)
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
by: Ramachandran, Akshat, et al.
Published: (2025)
HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
by: Feng, Yuannuo, et al.
Published: (2025)
AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology
by: Cong, Rongqing, et al.
Published: (2024)
NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities
by: Read, James, et al.
Published: (2025)
A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference
by: Woodward, Kieran, et al.
Published: (2025)
HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference
by: Negi, Shubham, et al.
Published: (2025)
VerilogDB: The Largest, Highest-Quality Dataset with a Preprocessing Framework for LLM-based RTL Generation
by: Calzada, Paul E., et al.
Published: (2025)
PGR-DRC: Pre-Global Routing DRC Violation Prediction Using Unsupervised Learning
by: Islam, Riadul, et al.
Published: (2025)
A Unified Memory Perspective for Probabilistic Trustworthy AI
by: Zhao, Xueji, et al.
Published: (2026)
In-Memory Learning Automata Architecture using Y-Flash Cell
by: Ghazal, Omar, et al.
Published: (2024)
Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
by: Farias, Matheus, et al.
Published: (2024)
Causal AI For AMS Circuit Design: Interpretable Parameter Effects Analysis
by: Hussain, Mohyeu, et al.
Published: (2026)
Design Rules for Extreme-Edge Scientific Computing on AI Engines
by: Ma, Zhenghua, et al.
Published: (2026)
Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?
by: Bhandwaldar, Abhishek, et al.
Published: (2026)
TRAM: Training Approximate Multiplier Structures for Low-Power AI Accelerators
by: Meng, Chang, et al.
Published: (2026)
Position Paper: From Edge AI to Adaptive Edge AI
by: Pittorino, Fabrizio, et al.
Published: (2026)
Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation
by: Kim, Hyunmog
Published: (2026)
CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement
by: Mhapsekar, Kaushal, et al.
Published: (2026)
From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic
by: Ren, Hansheng
Published: (2026)
ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators
by: Baldi, T., et al.
Published: (2026)
Dynamic Sparse Attention: Access Patterns and Architecture
by: Levy, Noam
Published: (2026)
Challenges and Research Directions for Large Language Model Inference Hardware
by: Ma, Xiaoyu, et al.
Published: (2026)
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
by: Brandoit, Julien, et al.
Published: (2026)
Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
by: Shashidhar, Vishal, et al.
Published: (2026)
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
by: Qiao, Ye, et al.
Published: (2026)
SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI
by: Patne, Parth, et al.
Published: (2026)
RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks
by: Mohammadi, Ali Soltan, et al.
Published: (2026)
Continuous-Flow Data-Rate-Aware CNN Inference on FPGA
by: Habermann, Tobias, et al.
Published: (2026)