:: Library Catalog

Bibliographic Details
Main Authors:	Krishna, Keshav; Verma, Ayush
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2410.15344

Similar Items

A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
by: Park, Haneul, et al.
Published: (2025)

Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
by: Gundawar, Ayush, et al.
Published: (2024)

Design and Analysis of Approximate Hardware Accelerators for VVC Intra Angular Prediction
by: de Fraga, Lucas M. Leipnitz, et al.
Published: (2025)

On the Impact of Intra-node Communication in the Performance of Supercomputer and Data Center Interconnection Networks
by: Tarraga-Moreno, Joaquin, et al.
Published: (2025)

Scalable and Efficient Intra- and Inter-node Interconnection Networks for Post-Exascale Supercomputers and Data centers
by: Tarraga-Moreno, Joaquin, et al.
Published: (2025)

How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)

Pointer: An Energy-Efficient ReRAM-based Point Cloud Recognition Accelerator with Inter-layer and Intra-layer Optimizations
by: Zhang, Qijun, et al.
Published: (2024)

HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming
by: Choi, Ilhuan, et al.
Published: (2026)

An Affordable Experimental Technique for SRAM Write Margin Characterization for Nanometer CMOS Technologies
by: Alorda, Bartomeu, et al.
Published: (2024)

BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism
by: Vittal, Suhas, et al.
Published: (2025)

RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write
by: Guo, Yan-Cheng, et al.
Published: (2026)

ML-PCM : Machine Learning Technique for Write Optimization in Phase Change Memory (PCM)
by: Desai, Mahek, et al.
Published: (2025)

A Host-SSD Collaborative Write Accelerator for LSM-Tree-Based Key-Value Stores
by: Kim, KiHwan, et al.
Published: (2024)

Limited Read-Write/Set Hardware Transactional Memory without modifying the ISA or the Coherence Protocol
by: Kafousis, Konstantinos
Published: (2025)

Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices
by: Yang, Xufeng, et al.
Published: (2026)

A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow
by: Zhao, Zhiyuan, et al.
Published: (2024)

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
by: Tong, Jianming, et al.
Published: (2024)

MCMComm: Hardware-Software Co-Optimization for End-to-End Communication in Multi-Chip-Modules
by: Raj, Ritik, et al.
Published: (2025)

Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs
by: Taka, Endri, et al.
Published: (2025)

LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism
by: Wang, Yimin, et al.
Published: (2025)

CIM-Tuner: Balancing the Compute and Storage Capacity of SRAM-CIM Accelerator via Hardware-mapping Co-exploration
by: Chen, Jinwu, et al.
Published: (2026)

ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions
by: Vavelidou, Ioanna, et al.
Published: (2026)

Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU
by: The Verkor Team, et al.
Published: (2026)

Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours
by: The Verkor Team, et al.
Published: (2026)

PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects
by: Garg, Raveesh, et al.
Published: (2024)

Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction
by: Zhao, Liang, et al.
Published: (2026)

Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col
by: Nayan, Md Mizanur Rahaman, et al.
Published: (2025)

CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design
by: Wan, Zishen, et al.
Published: (2025)

OLAF: Programmable Data Plane Acceleration for Asynchronous Distributed Reinforcement Learning
by: Krishna, Nehal Baganal, et al.
Published: (2025)

SCALE-Sim TPU: Validating and Extending SCALE-Sim for TPUs
by: Dang, Jingtian, et al.
Published: (2026)

U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators
by: Yan, Zheyu, et al.
Published: (2023)

Cross-Layer Design of Vector-Symbolic Computing: Bridging Cognition and Brain-Inspired Hardware Acceleration
by: Du, Shuting, et al.
Published: (2025)

SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory
by: Desai, Mahek, et al.
Published: (2025)

A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core
by: Jaswal, Pragun, et al.
Published: (2026)

Low Power Approximate Multiplier Architecture for Deep Neural Networks
by: Jaswal, Pragun, et al.
Published: (2025)

SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis
by: Raj, Ritik, et al.
Published: (2025)

EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)

Device-Circuit Co-Design of Variation-Resilient Read and Write Drivers for Antiferromagnetic Tunnel Junction (AFMTJ) Memories
by: Choudhary, Yousuf, et al.
Published: (2026)

FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
by: Rashidi, Saeed, et al.
Published: (2024)

OneDSE: A Unified Microprocessor Metric Prediction and Design Space Exploration Framework
by: Raj, Ritik, et al.
Published: (2025)