| Main Authors: | Krishna, Keshav; Verma, Ayush |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.15344 |
Similar Items
A4: Microarchitecture-Aware LLC Management for Datacenter Servers with Emerging I/O Devices
by: Park, Haneul, et al.
Published: (2025)
Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
by: Gundawar, Ayush, et al.
Published: (2024)
Design and Analysis of Approximate Hardware Accelerators for VVC Intra Angular Prediction
by: de Fraga, Lucas M. Leipnitz, et al.
Published: (2025)
On the Impact of Intra-node Communication in the Performance of Supercomputer and Data Center Interconnection Networks
by: Tarraga-Moreno, Joaquin, et al.
Published: (2025)
Scalable and Efficient Intra- and Inter-node Interconnection Networks for Post-Exascale Supercomputers and Data centers
by: Tarraga-Moreno, Joaquin, et al.
Published: (2025)
How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)
Pointer: An Energy-Efficient ReRAM-based Point Cloud Recognition Accelerator with Inter-layer and Intra-layer Optimizations
by: Zhang, Qijun, et al.
Published: (2024)
HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming
by: Choi, Ilhuan, et al.
Published: (2026)
An Affordable Experimental Technique for SRAM Write Margin Characterization for Nanometer CMOS Technologies
by: Alorda, Bartomeu, et al.
Published: (2024)
BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism
by: Vittal, Suhas, et al.
Published: (2025)
RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write
by: Guo, Yan-Cheng, et al.
Published: (2026)
ML-PCM : Machine Learning Technique for Write Optimization in Phase Change Memory (PCM)
by: Desai, Mahek, et al.
Published: (2025)
A Host-SSD Collaborative Write Accelerator for LSM-Tree-Based Key-Value Stores
by: Kim, KiHwan, et al.
Published: (2024)
Limited Read-Write/Set Hardware Transactional Memory without modifying the ISA or the Coherence Protocol
by: Kafousis, Konstantinos
Published: (2025)
Nemo: A Low-Write-Amplification Cache for Tiny Objects on Log-Structured Flash Devices
by: Yang, Xufeng, et al.
Published: (2026)
A High-Throughput FPGA Accelerator for Lightweight CNNs With Balanced Dataflow
by: Zhao, Zhiyuan, et al.
Published: (2024)
FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
by: Tong, Jianming, et al.
Published: (2024)
MCMComm: Hardware-Software Co-Optimization for End-to-End Communication in Multi-Chip-Modules
by: Raj, Ritik, et al.
Published: (2025)
Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs
by: Taka, Endri, et al.
Published: (2025)
LEAP: LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism
by: Wang, Yimin, et al.
Published: (2025)
CIM-Tuner: Balancing the Compute and Storage Capacity of SRAM-CIM Accelerator via Hardware-mapping Co-exploration
by: Chen, Jinwu, et al.
Published: (2026)
ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions
by: Vavelidou, Ioanna, et al.
Published: (2026)
Design Conductor: An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU
by: The Verkor Team, et al.
Published: (2026)
Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours
by: The Verkor Team, et al.
Published: (2026)
PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects
by: Garg, Raveesh, et al.
Published: (2024)
Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction
by: Zhao, Liang, et al.
Published: (2026)
Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col
by: Nayan, Md Mizanur Rahaman, et al.
Published: (2025)
CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design
by: Wan, Zishen, et al.
Published: (2025)
OLAF: Programmable Data Plane Acceleration for Asynchronous Distributed Reinforcement Learning
by: Krishna, Nehal Baganal, et al.
Published: (2025)
SCALE-Sim TPU: Validating and Extending SCALE-Sim for TPUs
by: Dang, Jingtian, et al.
Published: (2026)
U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators
by: Yan, Zheyu, et al.
Published: (2023)
Cross-Layer Design of Vector-Symbolic Computing: Bridging Cognition and Brain-Inspired Hardware Acceleration
by: Du, Shuting, et al.
Published: (2025)
SMART-WRITE: Adaptive Learning-based Write Energy Optimization for Phase Change Memory
by: Desai, Mahek, et al.
Published: (2025)
A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core
by: Jaswal, Pragun, et al.
Published: (2026)
Low Power Approximate Multiplier Architecture for Deep Neural Networks
by: Jaswal, Pragun, et al.
Published: (2025)
SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis
by: Raj, Ritik, et al.
Published: (2025)
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
by: Heo, Jaehoon, et al.
Published: (2025)
Device-Circuit Co-Design of Variation-Resilient Read and Write Drivers for Antiferromagnetic Tunnel Junction (AFMTJ) Memories
by: Choudhary, Yousuf, et al.
Published: (2026)
FRED: Flexible REduction-Distribution Interconnect and Communication Implementation for Wafer-Scale Distributed Training of DNN Models
by: Rashidi, Saeed, et al.
Published: (2024)
OneDSE: A Unified Microprocessor Metric Prediction and Design Space Exploration Framework
by: Raj, Ritik, et al.
Published: (2025)