| Main Authors: | Luo, Yi, Wang, Yaobin, Wang, Qi, Song, Yingchen, Wu, Huan, Wang, Qingfeng, Huang, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2503.01281 |
Similar Items
Towards Performance-Aware Allocation for Accelerated Machine Learning on GPU-SSD Systems
by: Gundawar, Ayush, et al.
Published: (2024)
Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement
by: Liu, Songze, et al.
Published: (2025)
GreenMalloc: Allocator Optimisation for Industrial Workloads
by: Dakhama, Aidan, et al.
Published: (2025)
Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads
by: Li, Boyu, et al.
Published: (2025)
Allspark: Workload Orchestration for Visual Transformers on Processing In-Memory Systems
by: Ge, Mengke, et al.
Published: (2024)
ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators
by: Zou, Guoqiang, et al.
Published: (2025)
Accelerating GNN Training through Locality-aware Dropout and Merge
by: Sun, Gongjian, et al.
Published: (2025)
Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System
by: Fang, Yunhua, et al.
Published: (2025)
Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification
by: Patel, Vihaan, et al.
Published: (2026)
SpeedLLM: An FPGA Co-design of Large Language Model Inference Accelerator
by: Wang, Peipei, et al.
Published: (2025)
High Utilization Energy-Aware Real-Time Inference Deep Convolutional Neural Network Accelerator
by: Lin, Kuan-Ting, et al.
Published: (2025)
NOVA: Coordinated Test Selection and Bayes-Optimized Constrained Randomization for Accelerated Coverage Closure
by: Peng, Weijie, et al.
Published: (2025)
Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference
by: Petropoulos, Anastasios, et al.
Published: (2025)
HPIM: Heterogeneous Processing-In-Memory-based Accelerator for Large Language Models Inference
by: Duan, Cenlin, et al.
Published: (2025)
VEDA: Efficient LLM Generation Through Voting-based KV Cache Eviction and Dataflow-flexible Accelerator
by: Wang, Zhican, et al.
Published: (2025)
ApproxPilot: A GNN-based Accelerator Approximation Framework
by: Zhang, Qing, et al.
Published: (2024)
Communication Characterization of AI Workloads for Large-scale Multi-chiplet Accelerators
by: Musavi, Mariam, et al.
Published: (2024)
Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads
by: Kwon, Jaewon, et al.
Published: (2025)
Messaging-based Adaptive Vector Computing (MAVeC) Accelerator for AI Workloads
by: Chowdhury, Md. Rownak Hossain, et al.
Published: (2024)
A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-core RV Clusters for Attention-Based Model Deployment
by: Wang, Bowen, et al.
Published: (2025)
Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators
by: Xie, Tong, et al.
Published: (2026)
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
by: Yao, Jiayi, et al.
Published: (2026)
Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency
by: Li, Mengming, et al.
Published: (2025)
Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)
PREFENDER: A Prefetching Defender against Cache Side Channel Attacks as A Pretender
by: Li, Luyi, et al.
Published: (2023)
Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-based Accelerators
by: Das, Abhijit, et al.
Published: (2022)
SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling
by: Wang, Huizheng, et al.
Published: (2024)
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding
by: Zhong, Linfeng, et al.
Published: (2025)
PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator
by: Chong, Yue Jiet, et al.
Published: (2026)
Titanus: Enabling KV Cache Pruning and Quantization On-the-Fly for LLM Acceleration
by: Chen, Peilin, et al.
Published: (2025)
Comparative Characterization of KV Cache Management Strategies for LLM Inference
by: Mamo, Oteo, et al.
Published: (2026)
BackCache: Mitigating Contention-Based Cache Timing Attacks by Hiding Cache Line Evictions
by: Wang, Quancheng, et al.
Published: (2023)
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
by: Adnan, Muhammad, et al.
Published: (2024)
HCiM: ADC-Less Hybrid Analog-Digital Compute in Memory Accelerator for Deep Learning Workloads
by: Negi, Shubham, et al.
Published: (2024)
SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference
by: Choi, Yuseon, et al.
Published: (2025)
MX-SAFE: Versatile Inference- and Training-Proof Microscaling Format with On-the-Fly Exponent and Mantissa Bit Allocation
by: Park, Dahoon, et al.
Published: (2026)
TENET: An Efficient Sparsity-Aware LUT-Centric Architecture for Ternary LLM Inference On Edge
by: Huang, Zhirui, et al.
Published: (2025)
Low Latency GNN Accelerator for Quantum Error Correction
by: Cicero, Alessio, et al.
Published: (2026)
MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness
by: Wang, Huizheng, et al.
Published: (2025)
UniCAIM: A Unified CAM/CIM Architecture with Static-Dynamic KV Cache Pruning for Efficient Long-Context LLM Inference
by: Xu, Weikai, et al.
Published: (2025)