:: Library Catalog

Bibliographic details
Main authors:	Liu, Lian; Zhao, Shixin; Zhou, Yutian; He, Yintao; Wang, Mengdi; Han, Yinhe; Wang, Ying
Format:	Preprint
Published:	2026
Subjects:	Hardware Architecture; Distributed, Parallel, and Cluster Computing
Online access:	https://arxiv.org/abs/2602.11521

Similar items

TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading
by: Pan, Yudong, et al.
Published: (2026)

Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading
by: Meng, William, et al.
Published: (2025)

Adaptive KV Cache Reuse for Fast Long-Context LLM Serving
by: Li, Fei, et al.
Published: (2026)

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
by: Qin, Ruoyu, et al.
Published: (2024)

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens
by: Yu, Yanpeng, et al.
Published: (2025)

PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving
by: Yüzügüler, Ahmet Caner, et al.
Published: (2025)

Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service
by: Zheng, Xianzhe, et al.
Published: (2026)

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention
by: Wang, Haoxuan, et al.
Published: (2026)

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
by: Fan, Ruibo, et al.
Published: (2026)

FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
by: Wang, Zeke, et al.
Published: (2025)

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
by: Ghiasi, Nika Mansouri, et al.
Published: (2022)

PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System
by: He, Yintao, et al.
Published: (2025)

GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)

A Modern Primer on Processing in Memory
by: Mutlu, Onur, et al.
Published: (2020)

PiKV: KV Cache Management System for Mixture of Experts
by: Liu, Dong, et al.
Published: (2025)

Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)

Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)

Memory-Centric Computing: Recent Advances in Processing-in-DRAM
by: Mutlu, Onur, et al.
Published: (2024)

New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)

Efficient Architecture for RISC-V Vector Memory Access
by: Guan, Hongyi, et al.
Published: (2025)

UniFormer: Unified and Efficient Transformer for Reasoning Across General and Custom Computing
by: Ran, Zhuoheng, et al.
Published: (2025)

SpArch: Efficient Architecture for Sparse Matrix Multiplication
by: Zhang, Zhekai, et al.
Published: (2020)

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System
by: Barkhordar, Marzieh, et al.
Published: (2026)

SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems
by: Kim, Hyeseong, et al.
Published: (2026)

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
by: Lin, Bin, et al.
Published: (2024)

RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs
by: Chen, Yanru, et al.
Published: (2025)

Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters
by: Wang, Jing, et al.
Published: (2025)

Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)

Memory-Centric Computing: Solving Computing's Memory Problem
by: Mutlu, Onur, et al.
Published: (2025)

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing
by: Oliveira, Geraldo F., et al.
Published: (2024)

PIMDAL: Mitigating the Memory Bottleneck in Data Analytics using a Real Processing-in-Memory System
by: Frouzakis, Manos, et al.
Published: (2025)

TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)

Analyzing a Two-Tier Disaggregated Memory Protection Scheme Based on Memory Replication
by: Volos, Haris, et al.
Published: (2025)

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel
by: Liu, Yiqi, et al.
Published: (2026)

Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures
by: Jarmusch, Aaron, et al.
Published: (2026)

Investigating Memory Failure Prediction Across CPU Architectures
by: Yu, Qiao, et al.
Published: (2024)

PIUMA: Programmable Integrated Unified Memory Architecture
by: Aananthakrishnan, Sriram, et al.
Published: (2020)

PhD Forum: Efficient Privacy-Preserving Processing via Memory-Centric Computing
by: Mwaisela, Mpoki
Published: (2024)

ARKV: Adaptive and Resource-Efficient KV Cache Management under Limited Memory Budget for Long-Context Inference in LLMs
by: Lei, Jianlong, et al.
Published: (2026)

FengHuang: Next-Generation Memory Orchestration for AI Inferencing
by: Li, Jiamin, et al.
Published: (2025)