Library Catalog

Bibliographic Details
Main Authors:	Ren, Jie, Ma, Bin, Yang, Shuangyan, Francis, Benjamin, Ardestani, Ehsan K., Si, Min, Li, Dong
Format:	Preprint
Published:	2025
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2511.08568

Similar Items

Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory
by: Chen, Shangye, et al.
Published: (2024)

Dissecting Embedding Bag Performance in DLRM Inference
by: Ambati, Chandrish, et al.
Published: (2025)

Performance Characterization of AutoNUMA Memory Tiering on Graph Analytics
by: Moura, Diego, et al.
Published: (2022)

A Limits Study of Memory-side Tiering Telemetry
by: Petrucci, Vinicius, et al.
Published: (2025)

Exploring and Evaluating Real-world CXL: Use Cases and System Adoption
by: Wang, Xi, et al.
Published: (2024)

Hybrid Adaptive Tuning for Tiered Memory Systems
by: Wang, Xi, et al.
Published: (2026)

Modeling Utilization to Identify Shared-Memory Atomic Bottlenecks
by: Dong, Rongcui, et al.
Published: (2025)

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference
by: Ganjihal, Sanjeev Rao
Published: (2026)

Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference
by: Zhao, Xuanlei, et al.
Published: (2024)

Optimizing System Memory Bandwidth with Micron CXL Memory Expansion Modules on Intel Xeon 6 Processors
by: Sehgal, Rohit, et al.
Published: (2024)

Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)

Heterogeneous Memory Pool Tuning
by: Vaverka, Filip, et al.
Published: (2025)

FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)

Accelerating LLM Inference via Dynamic KV Cache Placement in Heterogeneous Memory System
by: Fang, Yunhua, et al.
Published: (2025)

Examem: Low-Overhead Memory Instrumentation for Intelligent Memory Systems
by: Poduval, Ashwin, et al.
Published: (2024)

ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory
by: Wang, Liangyu, et al.
Published: (2025)

Glinthawk: A Two-Tiered Architecture for Offline LLM Inference
by: Hamadanian, Pouya, et al.
Published: (2025)

Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)

Updates on the Low-Level Abstraction of Memory Access
by: Gruber, Bernhard Manfred
Published: (2023)

FB+-tree: A Memory-Optimized B+-tree with Latch-Free Update
by: Chen, Yuan, et al.
Published: (2025)

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures
by: Yang, Peiming, et al.
Published: (2025)

Analysis and Evaluation of Using Microsecond-Latency Memory for In-Memory Indices and Caches in SSD-Based Key-Value Stores
by: Bando, Yosuke, et al.
Published: (2025)

Heterogeneous Memory Benchmarking Toolkit
by: Ghaemi, Golsana, et al.
Published: (2025)

Virtual-Memory Powersort
by: Moltmann, Finn, et al.
Published: (2026)

EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC
by: Shen, Siyuan, et al.
Published: (2025)

Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition
by: Bergach, Mohamed Amine
Published: (2026)

CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
by: Suo, Jiashun, et al.
Published: (2025)

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
by: Huang, Haochen, et al.
Published: (2025)

WritePolicyBench: Benchmarking Memory Write Policies under Byte Budgets
by: Cham, Edgard El
Published: (2026)

A Controlled Study of Memory Hierarchy Transitions in Quantum Circuit Simulation on Apple M4 Pro Unified Memory Architecture
by: Pratipat, Gyan
Published: (2026)

Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)

From Profiling to Optimization: Unveiling the Profile Guided Optimization
by: Liu, Bingxin, et al.
Published: (2025)

Putting the Context back into Memory
by: Roberts, David A.
Published: (2025)

Memory Analysis on the Training Course of DeepSeek Models
by: Zhang, Ping, et al.
Published: (2025)

ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)

DGAP: Efficient Dynamic Graph Analysis on Persistent Memory
by: Islam, Abdullah Al Raqibul, et al.
Published: (2024)

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression
by: Fan, Ruibo, et al.
Published: (2026)

CUTHERMO: Understanding GPU Memory Inefficiencies with Heat Map Profiling
by: Zhao, Yanbo, et al.
Published: (2025)

ChatNeuroSim: An LLM Agent Framework for Automated Compute-in-Memory Accelerator Deployment and Optimization
by: Lee, Ming-Yen, et al.
Published: (2026)