:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Hao; Tian, Cong; He, Zixuan; Yu, Bin; Liu, Yepang; Cao, Jialun
Format:	Preprint
Published:	2025
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2508.11269

Similar Items

Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark
by: Risco-Martín, José L., et al.
Published: (2024)

Optimizing Stateful Microservice Migration in Kubernetes with MS2M and Forensic Checkpointing
by: Dinh-Tuan, Hai, et al.
Published: (2025)

A relação entre a «performance» social e a «performance» económico-financeira
by: Daniel Taborda
Published: (2007)

Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)

denet, A lightweight command-line tool for process monitoring in benchmarking and beyond
by: Carrillo, Ben, et al.
Published: (2025)

OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs
by: Sarkar, Rishov, et al.
Published: (2025)

Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory
by: Ren, Jie, et al.
Published: (2025)

TALP-Pages: An easy-to-integrate continuous performance monitoring framework
by: Seitz, Valentin, et al.
Published: (2025)

The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences
by: Cao, Yiyue, et al.
Published: (2026)

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
by: Holmes, Connor, et al.
Published: (2024)

Empirical validity of basic profile and conceptual evaluation model of the annual performance appraisal process of the National Colleges of Education lecturers in Sri Lanka
by: Pathiraja, N. Kumara
Published: (2025)

LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization
by: Sarkar, Rishov, et al.
Published: (2024)

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
by: Zhao, Qidong, et al.
Published: (2024)

State of the art of ergonomic costs as criterion for evaluating and improving organizational performance in industry
by: Silvana Duarte-dos Santos
Published: (2015)

Meta-Metrics and Best Practices for System-Level Inference Performance Benchmarking
by: Salaria, Shweta, et al.
Published: (2025)

Back to Bits: Extending Shannon's communication performance framework to computing
by: Hawkins, Max, et al.
Published: (2025)

Redundant Array Computation Elimination
by: Wang, Zixuan, et al.
Published: (2025)

Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
by: Zhu, Yuhan, et al.
Published: (2024)

Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)

A dynamic parallel method for performance optimization on hybrid CPUs
by: Yu, Luo, et al.
Published: (2024)

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)

RWKV-edge: Deeply Compressed RWKV for Resource-Constrained Devices
by: Choe, Wonkyo, et al.
Published: (2024)

Dissecting Embedding Bag Performance in DLRM Inference
by: Ambati, Chandrish, et al.
Published: (2025)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

Statistical Modeling and Uncertainty Estimation of LLM Inference Systems
by: Ray, Kaustabha, et al.
Published: (2025)

Pulse-engineered Controlled-V gate and its applications on superconducting quantum device
by: Satoh, Takahiko, et al.
Published: (2021)

Modeling Tradeoffs between mobility, cost, and performance in Edge Computing
by: Waseem, Muhammad Danish, et al.
Published: (2026)

H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
by: Fu, Zizhuo, et al.
Published: (2025)

Performance metrics for the continuous distribution of entanglement in multi-user quantum networks
by: Iñesta, Álvaro G., et al.
Published: (2023)

A high-performance and portable implementation of the SISSO method for CPUs and GPUs
by: Eibl, Sebastian, et al.
Published: (2025)

Characterize LSM-tree Compaction Performance via On-Device LLM Inference
by: Ding, Jiabiao, et al.
Published: (2026)

Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices
by: Ganbarov, Ali, et al.
Published: (2024)

Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective
by: Benazir, Afsara, et al.
Published: (2025)

Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures
by: Ritter, Fabian, et al.
Published: (2024)

SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference
by: Shin, Jiho, et al.
Published: (2024)

Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
by: Wang, Tuowei, et al.
Published: (2024)

KVDirect: Distributed Disaggregated LLM Inference
by: Chen, Shiyang, et al.
Published: (2024)

Towards self-optimization of publish/subscribe IoT systems using continuous performance monitoring
by: Djahafi, Mohammed, et al.
Published: (2024)

How Much Parallelism Is "Free"? A Principle of Near-Free Parallelism for Parallel Decoding
by: He, Minghua, et al.
Published: (2026)

Reducing Compute Waste in LLMs through Kernel-Level DVFS
by: Spaan, Jeffrey, et al.
Published: (2026)