| Main Authors: | Chen, Hao, Tian, Cong, He, Zixuan, Yu, Bin, Liu, Yepang, Cao, Jialun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.11269 |
Similar Items
Reconsidering the performance of DEVS modeling and simulation environments using the DEVStone benchmark
by: Risco-Martín, José L., et al.
Published: (2024)
Optimizing Stateful Microservice Migration in Kubernetes with MS2M and Forensic Checkpointing
by: Dinh-Tuan, Hai, et al.
Published: (2025)
A relação entre a «performance» social e a «performance» económico-financeira
by: Daniel Taborda
Published: (2007)
Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)
denet, A lightweight command-line tool for process monitoring in benchmarking and beyond
by: Carrillo, Ben, et al.
Published: (2025)
OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs
by: Sarkar, Rishov, et al.
Published: (2025)
Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory
by: Ren, Jie, et al.
Published: (2025)
TALP-Pages: An easy-to-integrate continuous performance monitoring framework
by: Seitz, Valentin, et al.
Published: (2025)
The Price of Interoperability: Exploring Cross-Chain Bridges and Their Economic Consequences
by: Cao, Yiyue, et al.
Published: (2026)
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
by: Holmes, Connor, et al.
Published: (2024)
Empirical validity of basic profile and conceptual evaluation model of the annual performance appraisal process of the National Colleges of Education lecturers in Sri Lanka
by: Pathiraja, N. Kumara
Published: (2025)
LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization
by: Sarkar, Rishov, et al.
Published: (2024)
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
by: Zhao, Qidong, et al.
Published: (2024)
State of the art of ergonomic costs as criterion for evaluating and improving organizational performance in industry
by: Silvana Duarte-dos Santos
Published: (2015)
Meta-Metrics and Best Practices for System-Level Inference Performance Benchmarking
by: Salaria, Shweta, et al.
Published: (2025)
Back to Bits: Extending Shannon's communication performance framework to computing
by: Hawkins, Max, et al.
Published: (2025)
Redundant Array Computation Elimination
by: Wang, Zixuan, et al.
Published: (2025)
Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments
by: Zhu, Yuhan, et al.
Published: (2024)
Understanding Inference Scaling for LLMs: Bottlenecks, Trade-offs, and Performance Principles
by: Arif, Moiz, et al.
Published: (2026)
A dynamic parallel method for performance optimization on hybrid CPUs
by: Yu, Luo, et al.
Published: (2024)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)
RWKV-edge: Deeply Compressed RWKV for Resource-Constrained Devices
by: Choe, Wonkyo, et al.
Published: (2024)
Dissecting Embedding Bag Performance in DLRM Inference
by: Ambati, Chandrish, et al.
Published: (2025)
LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)
Statistical Modeling and Uncertainty Estimation of LLM Inference Systems
by: Ray, Kaustabha, et al.
Published: (2025)
Pulse-engineered Controlled-V gate and its applications on superconducting quantum device
by: Satoh, Takahiko, et al.
Published: (2021)
Modeling Tradeoffs between mobility, cost, and performance in Edge Computing
by: Waseem, Muhammad Danish, et al.
Published: (2026)
H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference
by: Fu, Zizhuo, et al.
Published: (2025)
Performance metrics for the continuous distribution of entanglement in multi-user quantum networks
by: Iñesta, Álvaro G., et al.
Published: (2023)
A high-performance and portable implementation of the SISSO method for CPUs and GPUs
by: Eibl, Sebastian, et al.
Published: (2025)
Characterize LSM-tree Compaction Performance via On-Device LLM Inference
by: Ding, Jiabiao, et al.
Published: (2026)
Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices
by: Ganbarov, Ali, et al.
Published: (2024)
Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective
by: Benazir, Afsara, et al.
Published: (2025)
Explainable Port Mapping Inference with Sparse Performance Counters for AMD's Zen Architectures
by: Ritter, Fabian, et al.
Published: (2024)
SparseInfer: Training-free Prediction of Activation Sparsity for Fast LLM Inference
by: Shin, Jiho, et al.
Published: (2024)
Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
by: Wang, Tuowei, et al.
Published: (2024)
KVDirect: Distributed Disaggregated LLM Inference
by: Chen, Shiyang, et al.
Published: (2024)
Towards self-optimization of publish/subscribe IoT systems using continuous performance monitoring
by: Djahafi, Mohammed, et al.
Published: (2024)
How Much Parallelism Is "Free"? A Principle of Near-Free Parallelism for Parallel Decoding
by: He, Minghua, et al.
Published: (2026)
Reducing Compute Waste in LLMs through Kernel-Level DVFS
by: Spaan, Jeffrey, et al.
Published: (2026)