Library Catalog

Bibliographic Details
Main Authors:	Gui, Haoyuan; Zhang, Xiaoyu; Zhang, Chong; Su, Zitong; Li, Huiyuan
Format:	Preprint
Published:	2024
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2411.16152

Similar Items

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)

Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity
by: Chang, Dali, et al.
Published: (2026)

Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM
by: Cao, Zijian, et al.
Published: (2025)

Response time in a pair of processor sharing queues with Join-the-Shortest-Queue scheduling
by: Bor, Julianna, et al.
Published: (2024)

A clustering aggregation algorithm on neutral-atoms and annealing quantum processors
by: Scotti, Riccardo, et al.
Published: (2024)

Performance optimization of BLAS algorithms with band matrices for RISC-V processors
by: Pirova, Anna, et al.
Published: (2025)

DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information
by: Wang, Qiang, et al.
Published: (2024)

Memory Analysis on the Training Course of DeepSeek Models
by: Zhang, Ping, et al.
Published: (2025)

E-QUARTIC: Energy Efficient Edge Ensemble of Convolutional Neural Networks for Resource-Optimized Learning
by: Zhang, Le, et al.
Published: (2024)

Evaluating Compiler Optimization Impacts on zkVM Performance
by: Gassmann, Thomas, et al.
Published: (2025)

8 Years of Optimizing Apache Otava: How disconnected open source developers took an algorithm from n^3 to constant time
by: Ingo, Henrik
Published: (2025)

An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning
by: Li, Yunbo, et al.
Published: (2024)

Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)
by: Chu, Xiaoyu, et al.
Published: (2026)

Performance is not All You Need: Sustainability Considerations for Algorithms
by: Li, Xiang, et al.
Published: (2025)

From Profiling to Optimization: Unveiling the Profile Guided Optimization
by: Liu, Bingxin, et al.
Published: (2025)

Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device
by: Zhang, Niansong, et al.
Published: (2025)

TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput
by: Liu, Xiaoxuan, et al.
Published: (2024)

Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory
by: Ren, Jie, et al.
Published: (2025)

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)

Two-Timescale Dynamic Service Deployment and Task Scheduling with Spatiotemporal Collaboration in Mobile Edge Networks
by: Li, Yang, et al.
Published: (2025)

Two Criteria for Performance Analysis of Optimization Algorithms
by: Jing, Yunpeng, et al.
Published: (2024)

Performance Characterization and Optimizations of Traditional ML Applications
by: Kumar, Harsh, et al.
Published: (2024)

Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things
by: Li, Yang, et al.
Published: (2025)

Attributing the System's Overall Effect to its Components
by: Wang, Chenxi, et al.
Published: (2026)

Pattern Tree: Enhancing Efficiency in Quantum Circuit Optimization Based on Pattern-matching
by: Chen, Mingyu, et al.
Published: (2024)

Scalable Binary CUR Low-Rank Approximation Algorithm
by: Su, Bowen
Published: (2025)

SAfEPaTh: A System-Level Approach for Efficient Power and Thermal Estimation of Convolutional Neural Network Accelerator
by: Chen, Yukai, et al.
Published: (2024)

Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding
by: Long, Fei, et al.
Published: (2026)

A Zoned Storage Optimized Flash Cache on ZNS SSDs
by: Yang, Chongzhuo, et al.
Published: (2024)

Tracing Optimization for Performance Modeling and Regression Detection
by: Shahedi, Kaveh, et al.
Published: (2024)

Redundant Array Computation Elimination
by: Wang, Zixuan, et al.
Published: (2025)

HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
by: Huang, Haochen, et al.
Published: (2025)

Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs
by: Zaeed, Mohammad, et al.
Published: (2025)

CPU Optimization of a Monocular 3D Biomechanics Pipeline for Low-Resource Deployment
by: Zhang, Yan, et al.
Published: (2026)

Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension
by: Chen, Hongguang
Published: (2025)

Towards Efficient Multi-Scale Deformable Attention on NPU
by: Huang, Chenghuan, et al.
Published: (2025)

Underwater Image Enhancement by Convolutional Spiking Neural Networks
by: Sudevan, Vidya, et al.
Published: (2025)

PerfSeer: An Efficient and Accurate Deep Learning Models Performance Predictor
by: Zhao, Xinlong, et al.
Published: (2025)

CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor
by: Xu, Buqing, et al.
Published: (2025)

Optimizing Cloud-native Services with SAGA: A Service Affinity Graph-based Approach
by: Dinh-Tuan, Hai, et al.
Published: (2025)