| Main Authors: | Gui, Haoyuan, Zhang, Xiaoyu, Zhang, Chong, Su, Zitong, Li, Huiyuan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.16152 |
Similar Items
Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)
Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity
by: Chang, Dali, et al.
Published: (2026)
Towards a Higher Roofline for Matrix-Vector Multiplication in Matrix-Free HOSFEM
by: Cao, Zijian, et al.
Published: (2025)
Response time in a pair of processor sharing queues with Join-the-Shortest-Queue scheduling
by: Bor, Julianna, et al.
Published: (2024)
A clustering aggregation algorithm on neutral-atoms and annealing quantum processors
by: Scotti, Riccardo, et al.
Published: (2024)
Performance optimization of BLAS algorithms with band matrices for RISC-V processors
by: Pirova, Anna, et al.
Published: (2025)
DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information
by: Wang, Qiang, et al.
Published: (2024)
Memory Analysis on the Training Course of DeepSeek Models
by: Zhang, Ping, et al.
Published: (2025)
E-QUARTIC: Energy Efficient Edge Ensemble of Convolutional Neural Networks for Resource-Optimized Learning
by: Zhang, Le, et al.
Published: (2024)
Evaluating Compiler Optimization Impacts on zkVM Performance
by: Gassmann, Thomas, et al.
Published: (2025)
8 Years of Optimizing Apache Otava: How disconnected open source developers took an algorithm from n3 to constant time
by: Ingo, Henrik
Published: (2025)
An Experimental Study of Different Aggregation Schemes in Semi-Asynchronous Federated Learning
by: Li, Yunbo, et al.
Published: (2024)
Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)
by: Chu, Xiaoyu, et al.
Published: (2026)
Performance is not All You Need: Sustainability Considerations for Algorithms
by: Li, Xiang, et al.
Published: (2025)
From Profiling to Optimization: Unveiling the Profile Guided Optimization
by: Liu, Bingxin, et al.
Published: (2025)
Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device
by: Zhang, Niansong, et al.
Published: (2025)
TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput
by: Liu, Xiaoxuan, et al.
Published: (2024)
Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory
by: Ren, Jie, et al.
Published: (2025)
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)
Two-Timescale Dynamic Service Deployment and Task Scheduling with Spatiotemporal Collaboration in Mobile Edge Networks
by: Li, Yang, et al.
Published: (2025)
Two Criteria for Performance Analysis of Optimization Algorithms
by: Jing, Yunpeng, et al.
Published: (2024)
Performance Characterization and Optimizations of Traditional ML Applications
by: Kumar, Harsh, et al.
Published: (2024)
Spatiotemporal Non-Uniformity-Aware Online Task Scheduling in Collaborative Edge Computing for Industrial Internet of Things
by: Li, Yang, et al.
Published: (2025)
Attributing the System's Overall Effect to its Components
by: Wang, Chenxi, et al.
Published: (2026)
Pattern Tree: Enhancing Efficiency in Quantum Circuit Optimization Based on Pattern-matching
by: Chen, Mingyu, et al.
Published: (2024)
Scalable Binary CUR Low-Rank Approximation Algorithm
by: Su, Bowen
Published: (2025)
SAfEPaTh: A System-Level Approach for Efficient Power and Thermal Estimation of Convolutional Neural Network Accelerator
by: Chen, Yukai, et al.
Published: (2024)
Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding
by: Long, Fei, et al.
Published: (2026)
A Zoned Storage Optimized Flash Cache on ZNS SSDs
by: Yang, Chongzhuo, et al.
Published: (2024)
Tracing Optimization for Performance Modeling and Regression Detection
by: Shahedi, Kaveh, et al.
Published: (2024)
Redundant Array Computation Elimination
by: Wang, Zixuan, et al.
Published: (2025)
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
by: Huang, Haochen, et al.
Published: (2025)
Opal: A Modular Framework for Optimizing Performance using Analytics and LLMs
by: Zaeed, Mohammad, et al.
Published: (2025)
CPU Optimization of a Monocular 3D Biomechanics Pipeline for Low-Resource Deployment
by: Zhang, Yan, et al.
Published: (2026)
Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension
by: Chen, Hongguang
Published: (2025)
Towards Efficient Multi-Scale Deformable Attention on NPU
by: Huang, Chenghuan, et al.
Published: (2025)
Underwater Image Enhancement by Convolutional Spiking Neural Networks
by: Sudevan, Vidya, et al.
Published: (2025)
PerfSeer: An Efficient and Accurate Deep Learning Models Performance Predictor
by: Zhao, Xinlong, et al.
Published: (2025)
CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor
by: Xu, Buqing, et al.
Published: (2025)
Optimizing Cloud-native Services with SAGA: A Service Affinity Graph-based Approach
by: Dinh-Tuan, Hai, et al.
Published: (2025)