:: Library Catalog

Bibliographic Details
Main Author:	Katagiri, Takahiro
Format:	Preprint
Published:	2024
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2408.16607

Similar Items

RAO-SS: A Prototype of Run-time Auto-tuning Facility for Sparse Direct Solvers
by: Katagiri, Takahiro, et al.
Published: (2024)

An Auto-tuning Method for Run-time Data Transformation for Sparse Matrix-Vector Multiplication
by: Katagiri, Takahiro, et al.
Published: (2024)

Xabclib: A Fully Auto-tuned Sparse Iterative Solver
by: Katagiri, Takahiro, et al.
Published: (2024)

Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach
by: Hanyu, Tatsuro, et al.
Published: (2025)

A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices
by: Katagiri, Takahiro, et al.
Published: (2024)

Performance Evaluation of CMOS Annealing with Support Vector Machine
by: Fukuhara, Ryoga, et al.
Published: (2024)

MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning
by: Jam, Mathys, et al.
Published: (2025)

WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)

Learning-Augmented Performance Model for Tensor Product Factorization in High-Order FEM
by: Ren, Xuanzhengbo, et al.
Published: (2026)

Bringing Auto-tuning to HIP: Analysis of Tuning Impact and Difficulty on AMD and Nvidia GPUs
by: Lurati, Milo, et al.
Published: (2024)

iSpLib: A Library for Accelerating Graph Neural Networks using Auto-tuned Sparse Operations
by: Anik, Md Saidul Hoque, et al.
Published: (2024)

Effects of the Auto-Correlation of Delays on the Age of Information: A Gaussian Process Framework
by: Inoie, Atsushi, et al.
Published: (2025)

Performance Characterization of AutoNUMA Memory Tiering on Graph Analytics
by: Moura, Diego, et al.
Published: (2022)

A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading
by: Atif, Mohammad, et al.
Published: (2025)

Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework
by: Peccia, F. N., et al.
Published: (2022)

Integrating ytopt and libEnsemble to Autotune OpenMC
by: Wu, Xingfu, et al.
Published: (2024)

Deploying Open-Source Large Language Models: A Performance Analysis
by: Bendi-Ouis, Yannis, et al.
Published: (2024)

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)

AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention
by: Stankovic, Aleksandar
Published: (2025)

Assessing the Performance of OpenTitan as Cryptographic Accelerator in Secure Open-Hardware System-on-Chips
by: Parisi, Emanuele, et al.
Published: (2024)

DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)

Towards a Scalable and Efficient PGAS-based Distributed OpenMP
by: Shan, Baodi, et al.
Published: (2024)

AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels
by: Zhu, Yifan, et al.
Published: (2026)

Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective
by: Benazir, Afsara, et al.
Published: (2025)

CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor
by: Xu, Buqing, et al.
Published: (2025)

Tuning the Tuner: Introducing Hyperparameter Optimization for Auto-Tuning
by: Willemsen, Floris-Jan, et al.
Published: (2025)

Optimizing Cloud-native Services with SAGA: A Service Affinity Graph-based Approach
by: Dinh-Tuan, Hai, et al.
Published: (2025)

AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs
by: Kumar, Anshul, et al.
Published: (2025)

Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)

A relação entre a «performance» social e a «performance» económico-financeira
by: Daniel Taborda
Published: (2007)

From Concept to Reality: 5G Positioning with Open-Source Implementation of UL-TDoA in OpenAirInterface
by: Malik, Adeel, et al.
Published: (2024)

An Empirical Study on Method-Level Performance Evolution in Open-Source Java Projects
by: Shahedi, Kaveh, et al.
Published: (2025)

AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models
by: Mayr, Martin, et al.
Published: (2026)

PowerSensor3: A Fast and Accurate Open Source Power Measurement Tool
by: van der Vlugt, Steven, et al.
Published: (2025)

DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System
by: Rashid, Md Hasanur, et al.
Published: (2026)

Automated PMC-based Power Modeling Methodology for Modern Mobile GPUs
by: Dash, Pranab, et al.
Published: (2024)

Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations
by: Rychkov, Valentin, et al.
Published: (2024)

Inspection of I/O Operations from System Call Traces using Directly-Follows-Graph
by: Sankaran, Aravind, et al.
Published: (2024)

Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions
by: Ataie, Ehsan, et al.
Published: (2026)

Tuning Fast Memory Size based on Modeling of Page Migration for Tiered Memory
by: Chen, Shangye, et al.
Published: (2024)