Library Catalog

Bibliographic Details
Main Authors:	Ding, Jing; Diep, Trung
Format:	Preprint
Published:	2025
Subjects:	Performance, Artificial Intelligence, C.4
Online Access:	https://arxiv.org/abs/2507.14000

Similar Items

Dissecting Embedding Bag Performance in DLRM Inference
by: Ambati, Chandrish, et al.
Published: (2025)

Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM
by: Trappen, Tim, et al.
Published: (2025)

Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization
by: Prabhakar, Karthik, et al.
Published: (2025)

Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
by: De Sensi, Daniele, et al.
Published: (2025)

XTC, A Research Platform for Optimizing AI Workload Operators
by: Pompougnac, Hugo, et al.
Published: (2025)

Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference
by: Liu, Yuyang, et al.
Published: (2025)

PixLift: Accelerating Web Browsing via AI Upscaling
by: Atinafu, Yonas, et al.
Published: (2025)

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
by: Shakerdargah, Mohammadali, et al.
Published: (2024)

AMD MI300X GPU Performance Analysis
by: Ambati, Chandrish, et al.
Published: (2025)

Efficient Parallel Multi-Hop Reasoning: A Scalable Approach for Knowledge Graph Analysis
by: Tithi, Jesmin Jahan, et al.
Published: (2024)

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs
by: Peng, Hongwu, et al.
Published: (2023)

GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation
by: Ilager, Shashikant, et al.
Published: (2025)

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference
by: Ganjihal, Sanjeev Rao
Published: (2026)

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects
by: De Sensi, Daniele, et al.
Published: (2024)

GDEV-AI: A Generalized Evaluation of Deep Learning Inference Scaling and Architectural Saturation
by: Palaniappan, Kathiravan
Published: (2026)

ALISE: Accelerating Large Language Model Serving with Speculative Scheduling
by: Zhao, Youpeng, et al.
Published: (2024)

Opening the Black Box: Performance Estimation during Code Generation for GPUs
by: Ernst, Dominik, et al.
Published: (2021)

Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization
by: Nagoriya, Heet, et al.
Published: (2026)

Accelerating AI Performance using Anderson Extrapolation on GPUs
by: Dajani, Saleem Abdul Fattah Ahmed Al, et al.
Published: (2024)

AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels
by: Zhu, Yifan, et al.
Published: (2026)

Unlocking FedNL: Self-Contained Compute-Optimized Implementation
by: Burlachenko, Konstantin, et al.
Published: (2024)

Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models
by: Shashidhar, Sumuk, et al.
Published: (2023)

Stochastic Network Calculus with Localized Application of Martingales
by: Bouillard, Anne
Published: (2022)

Decomposing Docker Container Startup Performance: A Three-Tier Measurement Study on Heterogeneous Infrastructure
by: Khan, Shamsher
Published: (2026)

TurboMem: High-Performance Lock-Free Memory Pool with Transparent Huge Page Auto-Merging for DPDK
by: Yang, Junyi
Published: (2026)

D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations
by: Tahmasebi, Faraz, et al.
Published: (2025)

Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)

It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives
by: Jung, Alexander Louis-Ferdinand, et al.
Published: (2024)

Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices
by: Burgon, Alexis, et al.
Published: (2026)

EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)

Personalized Model-Based Design of Human Centric AI enabled CPS for Long term usage
by: Ngabonziza, Bernard, et al.
Published: (2026)

Requirements for Quality Assurance of AI Models for Early Detection of Lung Cancer
by: Hahn, Horst K., et al.
Published: (2025)

Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities
by: Cavagna, Hiari Pizzini, et al.
Published: (2025)

LLM-Driven Design Space Exploration of FPGA-based Accelerators
by: Sharma, Vinamra, et al.
Published: (2026)

Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
by: Atinafu, Yonas, et al.
Published: (2026)

The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead
by: Van Delm, Josse, et al.
Published: (2025)

KWT-Tiny: RISC-V Accelerated, Embedded Keyword Spotting Transformer
by: Al-Qawlaq, Aness, et al.
Published: (2024)

Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms
by: Taufique, Zain, et al.
Published: (2025)

KForge: Program Synthesis for Diverse AI Hardware Accelerators
by: Sereda, Taras, et al.
Published: (2025)

A performance analysis of VM-based Trusted Execution Environments for Confidential Federated Learning
by: Casella, Bruno
Published: (2025)