Bibliographic Details
Main Authors:	Pompougnac, Hugo; Guillon, Christophe; Noiry, Sylvain; Dutilleul, Alban; Iooss, Guillaume; Rastello, Fabrice
Format:	Preprint
Published:	2025
Subjects:	Performance; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.16512

Similar Items

Performance bottlenecks detection through microarchitectural sensitivity
by: Pompougnac, Hugo, et al.
Published: (2024)

CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers
by: Bastian, Théophile, et al.
Published: (2024)

Performance Debugging through Microarchitectural Sensitivity and Causality Analysis
by: Dutilleul, Alban, et al.
Published: (2024)

In-Network Collective Operations: Game Changer or Challenge for AI Workloads?
by: Hoefler, Torsten, et al.
Published: (2026)

DRAGON (Differentiable Graph Execution) : A suite of Hardware Simulation and Optimization tools for Modern AI/Non-AI Workloads
by: Sethi, Khushal
Published: (2022)

SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
by: Ma, Jeffrey Jian, et al.
Published: (2025)

Photonic Fabric Platform for AI Accelerators
by: Ding, Jing, et al.
Published: (2025)

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
by: Zhao, Qidong, et al.
Published: (2024)

Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)

Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures
by: Vellaisamy, Prabhu, et al.
Published: (2025)

Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
by: John, Chelsea Maria, et al.
Published: (2024)

OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads
by: Agwa, Shady, et al.
Published: (2025)

Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models
by: Rosas, Miguel Romero, et al.
Published: (2024)

Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)

SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
by: Odema, Mohanad, et al.
Published: (2024)

PixLift: Accelerating Web Browsing via AI Upscaling
by: Atinafu, Yonas, et al.
Published: (2025)

Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)

Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)

Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
by: Atinafu, Yonas, et al.
Published: (2026)

Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices
by: Burgon, Alexis, et al.
Published: (2026)

Personalized Model-Based Design of Human Centric AI enabled CPS for Long term usage
by: Ngabonziza, Bernard, et al.
Published: (2026)

Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
by: Zhang, Zongshun, et al.
Published: (2025)

TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput
by: Liu, Xiaoxuan, et al.
Published: (2024)

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)

Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms
by: Taufique, Zain, et al.
Published: (2025)

When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming
by: Tarraf, Ahmad, et al.
Published: (2025)

Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)

Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases
by: Dantart, Alex
Published: (2026)

QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference
by: Li, Xiangchen, et al.
Published: (2025)

The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)

Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?
by: Asudeh, Omid, et al.
Published: (2025)

On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)

MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)

Looking Forward: Challenges and Opportunities in Agentic AI Reliability
by: Xing, Liudong, et al.
Published: (2025)

Information Retrieval in the Age of Generative AI: The RGB Model
by: Garetto, Michele, et al.
Published: (2025)

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
by: Tang, Xinsheng, et al.
Published: (2026)

Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
by: Mohammadabadi, Seyed Mahmoud Sajjadi, et al.
Published: (2024)

Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs
by: Georganas, Evangelos, et al.
Published: (2025)

Revolutionizing System Reliability: The Role of AI in Predictive Maintenance Strategies
by: Bidollahkhani, Michael, et al.
Published: (2024)

LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
by: Cai, Yanan, et al.
Published: (2025)