| Main Authors: | Hugo Pompougnac, Christophe Guillon, Sylvain Noiry, Alban Dutilleul, Guillaume Iooss, Fabrice Rastello |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.16512 |
Similar Items
Performance bottlenecks detection through microarchitectural sensitivity
by: Pompougnac, Hugo, et al.
Published: (2024)
CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers
by: Bastian, Théophile, et al.
Published: (2024)
Performance Debugging through Microarchitectural Sensitivity and Causality Analysis
by: Dutilleul, Alban, et al.
Published: (2024)
In-Network Collective Operations: Game Changer or Challenge for AI Workloads?
by: Hoefler, Torsten, et al.
Published: (2026)
DRAGON (Differentiable Graph Execution) : A suite of Hardware Simulation and Optimization tools for Modern AI/Non-AI Workloads
by: Sethi, Khushal
Published: (2022)
SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?
by: Ma, Jeffrey Jian, et al.
Published: (2025)
Photonic Fabric Platform for AI Accelerators
by: Ding, Jing, et al.
Published: (2025)
DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads
by: Zhao, Qidong, et al.
Published: (2024)
Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures
by: Vellaisamy, Prabhu, et al.
Published: (2025)
Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
by: John, Chelsea Maria, et al.
Published: (2024)
OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads
by: Agwa, Shady, et al.
Published: (2025)
Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models
by: Rosas, Miguel Romero, et al.
Published: (2024)
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
by: Odema, Mohanad, et al.
Published: (2024)
PixLift: Accelerating Web Browsing via AI Upscaling
by: Atinafu, Yonas, et al.
Published: (2025)
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)
Understanding and Benchmarking Artificial Intelligence: OpenAI's o3 Is Not AGI
by: Pfister, Rolf, et al.
Published: (2025)
Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
by: Atinafu, Yonas, et al.
Published: (2026)
Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices
by: Burgon, Alexis, et al.
Published: (2026)
Personalized Model-Based Design of Human Centric AI enabled CPS for Long term usage
by: Ngabonziza, Bernard, et al.
Published: (2026)
Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions
by: Zhang, Zongshun, et al.
Published: (2025)
TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput
by: Liu, Xiaoxuan, et al.
Published: (2024)
Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)
Twill: Scheduling Compound AI Systems on Heterogeneous Mobile Edge Platforms
by: Taufique, Zain, et al.
Published: (2025)
When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming
by: Tarraf, Ahmad, et al.
Published: (2025)
Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)
Reliability by design: quantifying and eliminating fabrication risk in LLMs. From generative to consultative AI: a comparative analysis in the legal domain and lessons for high-stakes knowledge bases
by: Dantart, Alex
Published: (2026)
QPART: Adaptive Model Quantization and Dynamic Workload Balancing for Accuracy-aware Edge Inference
by: Li, Xiangchen, et al.
Published: (2025)
The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)
Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?
by: Asudeh, Omid, et al.
Published: (2025)
On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)
MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)
Looking Forward: Challenges and Opportunities in Agentic AI Reliability
by: Xing, Liudong, et al.
Published: (2025)
Information Retrieval in the Age of Generative AI: The RGB Model
by: Garetto, Michele, et al.
Published: (2025)
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators
by: Tang, Xinsheng, et al.
Published: (2026)
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning
by: Mohammadabadi, Seyed Mahmoud Sajjadi, et al.
Published: (2024)
Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs
by: Georganas, Evangelos, et al.
Published: (2025)
Revolutionizing System Reliability: The Role of AI in Predictive Maintenance Strategies
by: Bidollahkhani, Michael, et al.
Published: (2024)
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
by: Cai, Yanan, et al.
Published: (2025)