:: Library Catalog

Bibliographic Details
Main Authors:	Okpala, Izunna; Halse, Shane; Kropczynski, Jess
Format:	Preprint
Published:	2023
Subjects:	Machine Learning; Artificial Intelligence; Performance
Online Access:	https://arxiv.org/abs/2302.02267

Similar Items

A Semantic Approach to Negation Detection and Word Disambiguation with Natural Language Processing
by: Okpala, Izunna, et al.
Published: (2023)

Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach
by: Hanyu, Tatsuro, et al.
Published: (2025)

Cost-Effective Model Evaluation with Meta-Learning
by: Pham, Trinh, et al.
Published: (2026)

Deploying Open-Source Large Language Models: A performance Analysis
by: Bendi-Ouis, Yannis, et al.
Published: (2024)

Predicting Configuration Performance in Multiple Environments with Sequential Meta-learning
by: Gong, Jingzhi, et al.
Published: (2024)

The Hidden Power of Pure 16-bit Floating-Point Neural Networks
by: Yun, Juyoung, et al.
Published: (2023)

Performance Modeling of Data Storage Systems using Generative Models
by: Al-Maeeni, Abdalaziz Rashid, et al.
Published: (2023)

Fairness in Serving Large Language Models
by: Sheng, Ying, et al.
Published: (2023)

Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)

LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic
by: Zheng, Weibing, et al.
Published: (2025)

FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)

SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
by: Mozaffari, Mohammad, et al.
Published: (2024)

Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
by: Huang, Zixiao, et al.
Published: (2025)

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs
by: Yun, Vincent-Daniel, et al.
Published: (2026)

Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models
by: Hangun, Batuhan, et al.
Published: (2025)

Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
by: Jiang, Jevin, et al.
Published: (2026)

Generalizing Scaling Laws for Dense and Sparse Large Language Models
by: Hossain, Md Arafat, et al.
Published: (2025)

Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation
by: Skaf, Wadie, et al.
Published: (2026)

GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping
by: Yin, Yishu, et al.
Published: (2025)

EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)

Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)

Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)

Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
by: Knoop, Jonathan, et al.
Published: (2026)

APOLLO: SGD-like Memory, AdamW-level Performance
by: Zhu, Hanqing, et al.
Published: (2024)

An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
by: Yao, Feiyu, et al.
Published: (2026)

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)

FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)

Exchangeability in Neural Network and its Application to Dynamic Pruning
by: Pu, et al.
Published: (2025)

Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)

FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast
by: Wu, Wenhao, et al.
Published: (2026)

The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)

Profiling LoRA/QLoRA Fine-Tuning Efficiency on Consumer GPUs: An RTX 4060 Case Study
by: Avinash, MSR
Published: (2025)

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)

AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs
by: Kumar, Anshul, et al.
Published: (2025)

On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)

Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition
by: Kagiyama, Masaharu, et al.
Published: (2025)

Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
by: Estevez, Melissa, et al.
Published: (2025)

EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training
by: Yi, Qingao, et al.
Published: (2025)

OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction
by: Mozaffari, Mohammad, et al.
Published: (2025)

Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial [Study of GPU Scalability Efficiency for Artificial Intelligence Training]
by: Cortes, David, et al.
Published: (2025)