| Main Authors: | Okpala, Izunna, Halse, Shane, Kropczynski, Jess |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2302.02267 |
Similar Items
A Semantic Approach to Negation Detection and Word Disambiguation with Natural Language Processing
by: Okpala, Izunna, et al.
Published: (2023)
Towards Generalized Parameter Tuning in Coherent Ising Machines: A Portfolio-Based Approach
by: Hanyu, Tatsuro, et al.
Published: (2025)
Cost-Effective Model Evaluation with Meta-Learning
by: Pham, Trinh, et al.
Published: (2026)
Deploying Open-Source Large Language Models: A performance Analysis
by: Bendi-Ouis, Yannis, et al.
Published: (2024)
Predicting Configuration Performance in Multiple Environments with Sequential Meta-learning
by: Gong, Jingzhi, et al.
Published: (2024)
The Hidden Power of Pure 16-bit Floating-Point Neural Networks
by: Yun, Juyoung, et al.
Published: (2023)
Performance Modeling of Data Storage Systems using Generative Models
by: Al-Maeeni, Abdalaziz Rashid, et al.
Published: (2023)
Fairness in Serving Large Language Models
by: Sheng, Ying, et al.
Published: (2023)
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)
LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic
by: Zheng, Weibing, et al.
Published: (2025)
FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models
by: Shao, Zishan, et al.
Published: (2025)
SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
by: Mozaffari, Mohammad, et al.
Published: (2024)
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
by: Huang, Zixiao, et al.
Published: (2025)
Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs
by: Yun, Vincent-Daniel, et al.
Published: (2026)
Quantum Neural Networks for Wind Energy Forecasting: A Comparative Study of Performance and Scalability with Classical Models
by: Hangun, Batuhan, et al.
Published: (2025)
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
by: Jiang, Jevin, et al.
Published: (2026)
Generalizing Scaling Laws for Dense and Sparse Large Language Models
by: Hossain, Md Arafat, et al.
Published: (2025)
Rapid Augmentations for Time Series (RATS): A High-Performance Library for Time Series Augmentation
by: Skaf, Wadie, et al.
Published: (2026)
GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping
by: Yin, Yishu, et al.
Published: (2025)
EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)
Research on Low-Latency Inference and Training Efficiency Optimization for Graph Neural Network and Large Language Model-Based Recommendation Systems
by: Zhao, Yushang, et al.
Published: (2025)
Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)
Private LLM Inference on Consumer Blackwell GPUs: A Practical Guide for Cost-Effective Local Deployment in SMEs
by: Knoop, Jonathan, et al.
Published: (2026)
APOLLO: SGD-like Memory, AdamW-level Performance
by: Zhu, Hanqing, et al.
Published: (2024)
An Efficient Hybrid Sparse Attention with CPU-GPU Parallelism for Long-Context Inference
by: Yao, Feiyu, et al.
Published: (2026)
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)
FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
by: Qiao, Liang, et al.
Published: (2025)
Exchangeability in Neural Network and its Application to Dynamic Pruning
by: Pu, et al.
Published: (2025)
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)
FlashSVD v1.5: Making Low-Rank Transformers Inference Actually Fast
by: Wu, Wenhao, et al.
Published: (2026)
The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)
Profiling LoRA/QLoRA Fine-Tuning Efficiency on Consumer GPUs: An RTX 4060 Case Study
by: Avinash, MSR
Published: (2025)
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)
AdaGradSelect: An adaptive gradient-guided layer selection method for efficient fine-tuning of SLMs
by: Kumar, Anshul, et al.
Published: (2025)
On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)
Knowledge Distillation for Reservoir-based Classifier: Human Activity Recognition
by: Kagiyama, Masaharu, et al.
Published: (2025)
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
by: Estevez, Melissa, et al.
Published: (2025)
EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training
by: Yi, Qingao, et al.
Published: (2025)
OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction
by: Mozaffari, Mohammad, et al.
Published: (2025)
Estudio de la eficiencia en la escalabilidad de GPUs para el entrenamiento de Inteligencia Artificial
by: Cortes, David, et al.
Published: (2025)