Bibliographic Details
Main Authors: Jia, Fucheng, Jiang, Shiqi, Cao, Ting, Cui, Wei, Xia, Tianrui, Cao, Xu, Li, Yuanchun, Zhang, Deyu, Ren, Ju, Liu, Yunxin, Qiu, Lili, Yang, Mao
Format: Preprint
Published: 2023
Subjects:
Online Access: https://arxiv.org/abs/2309.08978
contents The Web is increasingly becoming the primary platform for delivering AI services to edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web hardware acceleration practices, prevents current in-browser inference from achieving its full performance potential on target devices. To address this issue, this paper presents nnJIT, a pioneering in-browser inference system that enables just-in-time (JIT) auto-generation of optimized computing kernels for edge devices. nnJIT is built upon two novel techniques that significantly reduce kernel search and compilation overhead while improving performance: Tensor-Web Compiling Co-Design lowers compiling costs by around 100X by eliminating redundant and ineffective compiling passes; Web-Specific Lite Kernel Optimization Space reduces kernel tuning costs by focusing on Web programming requirements and efficient device resource utilization, pruning the optimization space from millions to only dozens. nnJIT is evaluated on modern models, e.g., BART, T5, and Llama 2, across a range of edge devices including laptops and smartphones, using different browsers and hardware from ARM, Intel, AMD, and Nvidia. The results show that nnJIT can achieve up to 8.2X speedup within 30 seconds compared to the existing baselines.
id arxiv_https___arxiv_org_abs_2309_08978
institution arXiv
record_format arxiv
title Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations
topic Artificial Intelligence