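Staff View: Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations :: Library Catalog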

Bibliographic Details
Main Authors:	Jia, Fucheng; Jiang, Shiqi; Cao, Ting; Cui, Wei; Xia, Tianrui; Cao, Xu; Li, Yuanchun; Zhang, Deyu; Ren, Ju; Liu, Yunxin; Qiu, Lili; Yang, Mao
Format:	Preprint
Published:	2023
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2309.08978

_version_	1866917713658511360
author	Jia, Fucheng; Jiang, Shiqi; Cao, Ting; Cui, Wei; Xia, Tianrui; Cao, Xu; Li, Yuanchun; Zhang, Deyu; Ren, Ju; Liu, Yunxin; Qiu, Lili; Yang, Mao
author_facet	Jia, Fucheng; Jiang, Shiqi; Cao, Ting; Cui, Wei; Xia, Tianrui; Cao, Xu; Li, Yuanchun; Zhang, Deyu; Ren, Ju; Liu, Yunxin; Qiu, Lili; Yang, Mao
contents	The Web is increasingly becoming the primary platform for delivering AI services to edge devices, making in-browser deep learning (DL) inference more prominent. Nevertheless, the heterogeneity of edge devices, combined with the underdeveloped state of Web hardware acceleration practices, hinders current in-browser inference from achieving its full performance potential on target devices. To address this issue, this paper presents the pioneering in-browser inference system, nnJIT, which enables just-in-time (JIT) auto-generation of optimized computing kernels for edge devices. nnJIT is built upon two novel techniques that significantly reduce kernel search and compilation overhead while improving performance: Tensor-Web Compiling Co-Design lowers compiling costs by around 100X by eliminating redundant and ineffective compiling passes; Web-Specific Lite Kernel Optimization Space reduces kernel tuning costs by focusing on Web programming requirements and efficient device resource utilization, pruning the optimization space from millions to only dozens. nnJIT is evaluated on modern models, e.g., BART, T5, and Llama 2, on a range of edge devices including laptops and smartphones, using different browsers and hardware from ARM, Intel, AMD, and Nvidia. The results show that nnJIT can achieve up to 8.2X faster inference within 30 seconds compared to the existing baselines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2309_08978
institution	arXiv
publishDate	2023
record_format	arxiv
title	Empowering In-Browser Deep Learning Inference on Edge Devices with Just-in-Time Kernel Optimizations
topic	Artificial Intelligence
url	https://arxiv.org/abs/2309.08978
