:: Library Catalog

Bibliographic Details
Main Authors:	Banfic, Nenad; Fan, David; Vaishnavi, Kunal; Kemp, Sam; Choi, Sunghoon; Ren, Rui; Shaw, Sayan; Tang, Meng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.14493

Similar Items

Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)

Staircase Streaming for Low-Latency Multi-Agent Inference
by: Wang, Junlin, et al.
Published: (2025)

SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
by: Wang, Fangyuan, et al.
Published: (2022)

Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU
by: Roh, Hyeri, et al.
Published: (2024)

A Compact Model for English Grammar Error Correction in the Low‐Latency Edge Deployment
by: Shaoli Xiong
Published: (2026)

Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
by: Grigoryan, Lilit, et al.
Published: (2025)

Low-Latency Neural Stereo Streaming
by: Hou, Qiqi, et al.
Published: (2024)

Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications
by: Kudlur, Manjunath, et al.
Published: (2026)

Non-equilibrium dynamics of the disordered Power of Two model
by: Singh, Kunal, et al.
Published: (2026)

Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS
by: Ethiraj, Vignesh, et al.
Published: (2025)

Action Deviation-Aware Inference for Low-Latency Wireless Robots
by: Park, Jeyoung, et al.
Published: (2025)

Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
by: Bao, Rui, et al.
Published: (2025)

Lessons Learnt From Long‐Term Monitoring of River Restoration in an English Chalk Stream
by: Lewis A. Dolman, et al.
Published: (2026)

Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions
by: Li, Yang, et al.
Published: (2024)

Pushing the Limits of BFP on Narrow Precision LLM Inference
by: Wang, Hui, et al.
Published: (2025)

Greening AI Inference with Accuracy and Latency-aware User Incentives
by: Siris, Vasilios A., et al.
Published: (2026)

Grid-Free Evaluation of Phonon-Limited Electronic Relaxation Times and Transport Properties
by: Vukmirović, Nenad
Published: (2025)

Pushing the Limits of Inverse Lithography with Generative Reinforcement Learning
by: Yang, Haoyu, et al.
Published: (2026)

Low-Latency Scalable Streaming for Event-Based Vision
by: Hamara, Andrew, et al.
Published: (2024)

Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
by: Liu, Siyuan, et al.
Published: (2026)

Low-Latency Neural Inference on an Edge Device for Real-Time Handwriting Recognition from EEG Signals
by: Sen, Ovishake, et al.
Published: (2025)

A Study on Inference Latency for Vision Transformers on Mobile Devices
by: Li, Zhuojin, et al.
Published: (2025)

StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)

Low Latency, High Bandwidth Streaming of Experimental Data with EJFAT
by: Baldin, Ilya, et al.
Published: (2025)

Pushing The Limit of LLM Capacity for Text Classification
by: Zhang, Yazhou, et al.
Published: (2024)

3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency
by: Jung, Minseok, et al.
Published: (2025)

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
by: Wang, Haoxin, et al.
Published: (2025)

Ultra-Low-Latency Edge Inference for Distributed Sensing
by: Wang, Zhanwei, et al.
Published: (2024)

VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
by: Torgashov, Nikita, et al.
Published: (2025)

Low-Latency Stateful Stream Processing through Timely and Accurate Prefetching
by: Zapridou, Eleni, et al.
Published: (2026)

Low-Latency Grid Intelligence with Self-Governing Stream and Calibration Agents
by: Parthasarathy, Adithya, et al.
Published: (2026)

DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement
by: Sun, Tao, et al.
Published: (2024)

An Experimental Study of Low-Latency Video Streaming over 5G
by: Khan, Imran, et al.
Published: (2024)

MOTION: ML-Assisted On-Device Low-Latency Motion Recognition
by: Pugazhenthi, Veeramani, et al.
Published: (2025)

Low-Latency Terrestrial Interference Detection for Satellite-to-Device Communications
by: Liu, Runnan, et al.
Published: (2025)

ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators
by: Baldi, T., et al.
Published: (2026)

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
by: Kim, Kwanyoung, et al.
Published: (2025)

Depth-discriminative Metric Learning for Monocular 3D Object Detection
by: Choi, Wonhyeok, et al.
Published: (2024)

Collaboration and the Accuracy Imperative: Improving Reference Service Now.
by: Kemp, Jan, et al.
Published: (1989)

CMIR: A Corpus for Evaluation of Code Mixed Information Retrieval of Hindi-English Tweets
by: Kunal Chakma
Published: (2016)