| Main Authors: | Banfic, Nenad, Fan, David, Vaishnavi, Kunal, Kemp, Sam, Choi, Sunghoon, Ren, Rui, Shaw, Sayan, Tang, Meng |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.14493 |
Similar Items
Delayed-KD: Delayed Knowledge Distillation based CTC for Low-Latency Streaming ASR
by: Li, Longhao, et al.
Published: (2025)
Staircase Streaming for Low-Latency Multi-Agent Inference
by: Wang, Junlin, et al.
Published: (2025)
SSCFormer: Push the Limit of Chunk-wise Conformer for Streaming ASR Using Sequentially Sampled Chunks and Chunked Causal Convolution
by: Wang, Fangyuan, et al.
Published: (2022)
Flash: A Hybrid Private Inference Protocol for Deep CNNs with High Accuracy and Low Latency on CPU
by: Roh, Hyeri, et al.
Published: (2024)
A Compact Model for English Grammar Error Correction in the Low‐Latency Edge Deployment
by: Shaoli Xiong
Published: (2026)
Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
by: Grigoryan, Lilit, et al.
Published: (2025)
Low-Latency Neural Stereo Streaming
by: Hou, Qiqi, et al.
Published: (2024)
Moonshine v2: Ergodic Streaming Encoder ASR for Latency-Critical Speech Applications
by: Kudlur, Manjunath, et al.
Published: (2026)
Non-equilibrium dynamics of the disordered Power of Two model
by: Singh, Kunal, et al.
Published: (2026)
Toward Low-Latency End-to-End Voice Agents for Telecommunications Using Streaming ASR, Quantized LLMs, and Real-Time TTS
by: Ethiraj, Vignesh, et al.
Published: (2025)
Action Deviation-Aware Inference for Low-Latency Wireless Robots
by: Park, Jeyoung, et al.
Published: (2025)
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
by: Bao, Rui, et al.
Published: (2025)
Lessons Learnt From Long‐Term Monitoring of River Restoration in an English Chalk Stream
by: Lewis A. Dolman, et al.
Published: (2026)
Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions
by: Li, Yang, et al.
Published: (2024)
Pushing the Limits of BFP on Narrow Precision LLM Inference
by: Wang, Hui, et al.
Published: (2025)
Greening AI Inference with Accuracy and Latency-aware User Incentives
by: Siris, Vasilios A., et al.
Published: (2026)
Grid-Free Evaluation of Phonon-Limited Electronic Relaxation Times and Transport Properties
by: Vukmirović, Nenad
Published: (2025)
Pushing the Limits of Inverse Lithography with Generative Reinforcement Learning
by: Yang, Haoyu, et al.
Published: (2026)
Low-Latency Scalable Streaming for Event-Based Vision
by: Hamara, Andrew, et al.
Published: (2024)
Discourse-Aware Dual-Track Streaming Response for Low-Latency Spoken Dialogue Systems
by: Liu, Siyuan, et al.
Published: (2026)
Low-Latency Neural Inference on an Edge Device for Real-Time Handwriting Recognition from EEG Signals
by: Sen, Ovishake, et al.
Published: (2025)
A Study on Inference Latency for Vision Transformers on Mobile Devices
by: Li, Zhuojin, et al.
Published: (2025)
StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)
Low Latency, High Bandwidth Streaming of Experimental Data with EJFAT
by: Baldin, Ilya, et al.
Published: (2025)
Pushing The Limit of LLM Capacity for Text Classification
by: Zhang, Yazhou, et al.
Published: (2024)
3D Optimization for AI Inference Scaling: Balancing Accuracy, Cost, and Latency
by: Jung, Minseok, et al.
Published: (2025)
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
by: Wang, Haoxin, et al.
Published: (2025)
Ultra-Low-Latency Edge Inference for Distributed Sensing
by: Wang, Zhanwei, et al.
Published: (2024)
VoXtream: Full-Stream Text-to-Speech with Extremely Low Latency
by: Torgashov, Nikita, et al.
Published: (2025)
Low-Latency Stateful Stream Processing through Timely and Accurate Prefetching
by: Zapridou, Eleni, et al.
Published: (2026)
Low-Latency Grid Intelligence with Self-Governing Stream and Calibration Agents
by: Parthasarathy, Adithya, et al.
Published: (2026)
DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement
by: Sun, Tao, et al.
Published: (2024)
An Experimental Study of Low-Latency Video Streaming over 5G
by: Khan, Imran, et al.
Published: (2024)
MOTION: ML-Assisted On-Device Low-Latency Motion Recognition
by: Pugazhenthi, Veeramani, et al.
Published: (2025)
Low-Latency Terrestrial Interference Detection for Satellite-to-Device Communications
by: Liu, Runnan, et al.
Published: (2025)
ALADIN: Accuracy-Latency-Aware Design-space Inference Analysis for Embedded AI Accelerators
by: Baldi, T., et al.
Published: (2026)
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
by: Kim, Kwanyoung, et al.
Published: (2025)
Depth-discriminative Metric Learning for Monocular 3D Object Detection
by: Choi, Wonhyeok, et al.
Published: (2024)
Collaboration and the Accuracy Imperative: Improving Reference Service Now.
by: Kemp, Jan, et al.
Published: (1989)
CMIR: A Corpus for Evaluation of Code Mixed Information Retrieval of Hindi-English Tweets
by: Kunal Chakma
Published: (2016)