| Main Authors: | Ma, Xing, Zhou, Yangjie, Sun, Wu, Liu, Zihan, Leng, Jingwen, Lin, Yun, Sun, Shixuan, Guo, Minyi, Dong, Jin Song |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.05023 |
Similar Items
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
by: Qiang, Xinwei, et al.
Published: (2026)
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
by: Luo, Xinhao, et al.
Published: (2025)
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
by: Guan, Yue, et al.
Published: (2025)
Design the Quantum Instruction Set with the Cartan Coordinate Analysis Framework
by: Wu, Anbang, et al.
Published: (2024)
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference
by: Liu, Zihan, et al.
Published: (2025)
Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy
by: Feng, Yu, et al.
Published: (2025)
FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)
eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)
Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
by: Feng, Yu, et al.
Published: (2024)
LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)
vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2024)
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
by: Chen, Hongyu, et al.
Published: (2026)
SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting
by: Huang, Xiaotong, et al.
Published: (2025)
SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)
Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
by: Chen, Yukang, et al.
Published: (2025)
gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs
by: Chen, Weitian, et al.
Published: (2026)
Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers
by: Liu, Haosong, et al.
Published: (2025)
StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination
by: Feng, Yu, et al.
Published: (2025)
Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
by: Feng, Yu, et al.
Published: (2024)
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type
by: Hu, Weiming, et al.
Published: (2025)
Gumbel Reranking: Differentiable End-to-End Reranker Optimization
by: Huang, Siyuan, et al.
Published: (2025)
Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)
SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
by: Wang, Kunyun, et al.
Published: (2024)
Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)
On Distributionally Robust Multistage Convex Optimization: Data-driven Models and Performance
by: Zhang, Shixuan, et al.
Published: (2022)
Visible‐Light‐Induced Deaminative Alkylation for the Synthesis of Chroman‐4‐One Derivatives via EDA Complexes
by: Jinke Yan, et al.
Published: (2024)
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
by: Saba, Tara, et al.
Published: (2026)
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)
Efficient Serving of LLM Applications with Probabilistic Demand Modeling
by: Liu, Yifei, et al.
Published: (2025)
Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)
FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
by: Mei, Junyi, et al.
Published: (2024)
Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
by: Shen, Zhaiming, et al.
Published: (2025)
Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness
by: Ma, Shixuan, et al.
Published: (2024)
An Efficient Private GPT Never Autoregressively Decodes
by: Li, Zhengyi, et al.
Published: (2025)
Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization
by: Zhou, Yangjie, et al.
Published: (2024)
Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization
by: Yun, Taeyoung, et al.
Published: (2026)
Pathways to High Corporate Environmental Responsibility: A Fuzzy‐Set and Necessary Condition Analysis
by: Yangjie Huang, et al.
Published: (2025)
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
by: Jiang, Jevin, et al.
Published: (2026)
Heterogeneous Mean Field Game Framework for LEO Satellite-Assisted V2X Networks
by: Sun, Kangkang, et al.
Published: (2026)
Visible‐Light‐Induced Radical Cyclization of Unactivated Olefins with Perfluoroalkyl Iodides to Access Perfluoroalkylated Ortho‐Diazaheterocyclic Compounds
by: Kaixia Sui, et al.
Published: (2025)