:: Library Catalog



Bibliographic Details
Main Authors:	Ma, Xing; Zhou, Yangjie; Sun, Wu; Liu, Zihan; Leng, Jingwen; Lin, Yun; Sun, Shixuan; Guo, Minyi; Dong, Jin Song
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.05023

Similar Items

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
by: Qiang, Xinwei, et al.
Published: (2026)

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
by: Luo, Xinhao, et al.
Published: (2025)

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
by: Guan, Yue, et al.
Published: (2025)

Design the Quantum Instruction Set with the Cartan Coordinate Analysis Framework
by: Wu, Anbang, et al.
Published: (2024)

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference
by: Liu, Zihan, et al.
Published: (2025)

Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy
by: Feng, Yu, et al.
Published: (2025)

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)

eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)

Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
by: Feng, Yu, et al.
Published: (2024)

LiquidGEMM: Hardware-Efficient W4A8 GEMM Kernel for High-Performance LLM Serving
by: Hu, Huanqi, et al.
Published: (2025)

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2024)

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
by: Chen, Hongyu, et al.
Published: (2026)

SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting
by: Huang, Xiaotong, et al.
Published: (2025)

SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)

Towards High-Goodput LLM Serving with Prefill-decode Multiplexing
by: Chen, Yukang, et al.
Published: (2025)

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs
by: Chen, Weitian, et al.
Published: (2026)

Astraea: A Token-wise Acceleration Framework for Video Diffusion Transformers
by: Liu, Haosong, et al.
Published: (2025)

StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination
by: Feng, Yu, et al.
Published: (2025)

Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
by: Feng, Yu, et al.
Published: (2024)

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type
by: Hu, Weiming, et al.
Published: (2025)

Gumbel Reranking: Differentiable End-to-End Reranker Optimization
by: Huang, Siyuan, et al.
Published: (2025)

Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)

SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
by: Wang, Kunyun, et al.
Published: (2024)

Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)

On Distributionally Robust Multistage Convex Optimization: Data-driven Models and Performance
by: Zhang, Shixuan, et al.
Published: (2022)

Visible‐Light‐Induced Deaminative Alkylation for the Synthesis of Chroman‐4‐One Derivatives via EDA Complexes
by: Yan, Jinke, et al.
Published: (2024)

CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
by: Saba, Tara, et al.
Published: (2026)

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)

Efficient Serving of LLM Applications with Probabilistic Demand Modeling
by: Liu, Yifei, et al.
Published: (2025)

Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)

FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework
by: Mei, Junyi, et al.
Published: (2024)

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods
by: Shen, Zhaiming, et al.
Published: (2025)

Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness
by: Ma, Shixuan, et al.
Published: (2024)

An Efficient Private GPT Never Autoregressively Decodes
by: Li, Zhengyi, et al.
Published: (2025)

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization
by: Zhou, Yangjie, et al.
Published: (2024)

Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization
by: Yun, Taeyoung, et al.
Published: (2026)

Pathways to High Corporate Environmental Responsibility: A Fuzzy‐Set and Necessary Condition Analysis
by: Huang, Yangjie, et al.
Published: (2025)

Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
by: Jiang, Jevin, et al.
Published: (2026)

Heterogeneous Mean Field Game Framework for LEO Satellite-Assisted V2X Networks
by: Sun, Kangkang, et al.
Published: (2026)

Visible‐Light‐Induced Radical Cyclization of Unactivated Olefins with Perfluoroalkyl Iodides to Access Perfluoroalkylated Ortho‐Diazaheterocyclic Compounds
by: Sui, Kaixia, et al.
Published: (2025)