| Main Authors: | Liu, Haosong, Cheng, Yuge, Miao, Wenxuan, Liu, Zihan, Chen, Aiyue, Lin, Jing, Yao, Yiwu, Chen, Chen, Leng, Jingwen, Feng, Yu, Guo, Minyi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.05096 |
Similar Items
TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space
by: Miao, Wenxuan, et al.
Published: (2025)
RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy
by: Chen, Aiyue, et al.
Published: (2025)
Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy
by: Feng, Yu, et al.
Published: (2025)
RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention
by: Chen, Aiyue, et al.
Published: (2025)
Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
by: Feng, Yu, et al.
Published: (2024)
SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting
by: Huang, Xiaotong, et al.
Published: (2025)
Design the Quantum Instruction Set with the Cartan Coordinate Analysis Framework
by: Wu, Anbang, et al.
Published: (2024)
Accelerating Diffusion Transformers with Token-wise Feature Caching
by: Zou, Chang, et al.
Published: (2024)
Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
by: Feng, Yu, et al.
Published: (2024)
Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)
DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
by: Qiang, Xinwei, et al.
Published: (2026)
SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)
SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
by: Wang, Kunyun, et al.
Published: (2024)
StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination
by: Feng, Yu, et al.
Published: (2025)
CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels
by: Ma, Xing, et al.
Published: (2026)
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
by: Li, Zhengyi, et al.
Published: (2026)
Rethinking Token-wise Feature Caching: Accelerating Diffusion Transformers with Dual Feature Caching
by: Zou, Chang, et al.
Published: (2024)
FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
by: Luo, Xinhao, et al.
Published: (2025)
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type
by: Hu, Weiming, et al.
Published: (2025)
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)
An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation
by: Zhang, Weichuang, et al.
Published: (2024)
Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)
An Efficient Private GPT Never Autoregressively Decodes
by: Li, Zhengyi, et al.
Published: (2025)
Token Caching for Diffusion Transformer Acceleration
by: Lou, Jinming, et al.
Published: (2024)
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference
by: Liu, Zihan, et al.
Published: (2025)
A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification
by: Chen, Peng, et al.
Published: (2026)
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
by: Chen, Hongyu, et al.
Published: (2026)
Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
by: Wang, Jing, et al.
Published: (2024)
Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)
eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)
DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
by: Zhu, Haowei, et al.
Published: (2026)
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
by: Peng, Haosong, et al.
Published: (2024)
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
by: Guan, Yue, et al.
Published: (2025)
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
by: Zhong, Yiwu, et al.
Published: (2024)
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
by: Chen, Pengtao, et al.
Published: (2025)
Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)
Block-wise Adaptive Caching for Accelerating Diffusion Policy
by: Ji, Kangye, et al.
Published: (2025)