Library Catalog

Bibliographic Details
Main Authors:	Liu, Haosong; Cheng, Yuge; Miao, Wenxuan; Liu, Zihan; Chen, Aiyue; Lin, Jing; Yao, Yiwu; Chen, Chen; Leng, Jingwen; Feng, Yu; Guo, Minyi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2506.05096

Similar Items

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space
by: Miao, Wenxuan, et al.
Published: (2025)

RainFusion: Adaptive Video Generation Acceleration via Multi-Dimensional Visual Redundancy
by: Chen, Aiyue, et al.
Published: (2025)

Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy
by: Feng, Yu, et al.
Published: (2025)

RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention
by: Chen, Aiyue, et al.
Published: (2025)

Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations
by: Feng, Yu, et al.
Published: (2024)

SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting
by: Huang, Xiaotong, et al.
Published: (2025)

Design the Quantum Instruction Set with the Cartan Coordinate Analysis Framework
by: Wu, Anbang, et al.
Published: (2024)

Accelerating Diffusion Transformers with Token-wise Feature Caching
by: Zou, Chang, et al.
Published: (2024)

Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture
by: Feng, Yu, et al.
Published: (2024)

Accelerating Sparse DNNs Based on Tiled GEMM
by: Guo, Cong, et al.
Published: (2024)

DASH: Deterministic Attention Scheduling for High-throughput Reproducible LLM Training
by: Qiang, Xinwei, et al.
Published: (2026)

SLTarch: Towards Scalable Point-Based Neural Rendering by Taming Workload Imbalance and Memory Irregularity
by: Li, Xingyang, et al.
Published: (2025)

SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
by: Wang, Kunyun, et al.
Published: (2024)

StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination
by: Feng, Yu, et al.
Published: (2025)

CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels
by: Ma, Xing, et al.
Published: (2026)

On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
by: Li, Zhengyi, et al.
Published: (2026)

Rethinking Token-wise Feature Caching: Accelerating Diffusion Transformers with Dual Feature Caching
by: Zou, Chang, et al.
Published: (2024)

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)

ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
by: Luo, Xinhao, et al.
Published: (2025)

M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type
by: Hu, Weiming, et al.
Published: (2025)

FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
by: Liu, Guangda, et al.
Published: (2025)

An Optimizing Framework on MLIR for Efficient FPGA-based Accelerator Generation
by: Zhang, Weichuang, et al.
Published: (2024)

Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing
by: Huang, Xiaotong, et al.
Published: (2025)

An Efficient Private GPT Never Autoregressively Decodes
by: Li, Zhengyi, et al.
Published: (2025)

Token Caching for Diffusion Transformer Acceleration
by: Lou, Jinming, et al.
Published: (2024)

VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference
by: Liu, Zihan, et al.
Published: (2025)

A Synergistic CNN-Transformer Network with Pooling Attention Fusion for Hyperspectral Image Classification
by: Chen, Peng, et al.
Published: (2026)

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
by: Chen, Hongyu, et al.
Published: (2026)

Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
by: Wang, Jing, et al.
Published: (2024)

Taming Extreme Tokens: Covariance-Aware GRPO with Gaussian-Kernel Advantage Reweighting
by: Wang, Cheng, et al.
Published: (2026)

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)

eLLM: Elastic Memory Management Framework for Efficient LLM Serving
by: Xu, Jiale, et al.
Published: (2025)

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity
by: Zhu, Haowei, et al.
Published: (2026)

Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
by: Peng, Haosong, et al.
Published: (2024)

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
by: Guan, Yue, et al.
Published: (2025)

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
by: Zhong, Yiwu, et al.
Published: (2024)

Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
by: Chen, Pengtao, et al.
Published: (2025)

Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)

Block-wise Adaptive Caching for Accelerating Diffusion Policy
by: Ji, Kangye, et al.
Published: (2025)