| Main Authors: | Wang, Shengnan, Bai, Youhui, Zhang, Lin, Zhou, Pingyi, Zhao, Shixiong, Zhang, Gong, Wang, Sen, Chen, Renhai, Xu, Hua, Sun, Hongwei |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.17755 |
Similar Items
Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference
by: Jin, Zewen, et al.
Published: (2025)
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
by: Gong, Ping, et al.
Published: (2025)
LiteCache: A Query Similarity-Driven, GPU-Centric KVCache Subsystem for Efficient LLM Inference
by: Yi, Jiawei, et al.
Published: (2025)
HyLRA: Hybrid Layer Reuse Attention for Efficient Long-Context Inference
by: Ai, Xuan, et al.
Published: (2026)
AdaCluster: Adaptive Query-Key Clustering for Sparse Attention in Video Generation
by: Tan, Haoyue, et al.
Published: (2026)
Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism
by: Zhao, Long, et al.
Published: (2026)
Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training
by: Xu, Guanbin, et al.
Published: (2026)
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
by: Liu, Jiaheng, et al.
Published: (2024)
Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation
by: Li, Jia, et al.
Published: (2026)
Making MoE-based LLM Inference Resilient with Tarragon
by: Zhang, Songyu, et al.
Published: (2026)
iSeg: An Iterative Refinement-based Framework for Training-free Segmentation
by: Sun, Lin, et al.
Published: (2024)
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference
by: Zuo, Youhui, et al.
Published: (2025)
MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
by: Sun, Hui, et al.
Published: (2025)
A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance
by: Tzachristas, Georgios, et al.
Published: (2025)
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
by: Wang, Zixin, et al.
Published: (2024)
Uncertainty-Aware Bayes' Rule and Its Applications
by: Wang, Shixiong
Published: (2023)
SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition
by: Yang, Jingxiao, et al.
Published: (2026)
Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction
by: Li, Wenhao, et al.
Published: (2024)
Bioinspired Directional Hydrogel‐Based High‐Performance Flexible Sensor for Multiple Jumping Pattern Detection in Athletic Training
by: Hanqi Wang, et al.
Published: (2025)
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
by: Hua, Ermo, et al.
Published: (2024)
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension
by: Wang, Ning, et al.
Published: (2024)
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
by: Zhu, Dawei, et al.
Published: (2023)
Segmentation-guided Layer-wise Image Vectorization with Gradient Fills
by: Zhou, Hengyu, et al.
Published: (2024)
Using Cu‐Based Metal–Organic Framework as a Comprehensive and Powerful Antioxidant Nanozyme for Efficient Osteoarthritis Treatment
by: Bo Yu, et al.
Published: (2024)
Training-free Geometric Image Editing on Diffusion Models
by: Zhu, Hanshen, et al.
Published: (2025)
Towards Feedback-to-Plan Decisions for Self-Evolving LLM Agents in CUDA Kernel Generation
by: Chong, Yee Hin, et al.
Published: (2026)
SSMRadNet : A Sample-wise State-Space Framework for Efficient and Ultra-Light Radar Segmentation and Object Detection
by: Sen, Anuab, et al.
Published: (2025)
Distributional Robustness Bounds Generalization Errors
by: Wang, Shixiong, et al.
Published: (2022)
NEFT: A Unified Transformer Framework for Efficient Near-Field CSI Feedback in XL-MIMO Systems
by: Mao, Tianqi, et al.
Published: (2025)
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
by: Luo, Cheng, et al.
Published: (2025)
Tensor-Structured Bayesian Channel Prediction for Upper Mid-Band XL-MIMO Systems
by: Hou, Hongwei, et al.
Published: (2025)
Predicting Miscibility in Binary Compounds: A Machine Learning and Genetic Algorithm Study
by: Feng, Chiwen, et al.
Published: (2024)
Beam-Delay Domain Channel Estimation for mmWave XL-MIMO Systems
by: Hou, Hongwei, et al.
Published: (2023)
SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training
by: Liu, Yuliang, et al.
Published: (2025)
VecAttention: Vector-wise Sparse Attention for Accelerating Long Context Inference
by: Liu, Anmin, et al.
Published: (2026)
Near-Field Multiuser Beam Training for XL-MIMO: An End-to-End Interference-Aware Approach with Pilot Limitations
by: Li, Xinyang, et al.
Published: (2026)
Using a Functional Wool Keratin Photoresist to Build Iridescent and Fluorescent 3D Micro‐Pattern for Dual‐Mode Optical Anti‐Counterfeiting
by: Shuang Xia, et al.
Published: (2025)
Multimodal Contrastive Learning for 3D Object Classification and Part‐Segmentation by Leveraging V‐LLM and CNNs
by: Jiaxin Jiang, et al.
Published: (2025)
Dissecting Conditional Branch Predictors of Apple Firestorm and Qualcomm Oryon for Software Optimization and Architectural Analysis
by: Chen, Jiajie, et al.
Published: (2024)