| Main Authors: | Zheng, Liang; Shi, Bowen; Hu, Yitao; Zhang, Jiawei; Li, Ruofan; Chen, Sheng; Li, Wenxin; Li, Keqiu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.06562 |
Similar Items
Harpagon: Minimizing DNN Serving Cost via Efficient Dispatching, Scheduling and Splitting
by: Zhao, Zhixin, et al.
Published: (2024)
PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
by: Yi, Jinjun, et al.
Published: (2025)
RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems
by: Wang, Zhengchao, et al.
Published: (2025)
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs
by: Liu, Xiaoran, et al.
Published: (2025)
Taming Wild Knots with Mosaics
by: Deng, Mary Y., et al.
Published: (2026)
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
by: Jin, Bowen, et al.
Published: (2024)
UT-ACA: Uncertainty-Triggered Adaptive Context Allocation for Long-Context Inference
by: Zhou, Lang, et al.
Published: (2026)
A multi‐dimensional incentive mechanism based on age of update in hierarchical federated learning
by: Zheng, Zhaohua, et al.
Published: (2024)
Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching
by: Pang, Bowen, et al.
Published: (2025)
Taming Stable Diffusion for Computed Tomography Blind Super-Resolution
by: Li, Chunlei, et al.
Published: (2025)
Are Large Language Models In-Context Graph Learners?
by: Li, Jintang, et al.
Published: (2025)
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
by: Wang, Zihao, et al.
Published: (2024)
Training-Inference Consistent Segmented Execution for Long-Context LLMs
by: Shang, Xianpeng, et al.
Published: (2026)
Long-Context Speech Synthesis with Context-Aware Memory
by: Li, Zhipeng, et al.
Published: (2025)
Memory Mosaics
by: Zhang, Jianyu, et al.
Published: (2024)
ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs
by: Sui, Yifan, et al.
Published: (2025)
LongRM: Revealing and Unlocking the Context Boundary of Reward Modeling
by: Tang, Zecheng, et al.
Published: (2025)
XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
by: Li, Weizhuo, et al.
Published: (2024)
Lookahead Path Likelihood Optimization for Diffusion LLMs
by: Liu, Xuejie, et al.
Published: (2026)
HyperMem: Hypergraph Memory for Long-Term Conversations
by: Yue, Juwei, et al.
Published: (2026)
Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving
by: Fan, Jiakun, et al.
Published: (2025)
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference
by: Wang, Guangtao, et al.
Published: (2025)
Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning
by: Chen, Zhuoen, et al.
Published: (2026)
Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts
by: Sivtsov, Danil, et al.
Published: (2025)
Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
by: Duan, Zheng-Peng, et al.
Published: (2025)
Memory Mosaics at scale
by: Zhang, Jianyu, et al.
Published: (2025)
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
by: Shen, Weizhou, et al.
Published: (2025)
Linear recurrence sequences and palindromic concatenations of two repdigits in base $β$
by: Li, Ruofan
Published: (2026)
Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail
by: Li, Yingru, et al.
Published: (2025)
CoDiCast: Conditional Diffusion Model for Global Weather Prediction with Uncertainty Quantification
by: Shi, Jimeng, et al.
Published: (2024)
Query-focused and Memory-aware Reranker for Long Context Processing
by: Li, Yuqing, et al.
Published: (2026)
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
by: Xu, Jinghan, et al.
Published: (2025)
Mitigating Context-Memory Conflicts in LLMs through Dynamic Cognitive Reconciliation Decoding
by: Zhou, Yigeng, et al.
Published: (2026)
Chameleon: Taming Dynamic Operator Sequences for Memory-Intensive LLM Training
by: Wang, Zibo, et al.
Published: (2025)
MiA-Signature: Approximating Global Activation for Long-Context Understanding
by: Li, Yuqing, et al.
Published: (2026)
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
by: Ge, Suyu, et al.
Published: (2024)
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference
by: Xiao, Qingfa, et al.
Published: (2025)
DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs
by: Zhou, Xiabin, et al.
Published: (2024)
TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions
by: Li, Kevin, et al.
Published: (2024)