| Main Authors: | Liu, Kai, Su, Zhan, Dong, Peijie, Mo, Fengran, Gao, Jianfei, Zhang, ShaoTing, Chen, Kai |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.19353 |
Similar Items
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)
Language Modeling Using Tensor Trains
by: Su, Zhan, et al.
Published: (2024)
Conversational Search: From Fundamentals to Frontiers in the LLM Era
by: Mo, Fengran, et al.
Published: (2025)
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
by: Liu, Di, et al.
Published: (2024)
Evaluating Zero-Shot Long-Context LLM Compression
by: Wang, Chenyu, et al.
Published: (2024)
Bridging the Gap: From Ad-hoc to Proactive Search in Conversations
by: Meng, Chuan, et al.
Published: (2025)
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks
by: Zhou, Xin, et al.
Published: (2025)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)
Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
by: Shao, Qiwei, et al.
Published: (2024)
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
by: Huang, Chensen, et al.
Published: (2024)
FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
by: Chen, Zhuokun, et al.
Published: (2026)
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models
by: Wang, Jiayin, et al.
Published: (2024)
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
by: Ling, Zhan, et al.
Published: (2025)
Squeezed Attention: Accelerating Long Context Length LLM Inference
by: Hooper, Coleman, et al.
Published: (2024)
ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)
MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?
by: Yan, Kai, et al.
Published: (2025)
Scaling Long-Horizon LLM Agent via Context-Folding
by: Sun, Weiwei, et al.
Published: (2025)
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
by: Gu, Zhuohan, et al.
Published: (2024)
AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
by: Song, Dinghong, et al.
Published: (2025)
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
by: Jin, Hongye, et al.
Published: (2024)
Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search
by: Mo, Fengran, et al.
Published: (2024)
History-Aware Conversational Dense Retrieval
by: Mo, Fengran, et al.
Published: (2024)
Bridging the Gap between Different Vocabularies for LLM Ensemble
by: Xu, Yangyifan, et al.
Published: (2024)
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
by: Xiao, Guangxuan, et al.
Published: (2024)
Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards
by: Jia, Ruipeng, et al.
Published: (2025)
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
by: Tang, Zhenheng, et al.
Published: (2025)
Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions
by: Jing, Dong, et al.
Published: (2025)
LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning
by: Zhu, Yu, et al.
Published: (2026)
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
by: Lou, Xinyue, et al.
Published: (2026)
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
by: Liu, Yanming, et al.
Published: (2024)
FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models
by: Li, Chuan, et al.
Published: (2025)
Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention
by: Mo, Shibing, et al.
Published: (2025)
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
by: Li, Zheng, et al.
Published: (2025)
Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
by: Tian, Yuxing, et al.
Published: (2026)
DiSRouter: Distributed Self-Routing for LLM Selections
by: Zheng, Hang, et al.
Published: (2025)
Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference
by: Wu, Zimeng, et al.
Published: (2026)
NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities
by: Li, Mo, et al.
Published: (2024)
Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
by: Shao, Shuo, et al.
Published: (2025)
Bridging Human and LLM Judgments: Understanding and Narrowing the Gap
by: Polo, Felipe Maia, et al.
Published: (2025)