:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Kai; Su, Zhan; Dong, Peijie; Mo, Fengran; Gao, Jianfei; Zhang, ShaoTing; Chen, Kai
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2507.19353

Similar Items

Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity
by: Ma, Da, et al.
Published: (2024)

Language Modeling Using Tensor Trains
by: Su, Zhan, et al.
Published: (2024)

Conversational Search: From Fundamentals to Frontiers in the LLM Era
by: Mo, Fengran, et al.
Published: (2025)

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
by: Liu, Di, et al.
Published: (2024)

Evaluating Zero-Shot Long-Context LLM Compression
by: Wang, Chenyu, et al.
Published: (2024)

Bridging the Gap: From Ad-hoc to Proactive Search in Conversations
by: Meng, Chuan, et al.
Published: (2025)

An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks
by: Zhou, Xin, et al.
Published: (2025)

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
by: Shao, Qiwei, et al.
Published: (2024)

Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
by: Huang, Chensen, et al.
Published: (2024)

FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
by: Chen, Zhuokun, et al.
Published: (2026)

A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models
by: Wang, Jiayin, et al.
Published: (2024)

LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
by: Ling, Zhan, et al.
Published: (2025)

Squeezed Attention: Accelerating Long Context Length LLM Inference
by: Hooper, Coleman, et al.
Published: (2024)

ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)

MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning?
by: Yan, Kai, et al.
Published: (2025)

Scaling Long-Horizon LLM Agent via Context-Folding
by: Sun, Weiwei, et al.
Published: (2025)

LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
by: Gu, Zhuohan, et al.
Published: (2024)

AttnCache: Accelerating Self-Attention Inference for LLM Prefill via Attention Cache
by: Song, Dinghong, et al.
Published: (2025)

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
by: Jin, Hongye, et al.
Published: (2024)

Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search
by: Mo, Fengran, et al.
Published: (2024)

History-Aware Conversational Dense Retrieval
by: Mo, Fengran, et al.
Published: (2024)

Bridging the Gap between Different Vocabularies for LLM Ensemble
by: Xu, Yangyifan, et al.
Published: (2024)

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
by: Xiao, Guangxuan, et al.
Published: (2024)

Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards
by: Jia, Ruipeng, et al.
Published: (2025)

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
by: Tang, Zhenheng, et al.
Published: (2025)

Bridging Writing Manner Gap in Visual Instruction Tuning by Creating LLM-aligned Instructions
by: Jing, Dong, et al.
Published: (2025)

LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning
by: Zhu, Yu, et al.
Published: (2026)

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
by: Zhu, Qianchao, et al.
Published: (2024)

When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
by: Lou, Xinyue, et al.
Published: (2026)

Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
by: Liu, Yanming, et al.
Published: (2024)

FedCoT: Communication-Efficient Federated Reasoning Enhancement for Large Language Models
by: Li, Chuan, et al.
Published: (2025)

Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention
by: Mo, Shibing, et al.
Published: (2025)

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
by: Li, Zheng, et al.
Published: (2025)

Learning to Route Queries to Heads for Attention-based Re-ranking with Large Language Models
by: Tian, Yuxing, et al.
Published: (2026)

DiSRouter: Distributed Self-Routing for LLM Selections
by: Zheng, Hang, et al.
Published: (2025)

Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference
by: Wu, Zimeng, et al.
Published: (2026)

NeedleBench: Evaluating LLM Retrieval and Reasoning Across Varying Information Densities
by: Li, Mo, et al.
Published: (2024)

Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
by: Shao, Shuo, et al.
Published: (2025)

Bridging Human and LLM Judgments: Understanding and Narrowing the Gap
by: Polo, Felipe Maia, et al.
Published: (2025)