:: Library Catalog

Bibliographic Details
Main Authors:	Xie, Wenxuan; Wang, Yujia; Tan, Xin; Lu, Chaochao; Hu, Xia; Wang, Xuhong
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.10021

Similar Items

Decoupling Reasoning and Knowledge Injection for In-Context Knowledge Editing
by: Wang, Changyue, et al.
Published: (2025)

DSPC: Dual-Stage Progressive Compression Framework for Efficient Long-Context Reasoning
by: Gao, Yaxin, et al.
Published: (2025)

Probe and Skip: Self-Predictive Token Skipping for Efficient Long-Context LLM Inference
by: Wu, Zimeng, et al.
Published: (2026)

Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models
by: Bao, Yicheng, et al.
Published: (2026)

Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection
by: Jo, Dongwon, et al.
Published: (2026)

VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
by: Liu, Xin, et al.
Published: (2025)

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models
by: Weng, Fenghua, et al.
Published: (2025)

Dual-Density Inference for Efficient Language Model Reasoning
by: Zhao, Zhengyi, et al.
Published: (2025)

From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs
by: Wang, Shaojie, et al.
Published: (2026)

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
by: Wu, Wei, et al.
Published: (2024)

IP-Dialog: Evaluating Implicit Personalization in Dialogue Systems with Synthetic Data
by: Peng, Bo, et al.
Published: (2025)

D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
by: Wan, Zhongwei, et al.
Published: (2024)

LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model
by: Shao, Wei, et al.
Published: (2025)

MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
by: Wan, Zhongwei, et al.
Published: (2025)

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
by: Fu, Qichen, et al.
Published: (2024)

DiffAdapt: Difficulty-Adaptive Reasoning for Token-Efficient LLM Inference
by: Liu, Xiang, et al.
Published: (2025)

DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
by: Zarch, Hossein Entezari, et al.
Published: (2025)

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning
by: Wang, Li, et al.
Published: (2026)

Exploiting Sparsity for Long Context Inference: Million Token Contexts on Commodity GPUs
by: Synk, Ryan, et al.
Published: (2025)

SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Modeling
by: Liu, Dong, et al.
Published: (2025)

When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
by: Jeong, Soyeong, et al.
Published: (2025)

Training-free Context-adaptive Attention for Efficient Long Context Modeling
by: You, Zeng, et al.
Published: (2025)

Efficient Long-Context LLM Inference via KV Cache Clustering
by: Hu, Jie, et al.
Published: (2025)

Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
by: Fei, Weizhi, et al.
Published: (2025)

Latent-Condensed Transformer for Efficient Long Context Modeling
by: You, Zeng, et al.
Published: (2026)

JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation
by: Liu, Bingyang Kelvin, et al.
Published: (2025)

Tokenization Falling Short: On Subword Robustness in Large Language Models
by: Chai, Yekun, et al.
Published: (2024)

Membership Inference Attack against Long-Context Large Language Models
by: Wang, Zixiong, et al.
Published: (2024)

Decoupling Understanding from Reasoning via Problem Space Mapping for Small-Scale Model Reasoning
by: Wang, Li, et al.
Published: (2025)

Enhancing Retrieval Systems with Inference-Time Logical Reasoning
by: Faltings, Felix, et al.
Published: (2025)

LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
by: Hu, Zhiyuan, et al.
Published: (2024)

Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models
by: Chen, Wei, et al.
Published: (2024)

R$^2$PO: Decoupling Training Trajectories from Inference Responses for LLM Reasoning
by: Wang, Jingchu, et al.
Published: (2026)

ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs
by: Li, Xuancheng, et al.
Published: (2026)

SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning
by: Long, Lingkun, et al.
Published: (2025)

VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning
by: Wang, Yibo, et al.
Published: (2026)

Less Languages, Less Tokens: An Efficient Unified Logic Cross-lingual Chain-of-Thought Reasoning Framework
by: Zhang, Chenyuan, et al.
Published: (2026)

HyLRA: Hybrid Layer Reuse Attention for Efficient Long-Context Inference
by: Ai, Xuan, et al.
Published: (2026)

Think Through Uncertainty: Improving Long-Form Generation Factuality via Reasoning Calibration
by: Liu, Xin, et al.
Published: (2026)

Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning
by: Peng, Keqin, et al.
Published: (2026)