:: Library Catalog

Bibliographic Details
Main Authors:	Wei, Jinfeng; Zhang, Xiaofeng
Format:	Preprint
Published:	2024
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2407.15130

Similar Items

Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders
by: Zhu, Xiaofeng, et al.
Published: (2024)

Measuring the Redundancy of Decoder Layers in SpeechLLMs
by: Moumen, Adel, et al.
Published: (2026)

DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
by: Gema, Aryo Pradipta, et al.
Published: (2024)

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
by: Li, Jia-Nan, et al.
Published: (2025)

Self-Speculative Biased Decoding for Faster Re-Translation
by: Zeng, Linxiao, et al.
Published: (2025)

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
by: Hu, Lanxiang, et al.
Published: (2024)

Model Assembly Learning with Heterogeneous Layer Weight Merging
by: Zhang, Yi-Kai, et al.
Published: (2025)

LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation
by: Sun, Yang, et al.
Published: (2025)

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
by: Zhang, Hongxiang, et al.
Published: (2025)

Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
by: Zeng, Ziqian, et al.
Published: (2024)

Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
by: Wu, Jialiang, et al.
Published: (2025)

Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
by: Sun, Chenxi, et al.
Published: (2024)

ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)

ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
by: Anwar, Abrar, et al.
Published: (2024)

Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies
by: Koneru, Sai, et al.
Published: (2024)

Cross-Attention Speculative Decoding
by: Zhong, Wei, et al.
Published: (2025)

Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration
by: Wei, Longxuan, et al.
Published: (2026)

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning
by: Li, Xintong, et al.
Published: (2026)

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
by: Elhoushi, Mostafa, et al.
Published: (2024)

Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery
by: Wei, Bowen
Published: (2025)

OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)

DeepEdit: Knowledge Editing as Decoding with Constraints
by: Wang, Yiwei, et al.
Published: (2024)

Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
by: Zhang, Kexun, et al.
Published: (2023)

Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement
by: Zhu, Zipeng, et al.
Published: (2026)

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
by: Chuang, Yung-Sung, et al.
Published: (2023)

Scaling Laws for Speculative Decoding
by: Yan, Siyuan, et al.
Published: (2025)

Batch Speculative Decoding Done Right
by: Zhang, Ranran Haoran, et al.
Published: (2025)

Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions
by: Liu, Quan, et al.
Published: (2024)

SynDec: A Synthesize-then-Decode Approach for Arbitrary Textual Style Transfer via Large Language Models
by: Sun, Han, et al.
Published: (2025)

Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
by: Brown, Oscar, et al.
Published: (2024)

LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
by: Lin, Gang, et al.
Published: (2026)

Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph
by: Huang, Rikui, et al.
Published: (2024)

fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations
by: Li, Jinfeng, et al.
Published: (2024)

Revealing and Mitigating Over-Attention in Knowledge Editing
by: Wang, Pinzheng, et al.
Published: (2025)

Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)

Reverse That Number! Decoding Order Matters in Arithmetic Learning
by: Zhang-Li, Daniel, et al.
Published: (2024)

Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity
by: Zhang, Bowen, et al.
Published: (2026)

RASD: Retrieval-Augmented Speculative Decoding
by: Quan, Guofeng, et al.
Published: (2025)

The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)

RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
by: Zhang, Zihong, et al.
Published: (2026)