| Main Authors: | Wei, Jinfeng, Zhang, Xiaofeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.15130 |
Similar Items
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders
by: Zhu, Xiaofeng, et al.
Published: (2024)
Measuring the Redundancy of Decoder Layers in SpeechLLMs
by: Moumen, Adel, et al.
Published: (2026)
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
by: Gema, Aryo Pradipta, et al.
Published: (2024)
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
by: Li, Jia-Nan, et al.
Published: (2025)
Self-Speculative Biased Decoding for Faster Re-Translation
by: Zeng, Linxiao, et al.
Published: (2025)
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
by: Hu, Lanxiang, et al.
Published: (2024)
Model Assembly Learning with Heterogeneous Layer Weight Merging
by: Zhang, Yi-Kai, et al.
Published: (2025)
LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation
by: Sun, Yang, et al.
Published: (2025)
Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
by: Zhang, Hongxiang, et al.
Published: (2025)
Chimera: A Lossless Decoding Method for Accelerating Large Language Models Inference by Fusing all Tokens
by: Zeng, Ziqian, et al.
Published: (2024)
Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models
by: Wu, Jialiang, et al.
Published: (2025)
Decoding at the Speed of Thought: Harnessing Parallel Decoding of Lexical Units for LLMs
by: Sun, Chenxi, et al.
Published: (2024)
ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting
by: Tian, Yuxing, et al.
Published: (2026)
ReMEmbR: Building and Reasoning Over Long-Horizon Spatio-Temporal Memory for Robot Navigation
by: Anwar, Abrar, et al.
Published: (2024)
Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies
by: Koneru, Sai, et al.
Published: (2024)
Cross-Attention Speculative Decoding
by: Zhong, Wei, et al.
Published: (2025)
Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration
by: Wei, Longxuan, et al.
Published: (2026)
Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning
by: Li, Xintong, et al.
Published: (2026)
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
by: Elhoushi, Mostafa, et al.
Published: (2024)
Rationale-Augmented Retrieval with Constrained LLM Re-Ranking for Task Discovery
by: Wei, Bowen
Published: (2025)
OR-Bench: An Over-Refusal Benchmark for Large Language Models
by: Cui, Justin, et al.
Published: (2024)
DeepEdit: Knowledge Editing as Decoding with Constraints
by: Wang, Yiwei, et al.
Published: (2024)
Don't Fine-Tune, Decode: Syntax Error-Free Tool Use via Constrained Decoding
by: Zhang, Kexun, et al.
Published: (2023)
Beyond Static Cropping: Layer-Adaptive Visual Localization and Decoding Enhancement
by: Zhu, Zipeng, et al.
Published: (2026)
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
by: Chuang, Yung-Sung, et al.
Published: (2023)
Scaling Laws for Speculative Decoding
by: Yan, Siyuan, et al.
Published: (2025)
Batch Speculative Decoding Done Right
by: Zhang, Ranran Haoran, et al.
Published: (2025)
Alignment-Enhanced Decoding: Defending via Token-Level Adaptive Refining of Probability Distributions
by: Liu, Quan, et al.
Published: (2024)
SynDec: A Synthesize-then-Decode Approach for Arbitrary Textual Style Transfer via Large Language Models
by: Sun, Han, et al.
Published: (2025)
Dynamic Depth Decoding: Faster Speculative Decoding for LLMs
by: Brown, Oscar, et al.
Published: (2024)
LycheeDecode: Accelerating Long-Context LLM Inference via Hybrid-Head Sparse Decoding
by: Lin, Gang, et al.
Published: (2026)
Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph
by: Huang, Rikui, et al.
Published: (2024)
fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations
by: Li, Jinfeng, et al.
Published: (2024)
Revealing and Mitigating Over-Attention in Knowledge Editing
by: Wang, Pinzheng, et al.
Published: (2025)
Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)
Reverse That Number! Decoding Order Matters in Arithmetic Learning
by: Zhang-Li, Daniel, et al.
Published: (2024)
Not All Layers Need Tuning: Selective Layer Restoration Recovers Diversity
by: Zhang, Bowen, et al.
Published: (2026)
RASD: Retrieval-Augmented Speculative Decoding
by: Quan, Guofeng, et al.
Published: (2025)
The Diminishing Returns of Early-Exit Decoding in Modern LLMs
by: Wei, Rui, et al.
Published: (2026)
RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding
by: Zhang, Zihong, et al.
Published: (2026)