| Main Authors: | Zeng, Ziqian, Yu, Jiahong, Pang, Qianshi, Wang, Zihao, Zhuang, Huiping, Shao, Hongen, Zou, Xiaofeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.15758 |
Similar Items
ConsistentEE: A Consistent and Hardness-Guided Early Exiting Method for Accelerating Language Models Inference
by: Zeng, Ziqian, et al.
Published: (2023)
LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model
by: Shao, Wei, et al.
Published: (2025)
Subkv: Quantizing Long Context KV Cache for Sub‐Billion Parameter Language Models on Edge Devices
by: Ziqian Zeng, et al.
Published: (2025)
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
by: Lu, Weikai, et al.
Published: (2025)
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge
by: Lu, Weikai, et al.
Published: (2024)
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
by: Zhang, Tao, et al.
Published: (2024)
Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration
by: Wu, Pengfei, et al.
Published: (2024)
RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior
by: Yang, Junyao, et al.
Published: (2025)
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
by: Zhang, Jun, et al.
Published: (2023)
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
by: Ou, Jie, et al.
Published: (2024)
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
by: Cho, Sukmin, et al.
Published: (2025)
Decompose, Plan in Parallel, and Merge: A Novel Paradigm for Large Language Models based Planning with Multiple Constraints
by: Lu, Zhengdong, et al.
Published: (2025)
TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
by: Wu, Tong, et al.
Published: (2025)
Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction
by: Mao, Yu, et al.
Published: (2025)
Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
by: Xiong, Chen, et al.
Published: (2025)
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
by: Lin, Feng, et al.
Published: (2024)
Chimera: Diagnosing Shortcut Learning in Visual-Language Understanding
by: Chi, Ziheng, et al.
Published: (2025)
SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens
by: Liu, Chengbo, et al.
Published: (2024)
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
by: Agrawal, Sudhanshu, et al.
Published: (2025)
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
by: Sun, Hanshi, et al.
Published: (2024)
Dissecting Fine-Tuning Unlearning in Large Language Models
by: Hong, Yihuai, et al.
Published: (2024)
PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
by: Zeng, Ziqian, et al.
Published: (2024)
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
by: Luo, Xianzhen, et al.
Published: (2024)
Token-Efficient Leverage Learning in Large Language Models
by: Zeng, Yuanhao, et al.
Published: (2024)
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
by: Yang, Penghui, et al.
Published: (2025)
Lossless Token Sequence Compression via Meta-Tokens
by: Harvill, John, et al.
Published: (2025)
CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing
by: Zheng, Wenhao, et al.
Published: (2025)
Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference
by: Schelpe, Sietse
Published: (2026)
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
by: Wang, Pei-Shuo, et al.
Published: (2025)
Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding
by: Yi, Hanling, et al.
Published: (2024)
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
by: Zhao, Yao, et al.
Published: (2023)
CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference
by: Liu, Dong, et al.
Published: (2025)
Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference
by: Li, Zeping, et al.
Published: (2024)
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
by: Tuli, Shikhar, et al.
Published: (2024)
TASE: Token Awareness and Structured Evaluation for Multilingual Language Models
by: Zhao, Chenzhuo, et al.
Published: (2025)
FlashDecoding++: Faster Large Language Model Inference on GPUs
by: Hong, Ke, et al.
Published: (2023)
HAMburger: Accelerating LLM Inference via Token Smashing
by: Liu, Jingyu, et al.
Published: (2025)
Token Signature: Predicting Chain-of-Thought Gains with Token Decoding Feature in Large Language Models
by: Liu, Peijie, et al.
Published: (2025)
Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
by: Ye, Haotian, et al.
Published: (2025)