| Main Authors: | Mesa, Alejandro Ruiz y, Korol, Guilherme, Riesterer, Moritz, de Lima, João Paulo Cardoso, Castrillon, Jeronimo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08060 |
Similar Items
Leveraging Stochastic Depth Training for Adaptive Inference
by: Korol, Guilherme, et al.
Published: (2025)
Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)
Full-Stack Optimization for CAM-Only DNN Inference
by: de Lima, João Paulo C., et al.
Published: (2024)
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
by: Zhu, Bingjie, et al.
Published: (2025)
Count2Multiply: Reliable In-Memory High-Radix Counting
by: de Lima, João Paulo Cardoso, et al.
Published: (2024)
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
by: Timor, Nadav, et al.
Published: (2025)
The Landscape of Compute-near-memory and Compute-in-memory: A Research and Commercial Overview
by: Khan, Asif Ali, et al.
Published: (2024)
MING: An Automated CNN-to-Edge MLIR HLS framework
by: Bi, Jiahong, et al.
Published: (2026)
CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms
by: Khan, Asif Ali, et al.
Published: (2022)
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement
by: Jeon, Wonseok, et al.
Published: (2024)
Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding
by: Zhang, Guangyi, et al.
Published: (2025)
WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference
by: Liu, Zixuan, et al.
Published: (2026)
Demonstrating a Future for MLIR-native DSL Compilers on a NumPy-like Example
by: Friebel, Karl F. A., et al.
Published: (2026)
DSSD: Efficient Edge-Device LLM Deployment and Collaborative Inference via Distributed Split Speculative Decoding
by: Ning, Jiahong, et al.
Published: (2025)
GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference
by: Tang, Zengzipeng, et al.
Published: (2026)
Hierarchical Verification of Speculative Beams for Accelerating LLM Inference
by: Sen, Jaydip, et al.
Published: (2025)
MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
by: Hamdi, Mohamed Amine, et al.
Published: (2024)
AMUSD: Asynchronous Multi-Device Speculative Decoding for LLM Acceleration
by: McDanel, Bradley
Published: (2024)
SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models
by: Chen, Fahao, et al.
Published: (2025)
E-Mapper: Energy-Efficient Resource Allocation for Traditional Operating Systems on Heterogeneous Processors
by: Smejkal, Till, et al.
Published: (2024)
LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
by: Sun, Mingyu, et al.
Published: (2025)
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
by: Xia, Heming, et al.
Published: (2024)
VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices
by: Lin, Zi-Wei, et al.
Published: (2026)
All-in-Memory Stochastic Computing using ReRAM
by: de Lima, João Paulo C., et al.
Published: (2025)
CHIME: Chiplet-based Heterogeneous Near-Memory Acceleration for Edge Multimodal LLM Inference
by: Chen, Yanru, et al.
Published: (2025)
Designing Efficient LLM Accelerators for Edge Devices
by: Haris, Jude, et al.
Published: (2024)
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
by: Butler, Branden, et al.
Published: (2024)
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
by: Wen, Zhuofan, et al.
Published: (2024)
CARD: A Cache-Assisted Parallel Speculative Decoding Framework via Query-and-Correct Paradigm for Accelerating LLM Inference
by: Zhou, Enyu, et al.
Published: (2025)
Ghidorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism
by: Wei, Jinhui, et al.
Published: (2025)
AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices
by: Zirui, Ma, et al.
Published: (2026)
SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission
by: Zheng, Ce, et al.
Published: (2026)
SpecPipe: Accelerating Pipeline Parallelism-based LLM Inference with Speculative Decoding
by: Yin, Haofei, et al.
Published: (2025)
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding
by: Xi, Yunjia, et al.
Published: (2024)
CATS: Cascaded Adaptive Tree Speculation for Memory-Limited LLM Inference Acceleration
by: Han, Yuning, et al.
Published: (2026)
SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens
by: Liu, Chengbo, et al.
Published: (2024)
Speculative Decoding for Multi-Sample Inference
by: Li, Yiwei, et al.
Published: (2025)
A Pipelined Collaborative Speculative Decoding Framework for Efficient Edge-Cloud LLM Inference
by: Zhang, Yida, et al.
Published: (2026)
DiP-SD: Distributed Pipelined Speculative Decoding for Efficient LLM Inference at the Edge
by: Xu, Yaodan, et al.
Published: (2026)
CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding
by: Jia, Yuanyuan, et al.
Published: (2026)