| Main Authors: | Metz, C., Bichler, O., Dupret, A. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.06237 |
Similar Items
Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
by: Rogers, Ethan G., et al.
Published: (2025)
INEUS: Iterative Neural Solver for High-Dimensional PIDEs
by: Dupret, Jean-Loup, et al.
Published: (2026)
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)
Inference-friendly Graph Compression for Graph Neural Networks
by: Fan, Yangxin, et al.
Published: (2025)
Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
Efficient Decoder Scaling Strategy for Neural Routing Solvers
by: Luo, Qing, et al.
Published: (2026)
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
by: Arriola, Marianne, et al.
Published: (2025)
Efficient Model Compression for Bayesian Neural Networks
by: Saha, Diptarka, et al.
Published: (2024)
Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
by: Li, Xueyan, et al.
Published: (2026)
Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
by: Maisonnave, Lucas, et al.
Published: (2025)
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
by: Kaufmann, Timo, et al.
Published: (2025)
Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
by: Zhou, Xuwen, et al.
Published: (2026)
Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference
by: Simonds, Toby
Published: (2025)
Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)
Nudging: Inference-time Alignment of LLMs via Guided Decoding
by: Fei, Yu, et al.
Published: (2024)
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
by: Tang, Xiaojuan, et al.
Published: (2025)
Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling
by: Gomes, Carlos, et al.
Published: (2024)
FDC: Fast KV Dimensionality Compression for Efficient LLM Inference
by: Zhang, Zeyu, et al.
Published: (2024)
Score $\times$ Decoder: A Unified View of Unsupervised Inference-Time Scaling for Hallucination Mitigation
by: Cheng, Yun-Chen, et al.
Published: (2026)
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
by: Welleck, Sean, et al.
Published: (2024)
Fast Inference via Hierarchical Speculative Decoding
by: Mohri, Clara, et al.
Published: (2025)
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
by: Ryu, Hyun, et al.
Published: (2024)
GraphPI: Efficient Protein Inference with Graph Neural Networks
by: Ma, Zheng, et al.
Published: (2026)
Early-Exit with Class Exclusion for Efficient Inference of Neural Networks
by: Wang, Jingcun, et al.
Published: (2023)
A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks
by: Mentzelopoulos, Georgios, et al.
Published: (2025)
Energy-Efficient Wireless LLM Inference via Uncertainty and Importance-Aware Speculative Decoding
by: Park, Jihoon, et al.
Published: (2025)
DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
by: Xia, Xiang, et al.
Published: (2026)
BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference
by: Wu, Xiaoyou, et al.
Published: (2026)
An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling
by: Kim, Yuriy, et al.
Published: (2025)
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
by: Sanyal, Arnab, et al.
Published: (2025)
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
by: Neelam, Sanjit, et al.
Published: (2025)
Neural Network Compression for Reinforcement Learning Tasks
by: Ivanov, Dmitry A., et al.
Published: (2024)
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)
On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
by: Xiang, Maoyang, et al.
Published: (2025)
NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
by: Andronic, Marta, et al.
Published: (2025)
List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression
by: Rowan, Joseph, et al.
Published: (2025)
Adaptive Error-Bounded Hierarchical Matrices for Efficient Neural Network Compression
by: Mango, John, et al.
Published: (2024)
Generative Binary Memory: Pseudo-Replay Class-Incremental Learning on Binarized Embeddings
by: Basso-Bert, Yanis, et al.
Published: (2025)
Towards Experience Replay for Class-Incremental Learning in Fully-Binary Networks
by: Basso-Bert, Yanis, et al.
Published: (2025)