:: Library Catalog

Bibliographic Details
Main Authors:	Metz, C., Bichler, O., Dupret, A.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.06237

Similar Items

Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
by: Rogers, Ethan G., et al.
Published: (2025)

INEUS: Iterative Neural Solver for High-Dimensional PIDEs
by: Dupret, Jean-Loup, et al.
Published: (2026)

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
by: Hao, Yongchang, et al.
Published: (2024)

Inference-friendly Graph Compression for Graph Neural Networks
by: Fan, Yangxin, et al.
Published: (2025)

Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)

Efficient Decoder Scaling Strategy for Neural Routing Solvers
by: Luo, Qing, et al.
Published: (2026)

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
by: Arriola, Marianne, et al.
Published: (2025)

Efficient Model Compression for Bayesian Neural Networks
by: Saha, Diptarka, et al.
Published: (2024)

Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
by: Li, Xueyan, et al.
Published: (2026)

Gradual Binary Search and Dimension Expansion : A general method for activation quantization in LLMs
by: Maisonnave, Lucas, et al.
Published: (2025)

ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning
by: Kaufmann, Timo, et al.
Published: (2025)

Calibrated Speculative Decoding: Frequency-Guided Candidate Selection for Efficient Inference
by: Zhou, Xuwen, et al.
Published: (2026)

Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference
by: Simonds, Toby
Published: (2025)

Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference
by: Yubeaton, Patrick, et al.
Published: (2025)

Nudging: Inference-time Alignment of LLMs via Guided Decoding
by: Fei, Yu, et al.
Published: (2024)

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)

TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
by: Tang, Xiaojuan, et al.
Published: (2025)

Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling
by: Gomes, Carlos, et al.
Published: (2024)

FDC: Fast KV Dimensionality Compression for Efficient LLM Inference
by: Zhang, Zeyu, et al.
Published: (2024)

Score $\times$ Decoder: A Unified View of Unsupervised Inference-Time Scaling for Hallucination Mitigation
by: Cheng, Yun-Chen, et al.
Published: (2026)

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
by: Welleck, Sean, et al.
Published: (2024)

Fast Inference via Hierarchical Speculative Decoding
by: Mohri, Clara, et al.
Published: (2025)

Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
by: Ryu, Hyun, et al.
Published: (2024)

GraphPI: Efficient Protein Inference with Graph Neural Networks
by: Ma, Zheng, et al.
Published: (2026)

Early-Exit with Class Exclusion for Efficient Inference of Neural Networks
by: Wang, Jingcun, et al.
Published: (2023)

A Scalable, Causal, and Energy Efficient Framework for Neural Decoding with Spiking Neural Networks
by: Mentzelopoulos, Georgios, et al.
Published: (2025)

Energy-Efficient Wireless LLM Inference via Uncertainty and Importance-Aware Speculative Decoding
by: Park, Jihoon, et al.
Published: (2025)

DepCap: Adaptive Block-Wise Parallel Decoding for Efficient Diffusion LM Inference
by: Xia, Xiang, et al.
Published: (2026)

BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference
by: Wu, Xiaoyou, et al.
Published: (2026)

An Efficient Compression of Deep Neural Network Checkpoints Based on Prediction and Context Modeling
by: Kim, Yuriy, et al.
Published: (2025)

EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
by: Sanyal, Arnab, et al.
Published: (2025)

SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
by: Neelam, Sanjit, et al.
Published: (2025)

Neural Network Compression for Reinforcement Learning Tasks
by: Ivanov, Dmitry A., et al.
Published: (2024)

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
by: Dong, Harry, et al.
Published: (2024)

On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
by: Xiang, Maoyang, et al.
Published: (2025)

NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
by: Andronic, Marta, et al.
Published: (2025)

List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression
by: Rowan, Joseph, et al.
Published: (2025)

Adaptive Error-Bounded Hierarchical Matrices for Efficient Neural Network Compression
by: Mango, John, et al.
Published: (2024)

Generative Binary Memory: Pseudo-Replay Class-Incremental Learning on Binarized Embeddings
by: Basso-Bert, Yanis, et al.
Published: (2025)

Towards Experience Replay for Class-Incremental Learning in Fully-Binary Networks
by: Basso-Bert, Yanis, et al.
Published: (2025)