Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Author:	Grover, Kabir
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Computation and Language
Online Access:	https://arxiv.org/abs/2601.00942
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866915704827019264
author	Grover, Kabir
author_facet	Grover, Kabir
contents	The increasing prevalence of sparse Mixture-of-Experts (MoE) architectures in large language models raises important questions regarding their reliability under stochastic decoding. While conditional computation enables substantial gains in computational efficiency, it remains unclear whether the interaction between sparse routing and temperature-based sampling compromises output stability relative to dense architectures. This work investigates whether conditional computation in MoE models amplifies decoding-induced randomness, leading to reduced reliability as temperature increases. We evaluate three representative models: OLMoE-7B (sparse base), Mixtral-8x7B (sparse instruction-tuned), and Qwen2.5-3B (dense instruction-tuned) on deterministic arithmetic reasoning tasks with objectively verifiable answers. Experiments span four decoding configurations, ranging from greedy decoding to T=1.0. Our evaluation encompasses accuracy, format compliance, output consistency across repeated generations, and confidence metrics, totaling 9,360 model generations. Results demonstrate that the sparse instruction-tuned model exhibits stability comparable to the dense instruction-tuned model across all decoding temperatures, while the sparse base model shows systematic degradation as temperature increases. These findings indicate that instruction tuning, rather than architectural sparsity, is the primary determinant of robustness to decoding randomness on deterministic tasks. We discuss the implications of these results for deploying sparse language models in reliability-critical applications, highlighting scenarios in which sparse architectures can be safely adopted without sacrificing output stability.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_00942
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures Grover, Kabir Machine Learning Computation and Language The increasing prevalence of sparse Mixture-of-Experts (MoE) architectures in large language models raises important questions regarding their reliability under stochastic decoding. While conditional computation enables substantial gains in computational efficiency, it remains unclear whether the interaction between sparse routing and temperature-based sampling compromises output stability relative to dense architectures. This work investigates whether conditional computation in MoE models amplifies decoding-induced randomness, leading to reduced reliability as temperature increases. We evaluate three representative models: OLMoE-7B (sparse base), Mixtral-8x7B (sparse instruction-tuned), and Qwen2.5-3B (dense instruction-tuned) on deterministic arithmetic reasoning tasks with objectively verifiable answers. Experiments span four decoding configurations, ranging from greedy decoding to T=1.0. Our evaluation encompasses accuracy, format compliance, output consistency across repeated generations, and confidence metrics, totaling 9,360 model generations. Results demonstrate that the sparse instruction-tuned model exhibits stability comparable to the dense instruction-tuned model across all decoding temperatures, while the sparse base model shows systematic degradation as temperature increases. These findings indicate that instruction tuning, rather than architectural sparsity, is the primary determinant of robustness to decoding randomness on deterministic tasks. We discuss the implications of these results for deploying sparse language models in reliability-critical applications, highlighting scenarios in which sparse architectures can be safely adopted without sacrificing output stability.
title	Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures
topic	Machine Learning Computation and Language
url	https://arxiv.org/abs/2601.00942

Similar Items