| Main Author: | Grover, Kabir |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.00942 |
Similar Items
Speculative Decoding Across Languages
by: Paudel, Nirajan, et al.
Published: (2026)
Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism
by: Li, Guanchen, et al.
Published: (2024)
Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models
by: O'Neill, Charles, et al.
Published: (2024)
SqueezeLLM: Dense-and-Sparse Quantization
by: Kim, Sehoon, et al.
Published: (2023)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)
$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
by: Chen, Guanjie, et al.
Published: (2024)
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
by: Tao, Leitian, et al.
Published: (2025)
Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval
by: Song, Jonghyun, et al.
Published: (2025)
Learn from the Past: Fast Sparse Indexing for Large Language Model Decoding
by: Yao, Feiyu, et al.
Published: (2025)
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
by: Zhao, Siyan, et al.
Published: (2025)
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models
by: He, Wei, et al.
Published: (2024)
Probing the Decision Boundaries of In-context Learning in Large Language Models
by: Zhao, Siyan, et al.
Published: (2024)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
by: Israel, Daniel, et al.
Published: (2025)
Group Preference Optimization: Few-Shot Alignment of Large Language Models
by: Zhao, Siyan, et al.
Published: (2023)
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
by: Lan, Michael, et al.
Published: (2024)
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
by: Bansal, Hritik, et al.
Published: (2023)
PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding
by: Koromilas, Panagiotis, et al.
Published: (2026)
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices
by: Song, Chenyang, et al.
Published: (2026)
Empirical Analysis of Efficient Fine-Tuning Methods for Large Pre-Trained Language Models
by: Doering, Nigel, et al.
Published: (2024)
Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models
by: Messina, Alberto, et al.
Published: (2026)
SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization
by: Zhang, Taolin, et al.
Published: (2026)
Sparse Attention Remapping with Clustering for Efficient LLM Decoding on PIM
by: Fan, Zehao, et al.
Published: (2025)
Decoding Uncertainty: The Impact of Decoding Strategies for Uncertainty Estimation in Large Language Models
by: Hashimoto, Wataru, et al.
Published: (2025)
An Empirical Study of Mamba-based Language Models
by: Waleffe, Roger, et al.
Published: (2024)
How Reliable is Language Model Micro-Benchmarking?
by: Yauney, Gregory, et al.
Published: (2025)
Stability-Weighted Decoding for Diffusion Language Models
by: Wu, Yue, et al.
Published: (2026)
Learning to Decode Collaboratively with Multiple Language Models
by: Shen, Shannon Zejiang, et al.
Published: (2024)
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
by: Zhao, Siyan, et al.
Published: (2026)
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
by: Wang, Hongyu, et al.
Published: (2024)
Do Large Language Model Benchmarks Test Reliability?
by: Vendrow, Joshua, et al.
Published: (2025)
Dense vs Sparse Pretraining at Tiny Scale: Active-Parameter vs Total-Parameter Matching
by: Wael, Abdalrahman
Published: (2026)
BanglaEmbed: Efficient Sentence Embedding Models for a Low-Resource Language Using Cross-Lingual Distillation Techniques
by: Kabir, Muhammad Rafsan, et al.
Published: (2024)
Layer-wise Representation Dynamics: An Empirical Investigation Across Embedders and Base LLMs
by: Jiang, Jingzhou, et al.
Published: (2026)
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
by: Yang, Lijie, et al.
Published: (2024)
Transferring Linear Features Across Language Models With Model Stitching
by: Chen, Alan, et al.
Published: (2025)
Sparse Layers are Critical to Scaling Looped Language Models
by: Lee, Ryan, et al.
Published: (2026)
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws
by: Jin, Tian, et al.
Published: (2025)
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models
by: Mundra, Nandini, et al.
Published: (2024)
Assessing Adversarial Robustness of Large Language Models: An Empirical Study
by: Yang, Zeyu, et al.
Published: (2024)
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
by: Zhao, Siyan, et al.
Published: (2024)