| Main Authors: | Kim, Yongwan, Park, Sungchul |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.25813 |
Similar Items
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
by: Nielsen, Jacob, et al.
Published: (2025)
BitNet Distillation
by: Wu, Xun, et al.
Published: (2025)
BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
by: Nielsen, Jacob, et al.
Published: (2024)
Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
by: Panfilov, Alexander, et al.
Published: (2026)
Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering
by: Hong, Sungchul, et al.
Published: (2024)
PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
by: Zhao, Yushu, et al.
Published: (2025)
Bilevel Autoresearch: Meta-Autoresearching Itself
by: Qu, Yaonan, et al.
Published: (2026)
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
by: Lee, Banseok, et al.
Published: (2025)
LittleBit-2: Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment
by: Lee, Banseok, et al.
Published: (2026)
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
by: Park, Jungwoo, et al.
Published: (2025)
Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)
Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training
by: Soltany, Milad, et al.
Published: (2024)
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
by: Farhat, Yehya, et al.
Published: (2023)
BitNet b1.58 2B4T Technical Report
by: Ma, Shuming, et al.
Published: (2025)
How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
by: Park, Sumin, et al.
Published: (2025)
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
by: Xie, Luyuan, et al.
Published: (2025)
Uncovering Intra-expert Activation Sparsity for Efficient Mixture-of-Expert Model Execution
by: Park, Jongseok, et al.
Published: (2026)
Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
by: Kim, Jeongmo, et al.
Published: (2025)
Decentralized Adversarial Training over Graphs
by: Cao, Ying, et al.
Published: (2023)
DAMEL: Dual-Axis Multi-Expert Learning for Class-Imbalanced Learning
by: Lee, Hyuck, et al.
Published: (2026)
Attn-QAT: 4-Bit Attention With Quantization-Aware Training
by: Zhang, Peiyuan, et al.
Published: (2026)
AdaQAT: Adaptive Bit-Width Quantization-Aware Training
by: Gernigon, Cédric, et al.
Published: (2024)
MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models
by: Chamma, Ahmad, et al.
Published: (2025)
Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
by: Bertolissi, Ryo, et al.
Published: (2025)
BitNet a4.8: 4-bit Activations for 1-bit LLMs
by: Wang, Hongyu, et al.
Published: (2024)
Forecasting VIX using interpretable Kolmogorov-Arnold networks
by: Cho, So-Yoon, et al.
Published: (2025)
Decentralized Autoregressive Generation
by: Maschan, Stepan, et al.
Published: (2026)
Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
by: Zhang, Tianao, et al.
Published: (2025)
Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning
by: Kainuma, Haruki, et al.
Published: (2025)
Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation
by: Kim, Donghwan, et al.
Published: (2026)
FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
by: Park, Sanghyeon, et al.
Published: (2025)
Loop Corrections to the Training Error and Generalization Gap of Random Feature Models
by: Kim, Taeyoung
Published: (2026)
Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
by: Nikolic, Strahinja, et al.
Published: (2025)
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
by: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)
Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
by: Tian, Hanlin, et al.
Published: (2024)
When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization
by: Nielsen, Jacob, et al.
Published: (2024)
Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling
by: Mirtaheri, Mehrnoosh, et al.
Published: (2025)
Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)
PFedDST: Personalized Federated Learning with Decentralized Selection Training
by: Fan, Mengchen, et al.
Published: (2025)
HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
by: Zhang, Jinhao, et al.
Published: (2026)