Library Catalog

Bibliographic Details
Main Authors:	Kim, Yongwan; Park, Sungchul
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2603.25813

Similar Items

Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
by: Nielsen, Jacob, et al.
Published: (2025)

BitNet Distillation
by: Wu, Xun, et al.
Published: (2025)

BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
by: Nielsen, Jacob, et al.
Published: (2024)

Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs
by: Panfilov, Alexander, et al.
Published: (2026)

Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering
by: Hong, Sungchul, et al.
Published: (2024)

PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference
by: Zhao, Yushu, et al.
Published: (2025)

Bilevel Autoresearch: Meta-Autoresearching Itself
by: Qu, Yaonan, et al.
Published: (2026)

LittleBit: Ultra Low-Bit Quantization via Latent Factorization
by: Lee, Banseok, et al.
Published: (2025)

LittleBit-2: Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment
by: Lee, Banseok, et al.
Published: (2026)

Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
by: Park, Jungwoo, et al.
Published: (2025)

Transcendence: Generative Models Can Outperform The Experts That Train Them
by: Zhang, Edwin, et al.
Published: (2024)

Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training
by: Soltany, Milad, et al.
Published: (2024)

Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
by: Farhat, Yehya, et al.
Published: (2023)

BitNet b1.58 2B4T Technical Report
by: Ma, Shuming, et al.
Published: (2025)

How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
by: Park, Sumin, et al.
Published: (2025)

dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
by: Xie, Luyuan, et al.
Published: (2025)

Uncovering Intra-expert Activation Sparsity for Efficient Mixture-of-Expert Model Execution
by: Park, Jongseok, et al.
Published: (2026)

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks
by: Kim, Jeongmo, et al.
Published: (2025)

Decentralized Adversarial Training over Graphs
by: Cao, Ying, et al.
Published: (2023)

DAMEL: Dual-Axis Multi-Expert Learning for Class-Imbalanced Learning
by: Lee, Hyuck, et al.
Published: (2026)

Attn-QAT: 4-Bit Attention With Quantization-Aware Training
by: Zhang, Peiyuan, et al.
Published: (2026)

AdaQAT: Adaptive Bit-Width Quantization-Aware Training
by: Gernigon, Cédric, et al.
Published: (2024)

MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models
by: Chamma, Ahmad, et al.
Published: (2025)

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging
by: Bertolissi, Ryo, et al.
Published: (2025)

BitNet a4.8: 4-bit Activations for 1-bit LLMs
by: Wang, Hongyu, et al.
Published: (2024)

Forecasting VIX using interpretable Kolmogorov-Arnold networks
by: Cho, So-Yoon, et al.
Published: (2025)

Decentralized Autoregressive Generation
by: Maschan, Stepan, et al.
Published: (2026)

Quant-dLLM: Post-Training Extreme Low-Bit Quantization for Diffusion Large Language Models
by: Zhang, Tianao, et al.
Published: (2025)

Load-Aware Training Scheduling for Model Circulation-based Decentralized Federated Learning
by: Kainuma, Haruki, et al.
Published: (2025)

Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation
by: Kim, Donghwan, et al.
Published: (2026)

FRAIN to Train: A Fast-and-Reliable Solution for Decentralized Federated Learning
by: Park, Sanghyeon, et al.
Published: (2025)

Loop Corrections to the Training Error and Generalization Gap of Random Feature Models
by: Kim, Taeyoung
Published: (2026)

Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts
by: Nikolic, Strahinja, et al.
Published: (2025)

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
by: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)

Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation
by: Tian, Hanlin, et al.
Published: (2024)

When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization
by: Nielsen, Jacob, et al.
Published: (2024)

Towards Improving Long-Tail Entity Predictions in Temporal Knowledge Graphs through Global Similarity and Weighted Sampling
by: Mirtaheri, Mehrnoosh, et al.
Published: (2025)

Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation
by: Park, Seonghyeon, et al.
Published: (2026)

PFedDST: Personalized Federated Learning with Decentralized Selection Training
by: Fan, Mengchen, et al.
Published: (2025)

HeRo-Q: A General Framework for Stable Low Bit Quantization via Hessian Conditioning
by: Zhang, Jinhao, Zhang, Yunquan, et al.
Published: (2026)