| Main Authors: | Ma, Shuming, Wang, Hongyu, Ma, Lingxiao, Wang, Lei, Wang, Wenhui, Huang, Shaohan, Dong, Li, Wang, Ruiping, Xue, Jilong, Wei, Furu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.17764 |
Similar Items
BitNet a4.8: 4-bit Activations for 1-bit LLMs
by: Wang, Hongyu, et al.
Published: (2024)
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
by: Wang, Hongyu, et al.
Published: (2025)
BitNet b1.58 2B4T Technical Report
by: Ma, Shuming, et al.
Published: (2025)
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
by: Wang, Hongyu, et al.
Published: (2024)
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
by: Wang, Jinheng, et al.
Published: (2024)
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity
by: Zhang, Di, et al.
Published: (2026)
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation
by: Wang, Hongyu, et al.
Published: (2025)
BitNet Distillation
by: Wu, Xun, et al.
Published: (2025)
You Only Cache Once: Decoder-Decoder Architectures for Language Models
by: Sun, Yutao, et al.
Published: (2024)
MH-MoE: Multi-Head Mixture-of-Experts
by: Huang, Shaohan, et al.
Published: (2024)
Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)
ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
by: Yuan, Zhengqing, et al.
Published: (2024)
Adapting Large Language Models to Domains via Reading Comprehension
by: Cheng, Daixuan, et al.
Published: (2023)
Multimodal Latent Language Modeling with Next-Token Diffusion
by: Sun, Yutao, et al.
Published: (2024)
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
by: Wu, Xun, et al.
Published: (2024)
The Era of Agentic Organization: Learning to Organize with Language Models
by: Chi, Zewen, et al.
Published: (2025)
RedStone: Curating General, Code, Math, and QA Data for Large Language Models
by: Chang, Yaoyao, et al.
Published: (2024)
1.58-bit FLUX
by: Yang, Chenglin, et al.
Published: (2024)
LongReasonArena: A Long Reasoning Benchmark for Large Language Models
by: Ding, Jiayu, et al.
Published: (2025)
Textual Aesthetics in Large Language Models
by: Jiang, Lingjie, et al.
Published: (2024)
When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization
by: Nielsen, Jacob, et al.
Published: (2024)
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs
by: Wang, Jinheng, et al.
Published: (2025)
LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference
by: Mo, Zhiwen, et al.
Published: (2024)
Large Search Model: Redefining Search Stack in the Era of LLMs
by: Wang, Liang, et al.
Published: (2023)
Thinking Augmented Pre-training
by: Wang, Liang, et al.
Published: (2025)
Auto-ICL: In-Context Learning without Human Supervision
by: Yang, Jinghan, et al.
Published: (2023)
BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing
by: Kawamura, Masaya, et al.
Published: (2025)
Black-Box On-Policy Distillation of Large Language Models
by: Ye, Tianzhu, et al.
Published: (2025)
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
by: Steinmetz, Cody, et al.
Published: (2025)
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
by: Pan, Xichen, et al.
Published: (2023)
KOSMOS-2.5: A Multimodal Literate Model
by: Lv, Tengchao, et al.
Published: (2023)
On-Policy Context Distillation for Language Models
by: Ye, Tianzhu, et al.
Published: (2026)
WaferLLM: Large Language Model Inference at Wafer Scale
by: He, Congjie, et al.
Published: (2025)
Mixture of LoRA Experts
by: Wu, Xun, et al.
Published: (2024)
Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
by: Nielsen, Jacob, et al.
Published: (2025)
Online Experiential Learning for Language Models
by: Ye, Tianzhu, et al.
Published: (2026)
Universal YOCO for Efficient Depth Scaling
by: Sun, Yutao, et al.
Published: (2026)
BitROM: Weight Reload-Free CiROM Architecture Towards Billion-Parameter 1.58-bit LLM Inference
by: Zhang, Wenlun, et al.
Published: (2025)
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
by: Yang, Wenkai, et al.
Published: (2025)
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
by: Ren, Shuhuai, et al.
Published: (2025)