| Main Authors: | Liu, James, Xiao, Guangxuan, Li, Kai, Lee, Jason D., Han, Song, Dao, Tri, Cai, Tianle |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.10193 |
Similar Items
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
by: Cai, Tianle, et al.
Published: (2024)
OneBit: Towards Extremely Low-bit Large Language Models
by: Xu, Yuzhuang, et al.
Published: (2024)
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
by: Steinmetz, Cody, et al.
Published: (2025)
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
by: Lee, Banseok, et al.
Published: (2025)
BitNet Distillation
by: Wu, Xun, et al.
Published: (2025)
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
by: Zhou, Zikai, et al.
Published: (2025)
Optimizing Mixture of Block Attention
by: Xiao, Guangxuan, et al.
Published: (2025)
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation
by: Du, Dayou, et al.
Published: (2024)
Reward Collapse in Aligning Large Language Models
by: Song, Ziang, et al.
Published: (2023)
Bit Blasting Probabilistic Programs
by: Garg, Poorva, et al.
Published: (2023)
Multi-Bit Distortion-Free Watermarking for Large Language Models
by: Boroujeny, Massieh Kordi, et al.
Published: (2024)
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
by: Hu, Xing, et al.
Published: (2024)
Bit-Vector CHC Solving for Binary Analysis and Binary Analysis for Bit-Vector CHC Solving
by: Bembenek, Aaron, et al.
Published: (2026)
Equational Bit-Vector Solving via Strong Gröbner Bases
by: Song, Jiaxin, et al.
Published: (2024)
Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models
by: Liu, Wanlong, et al.
Published: (2025)
Efficient Streaming Language Models with Attention Sinks
by: Xiao, Guangxuan, et al.
Published: (2023)
QuIP: 2-Bit Quantization of Large Language Models With Guarantees
by: Chee, Jerry, et al.
Published: (2023)
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache
by: Du, Dayou, et al.
Published: (2025)
Dr. SoW: Density Ratio of Strong-over-weak LLMs for Reducing the Cost of Human Annotation in Preference Tuning
by: Xu, Guangxuan, et al.
Published: (2024)
BitMar: Low-Bit Multimodal Fusion with Episodic Memory for Edge Devices
by: Aman, Euhid, et al.
Published: (2025)
Bit-level BPE: Below the byte boundary
by: Moon, Sangwhan, et al.
Published: (2025)
XAttention: Block Sparse Attention with Antidiagonal Scoring
by: Xu, Ruyi, et al.
Published: (2025)
Unlocking the Theory Behind Scaling 1-Bit Neural Networks
by: Daliri, Majid, et al.
Published: (2024)
Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits
by: Sonkar, Shashank, et al.
Published: (2024)
To be Continuous, or to be Discrete, Those are Bits of Questions
by: Wang, Yiran, et al.
Published: (2024)
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
by: Ben-Zaken, Elad, et al.
Published: (2021)
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
by: Fang, Yangui, et al.
Published: (2025)
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
by: Tang, Jiaming, et al.
Published: (2024)
Learning to Prioritize IT Tickets: A Comparative Evaluation of Embedding-based Approaches and Fine-Tuned Transformer Models
by: LÊ, Minh Tri, et al.
Published: (2025)
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
by: Zhuang, Shaobin, et al.
Published: (2026)
Hardware-Efficient Attention for Fast Decoding
by: Zadouri, Ted, et al.
Published: (2025)
Majority Bit-Aware Watermarking For Large Language Models
by: Xu, Jiahao, et al.
Published: (2025)
FrameQuant: Flexible Low-Bit Quantization for Transformers
by: Adepu, Harshavardhan, et al.
Published: (2024)
BitNet b1.58 2B4T Technical Report
by: Ma, Shuming, et al.
Published: (2025)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
by: Xiao, Guangxuan, et al.
Published: (2022)
H1B-KV: Hybrid One-Bit Caches for Memory-Efficient Large Language Model Inference
by: Vejendla, Harshil
Published: (2025)
A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning
by: Song, Chih-Wei, et al.
Published: (2024)
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
by: Zandieh, Amir, et al.
Published: (2024)
REST: Retrieval-Based Speculative Decoding
by: He, Zhenyu, et al.
Published: (2023)
StreamingVLM: Real-Time Understanding for Infinite Video Streams
by: Xu, Ruyi, et al.
Published: (2025)