:: Library Catalog

Bibliographic Details
Main Authors:	Lee, Dongyoung; Choi, Seungkyu; Chang, Ik Joon
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2501.13331

Similar Items

SERQ: Saliency-Aware Low-Rank Error Reconstruction for LLM Quantization
by: Park, Yeonsik, et al.
Published: (2026)

Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models
by: Kim, Donghoon, et al.
Published: (2025)

Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder
by: Kim, Seunghwan, et al.
Published: (2024)

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
by: Zhao, Yilong, et al.
Published: (2023)

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
by: Jia, Jinda, et al.
Published: (2024)

LO-BCQ: Block Clustered Quantization for 4-bit (W4A4) LLM Inference
by: Elangovan, Reena, et al.
Published: (2025)

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
by: IslamBouli, Beshr, et al.
Published: (2026)

RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy
by: Lee, Geonho, et al.
Published: (2024)

4bit-Quantization in Vector-Embedding for RAG
by: Jeong, Taehee
Published: (2025)

OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization
by: Li, Zhikai, et al.
Published: (2026)

ICQuant: Index Coding enables Low-bit LLM Quantization
by: Li, Xinlin, et al.
Published: (2025)

Training-free LLM Verification via Recycling Few-shot Examples
by: Lee, Dongseok, et al.
Published: (2025)

CCQ: Convolutional Code for Extreme Low-bit Quantization in LLMs
by: Zhou, Zhaojing, et al.
Published: (2025)

Occam's Razor is Only as Sharp as Your ELBO
by: Harvey, Ethan, et al.
Published: (2026)

SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization
by: Bai, Runsheng, et al.
Published: (2024)

Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment
by: Kim, Dongyoung, et al.
Published: (2024)

Toward Architecture-Agnostic Local Control of Posterior Collapse in VAEs
by: Song, Hyunsoo, et al.
Published: (2025)

EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation
by: Zhang, Shu-Hao, et al.
Published: (2026)

A Geometric Modeling of Occam's Razor in Deep Learning
by: Sun, Ke, et al.
Published: (2019)

Improving Block-Wise LLM Quantization by 4-bit Block-Wise Optimal Float (BOF4): Analysis and Variations
by: Blumenberg, Patrick, et al.
Published: (2025)

LoRaQ: Optimized Low Rank Approximation for 4-bit Quantization
by: Bouquet, Yann, et al.
Published: (2026)

Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models
by: Vetter, Julius, et al.
Published: (2025)

yProv4ML: Effortless Provenance Tracking for Machine Learning Systems
by: Padovani, Gabriele, et al.
Published: (2025)

OTTER: Effortless Label Distribution Adaptation of Zero-shot Models
by: Shin, Changho, et al.
Published: (2024)

LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
by: Zhang, Xuan, et al.
Published: (2024)

RL's Razor: Why Online Reinforcement Learning Forgets Less
by: Shenfeld, Idan, et al.
Published: (2025)

1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
by: Gao, Chang, et al.
Published: (2024)

FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration
by: Baek, Daehyeon, et al.
Published: (2025)

On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
by: Fan, Chenghao, et al.
Published: (2024)

SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size
by: Xia, Junhao, et al.
Published: (2025)

Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models
by: Heo, Jung Hwan, et al.
Published: (2023)

DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
by: Sharify, Sayeh, et al.
Published: (2026)

MergeQuant: Accurate 4-bit Static Quantization of Large Language Models by Channel-wise Calibration
by: Wang, Jinguang, et al.
Published: (2025)

ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
by: Xu, Bingxin, et al.
Published: (2025)

ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
by: Liu, Zechun, et al.
Published: (2025)

Effortless Active Labeling for Long-Term Test-Time Adaptation
by: Wang, Guowei, et al.
Published: (2025)

FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration
by: Yi, Donghyeon, et al.
Published: (2024)

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks
by: Shahverdi, Vahid, et al.
Published: (2025)

RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
by: Tang, Hanlin, et al.
Published: (2024)

Shaving Weights with Occam's Razor: Bayesian Sparsification for Neural Networks Using the Marginal Likelihood
by: Dhahri, Rayen, et al.
Published: (2024)