| Main authors: | Edalati, Ali; Ghaffari, Alireza; Nejad, Mahsa Ghazvini; Hou, Lu; Chen, Boxing; Asgharian, Masoud; Nia, Vahid Partovi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online access: | https://arxiv.org/abs/2405.15025 |
Similar items
Mitigating Outlier Activations in Low-Precision Fine-Tuning of Language Models
by: Ghaffari, Alireza, et al.
Published: (2023)
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs
by: Ghaffari, Alireza, et al.
Published: (2024)
Rethinking Post-Training Quantization: Introducing a Statistical Pre-Calibration Approach
by: Ghaffari, Alireza, et al.
Published: (2025)
Tiny Noise-Robust Voice Activity Detector for Voice Assistants
by: Asl, Hamed Jafarzadeh, et al.
Published: (2025)
MoKA: Mixture of Kronecker Adapters
by: Sadeghi, Mohammadreza, et al.
Published: (2025)
Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning
by: Gu, Yu, et al.
Published: (2026)
HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection
by: Nejad, Mahsa Ghazvini, et al.
Published: (2025)
PoTPTQ: A Two-step Power-of-Two Post-training for LLMs
by: Wang, Xinyu, et al.
Published: (2025)
Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers
by: Lu, Yiwei, et al.
Published: (2024)
On the Impact of Calibration Data in Post-training Quantization and Pruning
by: Williams, Miles, et al.
Published: (2023)
Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry
by: Yazdanpourmoghadam, Samira, et al.
Published: (2026)
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
by: Lin, Haokun, et al.
Published: (2025)
Visualizing Spatial Point Clouds: A Task-Oriented Taxonomy
by: Partovi, Mahsa, et al.
Published: (2025)
Efficient Post-training Quantization with FP8 Formats
by: Shen, Haihao, et al.
Published: (2023)
QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning
by: Rajabzadeh, Hossein, et al.
Published: (2024)
From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling
by: Rahimzadeh, Vahid, et al.
Published: (2025)
Towards Accurate Post-training Quantization for Diffusion Models
by: Wang, Changyuan, et al.
Published: (2023)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
by: Xiao, Guangxuan, et al.
Published: (2022)
Demystifying Domain-adaptive Post-training for Financial LLMs
by: Ke, Zixuan, et al.
Published: (2025)
K-Quantization and its Impact on Output Performance
by: Davidsson, Robin Baki, et al.
Published: (2026)
ReGLA: Refining Gated Linear Attention
by: Lu, Peng, et al.
Published: (2025)
GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
by: Dadgarnia, Alireza, et al.
Published: (2026)
Tensor Train Recurrent Network Language Model Prediction
by: Alejandro Murua‐Sazo, et al.
Published: (2025)
Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression
by: Metel, Michael R., et al.
Published: (2024)
Accelerated Gradient Methods for Sparse Statistical Learning with Nonconvex Penalties
by: Yang, Kai, et al.
Published: (2020)
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
by: Metel, Michael R., et al.
Published: (2024)
SLaNC: Static LayerNorm Calibration
by: Salmani, Mahsa, et al.
Published: (2024)
STAGE: Simplified Text-Attributed Graph Embeddings Using Pre-trained LLMs
by: Zolnai-Lucas, Aaron, et al.
Published: (2024)
On the importance of Data Scale in Pretraining Arabic Language Models
by: Ghaddar, Abbas, et al.
Published: (2024)
Integral Transformer: Denoising Attention, Not Too Much Not Too Little
by: Kobyzev, Ivan, et al.
Published: (2025)
BOSCH: Black-Box Binary Optimization for Short-Context Attention-Head Selection in LLMs
by: Ghaddar, Abbas, et al.
Published: (2026)
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
by: Kavehzadeh, Parsa, et al.
Published: (2023)
Towards Accurate Post-training Quantization for Reparameterized Models
by: Zhang, Luoming, et al.
Published: (2024)
Safety Verification for Evasive Collision Avoidance in Autonomous Vehicles with Enhanced Resolutions
by: Arab, Aliasghar, et al.
Published: (2024)
Accurate KV Cache Quantization with Outlier Tokens Tracing
by: Su, Yi, et al.
Published: (2025)
Assessing Influential Observations in Pain Prediction using fMRI Data
by: Zhang, Dongliang, et al.
Published: (2024)
Detection of Multiple Influential Observations on Model Selection
by: Zhang, Dongliang, et al.
Published: (2024)
Transferable Post-training via Inverse Value Learning
by: Lu, Xinyu, et al.
Published: (2024)
On Predicting the Post-training Potential of Pre-trained LLMs
by: Li, Xiaoyuan, et al.
Published: (2026)
On the Importance of a Multi-Scale Calibration for Quantization
by: Son, Seungwoo, et al.
Published: (2026)