Library Catalog



Bibliographic Details
Main Authors:	Zhang, Zixi; Zhang, Cheng; Gao, Xitong; Mullins, Robert D.; Constantinides, George A.; Zhao, Yiren
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2406.14956

Similar Items

LQER: Low-Rank Quantization Error Reconstruction for LLMs
by: Zhang, Cheng, et al.
Published: (2024)

Optimised Grouped-Query Attention Mechanism for Transformers
by: Chen, Yuang, et al.
Published: (2024)

A3: an Analytical Low-Rank Approximation Framework for Attention
by: Wong, Jeffrey T. H., et al.
Published: (2025)

Ensembles of Low-Rank Expert Adapters
by: Li, Yinghao, et al.
Published: (2025)

Deep Kernel Fusion for Transformers
by: Zhang, Zixi, et al.
Published: (2026)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
by: Song, Guanghui, et al.
Published: (2025)

LoQT: Low-Rank Adapters for Quantized Pretraining
by: Loeschcke, Sebastian, et al.
Published: (2024)

Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
by: Shi, Haizhou, et al.
Published: (2024)

Symbiotic-MoE: Unlocking the Synergy between Generation and Understanding
by: Liu, Xiangyue, et al.
Published: (2026)

Finer Parameter Steps for Low-Rank PEFT: A Controlled Study with CP Tensor Adapters
by: Wang, Xinjue, et al.
Published: (2026)

LoRA-Pro: Are Low-Rank Adapters Properly Optimized?
by: Wang, Zhengbo, et al.
Published: (2024)

Multiple Choice Learning of Low-Rank Adapters for Language Modeling
by: Letzelter, Victor, et al.
Published: (2025)

Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
by: Muñoz, J. Pablo, et al.
Published: (2025)

zFLoRA: Zero-Latency Fused Low-Rank Adapters
by: Gowda, Dhananjaya, et al.
Published: (2025)

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)

FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
by: Wang, Xinyi, et al.
Published: (2025)

AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)

Learning Adapter Rank via Symmetry Breaking
by: Doyle, Cooper, et al.
Published: (2025)

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy
by: Wang, Haowen, et al.
Published: (2024)

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)

QERA: an Analytical Framework for Quantization Error Reconstruction
by: Zhang, Cheng, et al.
Published: (2024)

Hardware and Software Platform Inference
by: Zhang, Cheng, et al.
Published: (2024)

Reasoning: From Reflection to Solution
by: Li, Zixi
Published: (2025)

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
by: Tang, Chuanyu, et al.
Published: (2024)

LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
by: Zhang, Zixi, et al.
Published: (2023)

Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models
by: Chen, Yuyan, et al.
Published: (2024)

DropLoRA: Sparse Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
by: Zhang, Haojie
Published: (2025)

On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)

LoLA: Low-Rank Linear Attention With Sparse Caching
by: McDermott, Luke, et al.
Published: (2025)

SARA: Singular-Value Based Adaptive Low-Rank Adaption
by: Gu, Jihao, et al.
Published: (2024)

Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models
by: Liu, Xinxin, et al.
Published: (2024)

Watermarking Needs Input Repetition Masking
by: Khachaturov, David, et al.
Published: (2025)

Personas within Parameters: Fine-Tuning Small Language Models with Low-Rank Adapters to Mimic User Behaviors
by: Thakur, Himanshu, et al.
Published: (2025)

Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition
by: He, Zhengfu, et al.
Published: (2025)

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning
by: Liu, Yongkang, et al.
Published: (2026)

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)

BiDoRA: Bi-level Optimization-Based Weight-Decomposed Low-Rank Adaptation
by: Qin, Peijia, et al.
Published: (2024)

Shears: Unstructured Sparsity with Neural Low-rank Adapter Search
by: Muñoz, J. Pablo, et al.
Published: (2024)

Accelerating the Low-Rank Decomposed Models
by: Hajimolahoseini, Habib, et al.
Published: (2024)