:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Yuang; Zhang, Cheng; Gao, Xitong; Mullins, Robert D.; Constantinides, George A.; Zhao, Yiren
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2406.14963

Similar Items

Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)

LQER: Low-Rank Quantization Error Reconstruction for LLMs
by: Zhang, Cheng, et al.
Published: (2024)

Deep Kernel Fusion for Transformers
by: Zhang, Zixi, et al.
Published: (2026)

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
by: Song, Guanghui, et al.
Published: (2025)

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)

AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)

QERA: an Analytical Framework for Quantization Error Reconstruction
by: Zhang, Cheng, et al.
Published: (2024)

Hardware and Software Platform Inference
by: Zhang, Cheng, et al.
Published: (2024)

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)

A3: an Analytical Low-Rank Approximation Framework for Attention
by: Wong, Jeffrey T. H., et al.
Published: (2025)

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
by: Foerster, Hanna, et al.
Published: (2026)

ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
by: Clifford, Eleanor, et al.
Published: (2022)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

Architectural Neural Backdoors from First Principles
by: Langford, Harry, et al.
Published: (2024)

NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
by: Andronic, Marta, et al.
Published: (2025)

LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
by: Zhang, Zixi, et al.
Published: (2023)

Locking Machine Learning Models into Hardware
by: Clifford, Eleanor, et al.
Published: (2024)

NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions
by: Andronic, Marta, et al.
Published: (2024)

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
by: Foerster, Hanna, et al.
Published: (2025)

Direction-Preserving Number Representations
by: Zadeh, Bardia, et al.
Published: (2026)

PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning
by: Andronic, Marta, et al.
Published: (2025)

Convergence for Discrete Parameter Update Schemes
by: Wilson, Paul, et al.
Published: (2025)

Learning to Optimise Wind Farms with Graph Transformers
by: Li, Siyi, et al.
Published: (2023)

On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)

ATHEENA: A Toolflow for Hardware Early-Exit Network Automation
by: Biggs, Benjamin, et al.
Published: (2023)

Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
by: Filipek, Adam
Published: (2025)

Adversarial Suffix Filtering: a Defense Pipeline for LLMs
by: Khachaturov, David, et al.
Published: (2025)

Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
by: Khan, Zohaib, et al.
Published: (2024)

Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)

Exploring FPGA designs for MX and beyond
by: Samson, Ebby, et al.
Published: (2024)

ReducedLUT: Table Decomposition with "Don't Care" Conditions
by: Cassidy, Oliver, et al.
Published: (2024)

Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
by: Liu, Taowen, et al.
Published: (2025)

KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware
by: Nie, Jiayi, et al.
Published: (2026)

Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness
by: Khachaturov, David, et al.
Published: (2024)

When and Why Grouping Attention Heads Accelerates Muon Optimization
by: Zhang, Hongtao, et al.
Published: (2026)

Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task
by: Jian, Yiren, et al.
Published: (2023)

Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis
by: Thakur, Anshul, et al.
Published: (2024)

Provably Learning Attention with Queries
by: Bhattamishra, Satwik, et al.
Published: (2026)

Generalized Probabilistic Attention Mechanism in Transformers
by: Heo, DongNyeong, et al.
Published: (2024)