| Main Authors: | Chen, Yuang, Zhang, Cheng, Gao, Xitong, Mullins, Robert D., Constantinides, George A., Zhao, Yiren |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.14963 |
Similar Items
Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)
LQER: Low-Rank Quantization Error Reconstruction for LLMs
by: Zhang, Cheng, et al.
Published: (2024)
Deep Kernel Fusion for Transformers
by: Zhang, Zixi, et al.
Published: (2026)
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
by: Song, Guanghui, et al.
Published: (2025)
Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
by: Zhang, Cheng, et al.
Published: (2023)
AMPLE: Event-Driven Accelerator for Mixed-Precision Inference of Graph Neural Networks
by: Gimenes, Pedro, et al.
Published: (2025)
QERA: an Analytical Framework for Quantization Error Reconstruction
by: Zhang, Cheng, et al.
Published: (2024)
Hardware and Software Platform Inference
by: Zhang, Cheng, et al.
Published: (2024)
TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)
A3: an Analytical Low-Rank Approximation Framework for Attention
by: Wong, Jeffrey T. H., et al.
Published: (2025)
Quantamination: Dynamic Quantization Leaks Your Data Across the Batch
by: Foerster, Hanna, et al.
Published: (2026)
ImpNet: Imperceptible and blackbox-undetectable backdoors in compiled neural networks
by: Clifford, Eleanor, et al.
Published: (2022)
Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)
Architectural Neural Backdoors from First Principles
by: Langford, Harry, et al.
Published: (2024)
NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
by: Andronic, Marta, et al.
Published: (2025)
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
by: Zhang, Zixi, et al.
Published: (2023)
Locking Machine Learning Models into Hardware
by: Clifford, Eleanor, et al.
Published: (2024)
NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions
by: Andronic, Marta, et al.
Published: (2024)
PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference
by: Andronic, Marta, et al.
Published: (2023)
Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated
by: Foerster, Hanna, et al.
Published: (2025)
Direction-Preserving Number Representations
by: Zadeh, Bardia, et al.
Published: (2026)
PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning
by: Andronic, Marta, et al.
Published: (2025)
Convergence for Discrete Parameter Update Schemes
by: Wilson, Paul, et al.
Published: (2025)
Learning to Optimise Wind Farms with Graph Transformers
by: Li, Siyi, et al.
Published: (2023)
On the Existence and Behavior of Secondary Attention Sinks
by: Wong, Jeffrey T. H., et al.
Published: (2025)
ATHEENA: A Toolflow for Hardware Early-Exit Network Automation
by: Biggs, Benjamin, et al.
Published: (2023)
Sparse Query Attention (SQA): A Computationally Efficient Attention Mechanism with Query Heads Reduction
by: Filipek, Adam
Published: (2025)
Adversarial Suffix Filtering: a Defense Pipeline for LLMs
by: Khachaturov, David, et al.
Published: (2025)
Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention
by: Khan, Zohaib, et al.
Published: (2024)
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
by: Chen, Yingfa, et al.
Published: (2025)
Exploring FPGA designs for MX and beyond
by: Samson, Ebby, et al.
Published: (2024)
ReducedLUT: Table Decomposition with "Don't Care" Conditions
by: Cassidy, Oliver, et al.
Published: (2024)
Training with Fewer Bits: Unlocking Edge LLMs Training with Stochastic Rounding
by: Liu, Taowen, et al.
Published: (2025)
KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware
by: Nie, Jiayi, et al.
Published: (2026)
Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness
by: Khachaturov, David, et al.
Published: (2024)
When and Why Grouping Attention Heads Accelerates Muon Optimization
by: Zhang, Hongtao, et al.
Published: (2026)
Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task
by: Jian, Yiren, et al.
Published: (2023)
Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis
by: Thakur, Anshul, et al.
Published: (2024)
Provably Learning Attention with Queries
by: Bhattamishra, Satwik, et al.
Published: (2026)
Generalized Probabilistic Attention Mechanism in Transformers
by: Heo, DongNyeong, et al.
Published: (2024)