| Main Authors: | Zeng, Zhanpeng, Davies, Michael, Pulijala, Pranav, Sankaralingam, Karthikeyan, Singh, Vikas |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.07221 |
Similar Items
IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers
by: Zeng, Zhanpeng, et al.
Published: (2024)
Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World
by: Sankaralingam, Karthikeyan
Published: (2026)
NeuroScalar: A Deep Learning Framework for Fast, Accurate, and In-the-Wild Cycle-Level Performance Prediction
by: Wadle, Shayne, et al.
Published: (2025)
FrameQuant: Flexible Low-Bit Quantization for Transformers
by: Adepu, Harshavardhan, et al.
Published: (2024)
Empirical Bayes Conformal Prediction for Vision and Language Models
by: Zeng, Jiapeng, et al.
Published: (2026)
Spark Transformer: Reactivating Sparsity in FFN and Attention
by: You, Chong, et al.
Published: (2025)
FFN Fusion: Rethinking Sequential Computation in Large Language Models
by: Bercovich, Akhiad, et al.
Published: (2025)
Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
by: Smithline, Gabriel, et al.
Published: (2026)
The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing
by: Sankaralingam, Karthikeyan
Published: (2025)
Pedagogically Motivated and Composable Open-Source RISC-V Processors for Computer Science Education
by: McDougall, Ian, et al.
Published: (2025)
Causal inference and model explainability tools for retail
by: Gupta, Pranav, et al.
Published: (2025)
LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers
by: Karmore, Aryan
Published: (2026)
Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
by: Zhao, Yanjun, et al.
Published: (2024)
Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)
UMoE: Unifying Attention and FFN with Shared Experts
by: Yang, Yuanhang, et al.
Published: (2025)
Mixture of Lookup Key-Value Experts
by: Wang, Zongcheng
Published: (2025)
Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity
by: Gautam, Aayush, et al.
Published: (2026)
LIMINAL: Exploring The Frontiers of LLM Decode Performance
by: Davies, Michael, et al.
Published: (2025)
mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations
by: Yang, Yongyi, et al.
Published: (2026)
Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)
L$^3$: Large Lookup Layers
by: Tseng, Albert, et al.
Published: (2026)
Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads
by: Song, Chendong, et al.
Published: (2026)
Hawk: Accurate and Fast Privacy-Preserving Machine Learning Using Secure Lookup Table Computation
by: Saleem, Hamza, et al.
Published: (2024)
Kitsune: Enabling Dataflow Execution on GPUs
by: Davies, Michael, et al.
Published: (2025)
SAHM: State-Aware Heterogeneous Multicore for Single-Thread Performance
by: Wadle, Shayne, et al.
Published: (2025)
Analytical Exploration of Spatial Audio Cues: A Differentiable Multi-Sphere Scattering Model
by: Galougah, Siminfar Samakoush, et al.
Published: (2026)
Lookup multivariate Kolmogorov-Arnold Networks
by: Pozdnyakov, Sergey, et al.
Published: (2025)
Variable feature weighted fuzzy k-means algorithm for high dimensional data
by: Singh, Vikas, et al.
Published: (2019)
Enhancing Machine Learning for Imbalanced Medical Data: A Quantum-Inspired Approach to Synthetic Oversampling (QI-SMOTE)
by: Kashtriya, Vikas, et al.
Published: (2025)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
by: Jaiswal, Ajay, et al.
Published: (2024)
TabConv: Low-Computation CNN Inference via Table Lookups
by: Gupta, Neelesh, et al.
Published: (2024)
Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget
by: Harding, Michael O., et al.
Published: (2026)
CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models
by: Jha, Samyak, et al.
Published: (2026)
RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks
by: Liu, Ningyuan, et al.
Published: (2025)
Perturbation Probing: A Two-Pass-per-Prompt Diagnostic for FFN Behavioral Circuits in Aligned LLMs
by: Liu, Hongliang, et al.
Published: (2026)
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
by: Song, Chenyang, et al.
Published: (2025)
Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
by: Kaliaperumal, Pranav Kumar
Published: (2026)
Beyond Static Policies: Exploring Dynamic Policy Selection for Single-Thread Performance Optimization
by: Zhang, Yanxin, et al.
Published: (2026)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
by: Li, Bingrui, et al.
Published: (2024)
Score-based Causal Representation Learning: Linear and General Transformations
by: Varıcı, Burak, et al.
Published: (2024)