| Main Authors: | Huang, Ning-Chi, Chang, Chi-Chih, Lin, Wei-Cheng, Taka, Endri, Marculescu, Diana, Wu, Kai-Chiang |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.09708 |
Similar Items
Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)
Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs
by: Taka, Endri, et al.
Published: (2024)
NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU
by: Ma, Cong, et al.
Published: (2025)
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
by: Chiang, Hung-Yueh, et al.
Published: (2024)
Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs
by: Taka, Endri, et al.
Published: (2025)
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
by: Chiang, Hung-Yueh, et al.
Published: (2025)
GAMA: High-Performance GEMM Acceleration on AMD Versal ML-Optimized AI Engines
by: Mhatre, Kaustubh, et al.
Published: (2025)
QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models
by: Chi, Tien-Yu, et al.
Published: (2025)
Spatial Re-parameterization for N:M Sparsity
by: Zhang, Yuxin, et al.
Published: (2023)
UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
by: Chiang, Hung-Yueh, et al.
Published: (2025)
V"Mean"ba: Visual State Space Models only need 1 hidden dimension
by: Chi, Tien-Yu, et al.
Published: (2024)
DARE: Diffusion Language Model Activation Reuse for Efficient Inference
by: Frumkin, Natalia, et al.
Published: (2026)
Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers
by: Bambhaniya, Abhimanyu Rajeshkumar, et al.
Published: (2024)
Efficient Column-Wise N:M Pruning on RISC-V CPU
by: Chu, Chi-Wei, et al.
Published: (2025)
MaxQ: Multi-Axis Query for N:M Sparsity Network
by: Xiang, Jingyang, et al.
Published: (2023)
Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
by: Chmiel, Brian, et al.
Published: (2022)
Toward Efficient Permutation for Hierarchical N:M Sparsity on GPUs
by: Yu, Seungmin, et al.
Published: (2024)
Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
by: Wang, Pei-Shuo, et al.
Published: (2025)
E-Sparse: Boosting the Large Language Model Inference through Entropy-based N:M Sparsity
by: Li, Yun, et al.
Published: (2023)
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
by: Ramachandran, Akshat, et al.
Published: (2025)
SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search
by: Chiang, Hung-Yueh, et al.
Published: (2024)
Amber Pruner: Leveraging N:M Activation Sparsity for Efficient Prefill in Large Language Models
by: An, Tai, et al.
Published: (2025)
Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
by: Frumkin, Natalia, et al.
Published: (2023)
MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on LLMs
by: Sun, Yan, et al.
Published: (2025)
Motivating Next-Gen Accelerators with Flexible (N:M) Activation Sparsity via Benchmarking Lightweight Post-Training Sparsification Approaches
by: Alanova, Shirin, et al.
Published: (2025)
xKV: Cross-Layer KV-Cache Compression via Aligned Singular Vector Extraction
by: Chang, Chi-Chih, et al.
Published: (2025)
Can Asymmetric Tile Buffering Be Beneficial?
by: Wang, Chengyue, et al.
Published: (2025)
ELANA: A Simple Energy and Latency Analyzer for LLMs
by: Chiang, Hung-Yueh, et al.
Published: (2025)
ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers
by: Hsu, Chih-Chung, et al.
Published: (2026)
Palu: Compressing KV-Cache with Low-Rank Projection
by: Chang, Chi-Chih, et al.
Published: (2024)
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
by: Liang, Feng, et al.
Published: (2022)
Similarity Trajectories: Linking Sampling Process to Artifacts in Diffusion-Generated Images
by: Menn, Dennis, et al.
Published: (2024)
Accelerating Diffusion Transformers with Token-wise Feature Caching
by: Zou, Chang, et al.
Published: (2024)
P‐144: Ultra‐High Brightness Quantum‐Dot Light‐Emitting Diodes with ZnO Nanoparticles Charge‐Control Layer
by: Chih-Jung Chen, et al.
Published: (2025)
On Asymmetric Optimization of Reasoning and Perception in Vision-Language Model Post-Training
by: Wu, Xueqing, et al.
Published: (2026)
Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior
by: Mahmud, Tanvir, et al.
Published: (2024)
Determining Layer-wise Sparsity for Large Language Models Through a Theoretical Perspective
by: Huang, Weizhong, et al.
Published: (2025)
DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)
Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling
by: Frumkin, Natalia, et al.
Published: (2025)
Geo-OLM: Enabling Sustainable Earth Observation Studies with Cost-Efficient Open Language Models & State-Driven Workflows
by: Stamoulis, Dimitrios, et al.
Published: (2025)