| Main Authors: | Lee, Jungi, Lee, Wonbeom, Sim, Jaewoong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.12930 |
Similar Items
MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
by: Lee, Jungi, et al.
Published: (2025)
GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering
by: Lee, Junseo, et al.
Published: (2026)
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models
by: Kim, Taehyun, et al.
Published: (2024)
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
by: Lee, Wonbeom, et al.
Published: (2024)
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
by: Kim, Wonung, et al.
Published: (2025)
SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models
by: Yang, Jinho, et al.
Published: (2025)
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
by: Fu, Yonggan, et al.
Published: (2023)
Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)
Accelerating Neural Networks for Large Language Models and Graph Processing with Silicon Photonics
by: Afifi, Salma, et al.
Published: (2024)
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies
by: Sharma, Amit
Published: (2025)
FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction
by: Kulp, Gabriel, et al.
Published: (2024)
A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs
by: Kabir, Ehsan, et al.
Published: (2024)
Token-Picker: Accelerating Attention in Text Generation with Minimized Memory Transfer via Probability Estimation
by: Park, Junyoung, et al.
Published: (2024)
InstantFT: An FPGA-Based Runtime Subsecond Fine-tuning of CNN Models
by: Sugiura, Keisuke, et al.
Published: (2025)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference
by: Wolters, Christopher, et al.
Published: (2024)
Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
by: Jang, Hongsun, et al.
Published: (2024)
LLM-Aided Compilation for Tensor Accelerators
by: Hong, Charles, et al.
Published: (2024)
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
by: Yun, Sungmin, et al.
Published: (2024)
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
by: Li, Yang, et al.
Published: (2024)
LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
by: Lin, Yujun, et al.
Published: (2025)
OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)
GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design
by: You, Haoran, et al.
Published: (2021)
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
by: Shen, Chaoyao, et al.
Published: (2026)
LLM-USO: Large Language Model-based Universal Sizing Optimizer
by: S, Karthik Somayaji N., et al.
Published: (2025)
APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration
by: Ma, Shaobo, et al.
Published: (2025)
EVA: Accelerating LLM Decoding via an Efficient Vector Quantization Architecture
by: Duan, Bowen, et al.
Published: (2026)
SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training
by: Pan, Yunjie, et al.
Published: (2026)
Periodic Online Testing for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2025)
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
by: Kim, Sungbin, et al.
Published: (2025)
Accelerating Diffusion Models for Generative AI Applications with Silicon Photonics
by: Suresh, Tharini, et al.
Published: (2026)
Hardware-Efficient Photonic Tensor Core: Accelerating Deep Neural Networks with Structured Compression
by: Ning, Shupeng, et al.
Published: (2025)
Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts
by: Banasik, Spencer
Published: (2025)
LLM4DV: Using Large Language Models for Hardware Test Stimuli Generation
by: Zhang, Zixi, et al.
Published: (2023)
ESACT: An End-to-End Sparse Accelerator for Compute-Intensive Transformers via Local Similarity
by: Liu, Hongxiang, et al.
Published: (2025)
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
by: Hong, Charles, et al.
Published: (2025)
ReducedLUT: Table Decomposition with "Don't Care" Conditions
by: Cassidy, Oliver, et al.
Published: (2024)
DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics
by: Kim, Yoonsung, et al.
Published: (2024)