Library Catalog

Bibliographic Details
Main Authors:	Cuyckens, Stef; Antonio, Ryan; Fang, Chao; Verhelst, Marian
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture; Machine Learning
Online Access:	https://arxiv.org/abs/2511.05503

Similar Items

Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration
by: Cuyckens, Stef, et al.
Published: (2025)

Efficient Precision-Scalable Hardware for Microscaling (MX) Processing in Robotics Learning
by: Cuyckens, Stef, et al.
Published: (2025)

P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats
by: Chen, Yuzong, et al.
Published: (2025)

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
by: Fang, Chao, et al.
Published: (2024)

MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration
by: Zhao, Shirui, et al.
Published: (2025)

FSL-HDnn: A 5.7 TOPS/W End-to-end Few-shot Learning Classifier Accelerator with Feature Extraction and Hyperdimensional Computing
by: Yang, Haichao, et al.
Published: (2024)

Towards Efficient Hyperdimensional Computing Using Photonics
by: Fayza, Farbin, et al.
Published: (2023)

Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration
by: Geens, Robin, et al.
Published: (2025)

DataMaestro: A Versatile and Efficient Data Streaming Engine Bringing Decoupled Memory Access To Dataflow Accelerators
by: Yi, Xiaoling, et al.
Published: (2025)

Clo-HDnn: A 4.66 TFLOPS/W and 3.78 TOPS/W Continual On-Device Learning Accelerator with Energy-efficient Hyperdimensional Computing via Progressive Search
by: Song, Chang Eun, et al.
Published: (2025)

ESACT: An End-to-End Sparse Accelerator for Compute-Intensive Transformers via Local Similarity
by: Liu, Hongxiang, et al.
Published: (2025)

Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention
by: Geens, Robin, et al.
Published: (2025)

MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures
by: Kang, Do Yeong, et al.
Published: (2025)

Enabling Efficient Hardware Acceleration of Hybrid Vision Transformer (ViT) Networks at the Edge
by: Dumoulin, Joren, et al.
Published: (2025)

Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference
by: Geens, Robin, et al.
Published: (2026)

APT-LLM: Exploiting Arbitrary-Precision Tensor Core Computing for LLM Acceleration
by: Ma, Shaobo, et al.
Published: (2025)

A 16 nm 1.60TOPS/W High Utilization DNN Accelerator with 3D Spatial Data Reuse and Efficient Shared Memory Access
by: Yi, Xiaoling, et al.
Published: (2026)

Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs
by: de Lima, João Paulo Cardoso, et al.
Published: (2025)

Enabling Unstructured Sparse Acceleration on Structured Sparse Accelerators
by: Jeong, Geonhwa, et al.
Published: (2024)

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)

FedUHD: Unsupervised Federated Learning using Hyperdimensional Computing
by: Lee, You Hak, et al.
Published: (2025)

FLAASH: Flexible Accelerator Architecture for Sparse High-Order Tensor Contraction
by: Kulp, Gabriel, et al.
Published: (2024)

XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs
by: Kong, Fanchen, et al.
Published: (2025)

How to keep pushing ML accelerator performance? Know your rooflines!
by: Verhelst, Marian, et al.
Published: (2025)

HDReason: Algorithm-Hardware Codesign for Hyperdimensional Knowledge Graph Reasoning
by: Chen, Hanning, et al.
Published: (2024)

Efficient Implementation of LinearUCB through Algorithmic Improvements and Vector Computing Acceleration for Embedded Learning Systems
by: Angioli, Marco, et al.
Published: (2025)

Accelerating Computer Architecture Simulation through Machine Learning
by: Ali, Wajid, et al.
Published: (2024)

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators
by: Symons, Arne, et al.
Published: (2022)

Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing
by: Afifi, S., et al.
Published: (2026)

ElasticAI: Creating and Deploying Energy-Efficient Deep Learning Accelerator for Pervasive Computing
by: Qian, Chao, et al.
Published: (2024)

Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration
by: Fang, Jiaxun, et al.
Published: (2025)

OpenGeMM: A High-Utilization GeMM Accelerator Generator with Lightweight RISC-V Control and Tight Memory Coupling
by: Yi, Xiaoling, et al.
Published: (2024)

Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators
by: Zhou, Wenyong, et al.
Published: (2025)

U-SWIM: Universal Selective Write-Verify for Computing-in-Memory Neural Accelerators
by: Yan, Zheyu, et al.
Published: (2023)

An Analog and Digital Hybrid Attention Accelerator for Transformers with Charge-based In-memory Computing
by: Moradifirouzabadi, Ashkan, et al.
Published: (2024)

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores
by: Ma, Shaobo, et al.
Published: (2024)

An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems
by: Antonio, Ryan Albert, et al.
Published: (2025)

Pack my weights and run! Minimizing overheads for in-memory computing accelerators
by: Houshmand, Pouya, et al.
Published: (2024)

Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions
by: Pasricha, Sudeep
Published: (2024)

VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
by: Wang, Run, et al.
Published: (2025)