:: Library Catalog

Bibliographic Details
Main Authors:	Singh, Suyash Vardhan; Ahmad, Iftakhar; Andrews, David; Huang, Miaoqing; Downey, Austin R. J.; Bakos, Jason D.
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2504.04661

Similar Items

FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
by: Kabir, Ehsan, et al.
Published: (2024)

A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs
by: Kabir, Ehsan, et al.
Published: (2024)

ProTEA: Programmable Transformer Encoder Acceleration on FPGA
by: Kabir, Ehsan, et al.
Published: (2024)

IMAGine: An In-Memory Accelerated GEMV Engine Overlay
by: Kabir, MD Arafat, et al.
Published: (2024)

The BRAM is the Limit: Shattering Myths, Shaping Standards, and Building Scalable PIM Accelerators
by: Kabir, MD Arafat, et al.
Published: (2024)

Switchable Single/Dual Edge Registers for Pipeline Architecture
by: Singh, Suyash Vardhan, et al.
Published: (2024)

CrossNAS: A Cross-Layer Neural Architecture Search Framework for PIM Systems
by: Amin, Md Hasibul, et al.
Published: (2025)

FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators
by: Jia, Shuao, et al.
Published: (2025)

EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology
by: Wu, Qizhe, et al.
Published: (2024)

TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks
by: Bertuletti, Marco, et al.
Published: (2026)

SKYLIGHT: A Scalable Hundred-Channel 3D Photonic In-Memory Tensor Core Architecture for Real-time AI Inference
by: Zhang, Meng, et al.
Published: (2026)

StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs
by: Ye, Hanchen, et al.
Published: (2025)

Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
by: Mallasén, David, et al.
Published: (2023)

A Tensor-Train Decomposition based Compression of LLMs on Group Vector Systolic Accelerator
by: Huang, Sixiao, et al.
Published: (2025)

Holistic Optimization Framework for FPGA Accelerators
by: Pouget, Stéphane, et al.
Published: (2025)

GTA: a new General Tensor Accelerator with Better Area Efficiency and Data Reuse
by: Ai, Chenyang, et al.
Published: (2024)

Accelerating Detailed Routing Convergence through Offline Reinforcement Learning
by: Khan, Afsara, et al.
Published: (2025)

Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory Tensor Manipulation for High-Throughput AI SoC
by: Zhou, Weiyu, et al.
Published: (2025)

Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration
by: Taka, Endri, et al.
Published: (2025)

ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition
by: Zheng, Keran, et al.
Published: (2025)

Error Checking for Sparse Systolic Tensor Arrays
by: Peltekis, Christodoulos, et al.
Published: (2024)

NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
by: Zhou, Zhe, et al.
Published: (2024)

Open-source Stand-Alone Versatile Tensor Accelerator
by: Faure-Gignoux, Anthony, et al.
Published: (2025)

ATiM: Autotuning Tensor Programs for Processing-in-DRAM
by: Shin, Yongwon, et al.
Published: (2024)

ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics
by: Gao, Ruijie, et al.
Published: (2026)

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization
by: Wu, Ka Wai
Published: (2024)

Linear Complexity Fermionic Simulation on Quantum Devices with Hardware Connectivity Constraints
by: Gao, Xiangyu, et al.
Published: (2026)

TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
by: Guan, Yue, et al.
Published: (2026)

Device-Level Optimization Techniques for Solid-State Drives: A Survey
by: Ren, Tianyu, et al.
Published: (2025)

Real Time FPGA Based Transformers & VLMs for Vision Tasks: SOTA Designs and Optimizations
by: Sali, Safa Mohammed, et al.
Published: (2025)

Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity
by: Xue, Zi Yu, et al.
Published: (2023)

Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality
by: Hoornaert, Denis, et al.
Published: (2026)

AME-PIM: Can Memory be Your Next Tensor Accelerator?
by: Venieri, Emanuele, et al.
Published: (2026)

PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance
by: Ji, Shixin, et al.
Published: (2026)

No Redundancy, No Stall: Lightweight Streaming 3D Gaussian Splatting for Real-time Rendering
by: Wei, Linye, et al.
Published: (2025)

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning
by: Shen, Chaoyao, et al.
Published: (2026)

FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training
by: Lu, Jinming, et al.
Published: (2025)

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators
by: Nayak, Nandeeka, et al.
Published: (2023)

Real-time Object Detection and Associated Hardware Accelerators Targeting Autonomous Vehicles: A Review
by: Sali, Safa, et al.
Published: (2025)

Real Time FPGA Based CNNs for Detection, Classification, and Tracking in Autonomous Systems: State of the Art Designs and Optimizations
by: Sali, Safa Mohammed, et al.
Published: (2025)