Library Catalog

Bibliographic Details
Main Authors:	Toupas, Petros; Yu, Zhewen; Bouganis, Christos-Savvas; Tzovaras, Dimitrios
Format:	Preprint
Published:	2024
Subjects:	Hardware Architecture; Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2403.18921

Similar Items

fpgaHART: A toolflow for throughput-oriented acceleration of 3D CNNs for HAR onto FPGAs
by: Toupas, Petros, et al.
Published: (2023)

FMM-X3D: FPGA-based modeling and mapping of X3D for Human Action Recognition
by: Toupas, Petros, et al.
Published: (2023)

HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
by: Toupas, Petros, et al.
Published: (2023)

ATHEENA: A Toolflow for Hardware Early-Exit Network Automation
by: Biggs, Benjamin, et al.
Published: (2023)

ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition
by: Zheng, Keran, et al.
Published: (2025)

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator
by: Yu, Zhewen, et al.
Published: (2024)

Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs
by: Aggarwal, Shivam, et al.
Published: (2023)

A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)

From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception
by: Toupas, Petros, et al.
Published: (2023)

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning
by: Wang, Hanrui, et al.
Published: (2022)

Real-Time Object Detection and Classification using YOLO for Edge FPGAs
by: Amin, Rashed Al, et al.
Published: (2025)

NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
by: Sun, Hao-Lun, et al.
Published: (2023)

ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
by: You, Haoran, et al.
Published: (2022)

Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs
by: Mao, Gang, et al.
Published: (2025)

ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA
by: Lyu, Shengzhe, et al.
Published: (2026)

Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation
by: Swaminathan, Tushar Prasanna, et al.
Published: (2024)

Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication
by: Jaswal, Pragun, et al.
Published: (2025)

Stella Nera: A Differentiable Maddness-Based Hardware Accelerator for Efficient Approximate Matrix Multiplication
by: Schönleber, Jannis, et al.
Published: (2023)

AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization
by: Zhang, Wenlun, et al.
Published: (2025)

Ditto: Accelerating Diffusion Model via Temporal Value Similarity
by: Kim, Sungbin, et al.
Published: (2025)

Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning
by: Walton, Steven
Published: (2025)

CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning
by: Aggarwal, Shivam, et al.
Published: (2023)

Performance Analysis of Edge and In-Sensor AI Processors: A Comparative Review
by: Capogrosso, Luigi, et al.
Published: (2026)

QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention
by: Oh, Hyunwoo, et al.
Published: (2025)

Neuro-Channel Networks: A Multiplication-Free Architecture by Biological Signal Transmission
by: Mete, Emrah, et al.
Published: (2026)

Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs
by: Monachan, Danial, et al.
Published: (2026)

TsetlinWiSARD: On-Chip Training of Weightless Neural Networks using Tsetlin Automata on FPGAs
by: Duan, Shengyu, et al.
Published: (2026)

Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks
by: Wu, Qizhe, et al.
Published: (2024)

Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization
by: Wang, Zhican, et al.
Published: (2025)

Neural Architecture Search of Hybrid Models for NPU-CIM Heterogeneous AR/VR Devices
by: Zhao, Yiwei, et al.
Published: (2024)

RaGNNarok: A Light-Weight Graph Neural Network for Enhancing Radar Point Clouds on Unmanned Ground Vehicles
by: Hunt, David, et al.
Published: (2025)

On Latency Predictors for Neural Architecture Search
by: Akhauri, Yash, et al.
Published: (2024)

Vision-Based Perception for Autonomous Vehicles in Off-Road Environment Using Deep Learning
by: Neto, Nelson Alves Ferreira
Published: (2025)

Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC
by: Leon, Vasileios, et al.
Published: (2024)

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference
by: Manzour, M., et al.
Published: (2025)

TSLA: A Task-Specific Learning Adaptation for Semantic Segmentation on Autonomous Vehicles Platform
by: Liu, Jun, et al.
Published: (2025)

SQ-DM: Accelerating Diffusion Models with Aggressive Quantization and Temporal Sparsity
by: Fan, Zichen, et al.
Published: (2025)

SageAttention2++: A More Efficient Implementation of SageAttention2
by: Zhang, Jintao, et al.
Published: (2025)

FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries
by: Jiang, Yuqi, et al.
Published: (2024)

Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows
by: Posso, Julien, et al.
Published: (2025)