Library Catalog

Bibliographic Details
Main Authors:	Erhardt, Jack; Li, Ziang; Pinkham, Reid; Berkovich, Andrew; Zhang, Zhengya
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Hardware Architecture
Online Access:	https://arxiv.org/abs/2502.09528

Similar Items

DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference
by: Oztas, Ali Emre, et al.
Published: (2026)

Vision Transformer Computation and Resilience for Dynamic Inference
by: Sreedhar, Kavya, et al.
Published: (2022)

Dedicated Inference Engine and Binary-Weight Neural Networks for Lightweight Instance Segmentation
by: Chen, Tse-Wei, et al.
Published: (2025)

Physically Grounded Monocular Depth via Nanophotonic Wavefront Prompting
by: Li, Bingxuan, et al.
Published: (2025)

Using GUI Agent for Electronic Design Automation
by: Li, Chunyi, et al.
Published: (2025)

Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting
by: Jo, Joongho, et al.
Published: (2024)

ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA
by: Lyu, Shengzhe, et al.
Published: (2026)

Gen-NeRF: Efficient and Generalizable Neural Radiance Fields via Algorithm-Hardware Co-Design
by: Fu, Yonggan, et al.
Published: (2023)

Ternary-Input Binary-Weight CNN Accelerator Design for Miniature Object Classification System with Query-Driven Spatial DVS
by: Li, Yuyang, et al.
Published: (2025)

Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration
by: Oh, Changhun, et al.
Published: (2025)

Co-designing a Sub-millisecond Latency Event-based Eye Tracking System with Submanifold Sparse CNN
by: Zhang, Baoheng, et al.
Published: (2024)

GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency
by: Jo, Joongho, et al.
Published: (2025)

BILLNET: A Binarized Conv3D-LSTM Network with Logic-gated residual architecture for hardware-efficient video inference
by: Nguyen, Van Thien, et al.
Published: (2025)

ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
by: You, Haoran, et al.
Published: (2022)

CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference
by: Sadeghi, Mohammad Erfan, et al.
Published: (2024)

GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering
by: Lee, Junseo, et al.
Published: (2026)

TinyIceNet: Low-Power SAR Sea Ice Segmentation for On-Board FPGA Inference
by: Koutayni, Mhd Rashed Al, et al.
Published: (2026)

NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
by: Sun, Hao-Lun, et al.
Published: (2023)

AppSign: Multi-level Approximate Computing for Real-Time Traffic Sign Recognition in Autonomous Vehicles
by: Omidian, Fatemeh, et al.
Published: (2024)

Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation
by: Carrigg, Kieran, et al.
Published: (2026)

ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration
by: Lee, Hangyeol, et al.
Published: (2026)

Primitive-Driven Acceleration of Hyperdimensional Computing for Real-Time Image Classification
by: Parikh, Dhruv, et al.
Published: (2026)

Vision Transformers on the Edge: A Comprehensive Survey of Model Compression and Acceleration Strategies
by: Saha, Shaibal, et al.
Published: (2025)

TIMERIPPLE: Accelerating vDiTs by Understanding the Spatio-Temporal Correlations in Latent Space
by: Miao, Wenxuan, et al.
Published: (2025)

MVQ: Towards Efficient DNN Compression and Acceleration with Masked Vector Quantization
by: Li, Shuaiting, et al.
Published: (2024)

Real-Time Object Detection and Classification using YOLO for Edge FPGAs
by: Amin, Rashed Al, et al.
Published: (2025)

FG-Attn: Leveraging Fine-Grained Sparsity In Diffusion Transformers
by: Durvasula, Sankeerth, et al.
Published: (2025)

A Parameterizable Convolution Accelerator for Embedded Deep Learning Applications
by: Mousouliotis, Panagiotis, et al.
Published: (2026)

Efficient stereo matching on embedded GPUs with zero-means cross correlation
by: Chang, Qiong, et al.
Published: (2022)

SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices
by: Zhang, Yipu, et al.
Published: (2025)

LoRA-Edge: Tensor-Train-Assisted LoRA for Practical CNN Fine-Tuning on Edge Devices
by: Kwak, Hyunseok, et al.
Published: (2025)

SF-MMCN: Low-Power Sever Flow Multi-Mode Diffusion Model Accelerator
by: Hsu, Huan-Ke, et al.
Published: (2024)

hARMS: A Hardware Acceleration Architecture for Real-Time Event-Based Optical Flow
by: Stumpp, Daniel C., et al.
Published: (2021)

On-Orbit Real-Time Wildfire Detection Under On-Board Constraints
by: Rötzer, Matthias, et al.
Published: (2026)

CAMO: Correlation-Aware Mask Optimization with Modulated Reinforcement Learning
by: Liang, Xiaoxiao, et al.
Published: (2024)

Accelerating AI and Computer Vision for Satellite Pose Estimation on the Intel Myriad X Embedded SoC
by: Leon, Vasileios, et al.
Published: (2024)

Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning
by: Walton, Steven
Published: (2025)

QUILL: An Algorithm-Architecture Co-Design for Cache-Local Deformable Attention
by: Oh, Hyunwoo, et al.
Published: (2025)

HARFLOW3D: A Latency-Oriented 3D-CNN Accelerator Toolflow for HAR on FPGA Devices
by: Toupas, Petros, et al.
Published: (2023)

ASC: Adaptive Scale Feature Map Compression for Deep Neural Network
by: Yao, Yuan, et al.
Published: (2023)