:: Library Catalog

Bibliographic Details
Main Authors:	Zhou, Yaoyun; Wang, Qian
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture
Online Access:	https://arxiv.org/abs/2512.23969

Similar Items

HERO: Hardware-Efficient RL-based Optimization Framework for NeRF Quantization
by: Zhang, Yipu, et al.
Published: (2025)

WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)

TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
by: Guan, Yue, et al.
Published: (2026)

CODO: An Automated Compiler for Comprehensive Dataflow Optimization
by: Zhang, Weichuang, et al.
Published: (2026)

A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)

CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
by: Gouk, Donghyun, et al.
Published: (2025)

Hardware-Aware Neural Network Compilation with Learned Optimization: A RISC-V Accelerator Approach
by: Ganti, Ravindra, et al.
Published: (2025)

GPU-Accelerated Simulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems
by: Gonul, Yilmaz Ege, et al.
Published: (2025)

OpenACM: An Open-Source SRAM-Based Approximate CiM Compiler
by: Zhou, Yiqi, et al.
Published: (2026)

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
by: Luo, Weile, et al.
Published: (2024)

Evaluation of GPU Video Encoder for Low-Latency Real-Time 4K UHD Encoding
by: Arunruangsirilert, Kasidis, et al.
Published: (2025)

PIMCOMP: An End-to-End DNN Compiler for Processing-In-Memory Accelerators
by: Sun, Xiaotian, et al.
Published: (2024)

A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
by: Zhou, Wenyong, et al.
Published: (2025)

Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)

Compilation and Execution of an Embeddable YOLO-NAS on the VTA
by: Faure-Gignoux, Anthony, et al.
Published: (2026)

Revet: A Language and Compiler for Dataflow Threads
by: Rucker, Alexander, et al.
Published: (2023)

Efficient LLM inference solution on Intel GPU
by: Wu, Hui, et al.
Published: (2023)

RoboGPU: Accelerating GPU Collision Detection for Robotics
by: Liu, Lufei, et al.
Published: (2026)

Analyzing Modern NVIDIA GPU cores
by: Huerta, Rodrigo, et al.
Published: (2025)

RePart: Efficient Hypergraph Partitioning with Logic Replication Optimization for Multi-FPGA System
by: Fu, Zizhuo, et al.
Published: (2026)

Bi-SamplerZ: A Hardware-Efficient Gaussian Sampler Architecture for Quantum-Resistant Falcon Signatures
by: Zhao, Binke, et al.
Published: (2025)

SPPAM: Signature Pattern Prediction and Access-Map Prefetcher
by: Merrell, Maccoy, et al.
Published: (2026)

Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration
by: Shahawy, Mohamed, et al.
Published: (2025)

An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration
by: Mazouz, Alaa, et al.
Published: (2025)

Structural Mutation Based Differential Testing for FPGA Logic Synthesis Compilers
by: Xu, Zhihao, et al.
Published: (2025)

Design of a GPU with Heterogeneous Cores for Graphics
by: Tomás, Aurora, et al.
Published: (2026)

COOK Access Control on an embedded Volta GPU
by: Lesage, Benjamin, et al.
Published: (2024)

PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction
by: Zhang, Kaixuan, et al.
Published: (2026)

PipeRTL: Timing-Aware Pipeline Optimization at IR-Level for RTL Generation
by: Yin, Shuo, et al.
Published: (2026)

Adapting Atmospheric Chemistry Components for Efficient GPU Accelerators
by: Ruiz, Christian Guzman, et al.
Published: (2024)

Leveraging Application-Specific Knowledge for Energy-Efficient Deep Learning Accelerators on Resource-Constrained FPGAs
by: Qian, Chao
Published: (2025)

SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support
by: Diao, Haikang, et al.
Published: (2025)

Multiport Support for Vortex OpenGPU Memory Hierarchy
by: Shin, Injae, et al.
Published: (2025)

CuLifter: Lifting GPU Binaries to Typed IR
by: Zhao, Jisheng, et al.
Published: (2026)

Generation of Compiler Backends from Formal Models of Hardware
by: Smith, Gus Henry
Published: (2024)

Capstone: Power-Capped Pipelining for Coarse-Grained Reconfigurable Array Compilers
by: Yarzada, Sabrina, et al.
Published: (2026)

Building a Reusable and Extensible Automatic Compiler Infrastructure for Reconfigurable Devices
by: Zang, Zhenya, et al.
Published: (2023)

SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
by: Xia, Yuhuan, et al.
Published: (2026)

GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment
by: Zhao, Chunyuan, et al.
Published: (2025)

Thermal Analysis for NVIDIA GTX480 Fermi GPU Architecture
by: Nagendra, Savinay
Published: (2024)