| Main Authors: | Zhou, Yaoyun, Wang, Qian |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2512.23969 |
Similar Items
HERO: Hardware-Efficient RL-based Optimization Framework for NeRF Quantization
by: Zhang, Yipu, et al.
Published: (2025)
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments
by: Guan, Yue, et al.
Published: (2026)
CODO: An Automated Compiler for Comprehensive Dataflow Optimization
by: Zhang, Weichuang, et al.
Published: (2026)
A Dataflow Compiler for Efficient LLM Inference using Custom Microscaling Formats
by: Cheng, Jianyi, et al.
Published: (2023)
CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies
by: Gouk, Donghyun, et al.
Published: (2025)
Hardware-Aware Neural Network Compilation with Learned Optimization: A RISC-V Accelerator Approach
by: Ganti, Ravindra, et al.
Published: (2025)
GPU-Accelerated Simulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems
by: Gonul, Yilmaz Ege, et al.
Published: (2025)
OpenACM: An Open-Source SRAM-Based Approximate CiM Compiler
by: Zhou, Yiqi, et al.
Published: (2026)
Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
by: Luo, Weile, et al.
Published: (2024)
Evaluation of GPU Video Encoder for Low-Latency Real-Time 4K UHD Encoding
by: Arunruangsirilert, Kasidis, et al.
Published: (2025)
PIMCOMP: An End-to-End DNN Compiler for Processing-In-Memory Accelerators
by: Sun, Xiaotian, et al.
Published: (2024)
A Time- and Energy-Efficient CNN with Dense Connections on Memristor-Based Chips
by: Zhou, Wenyong, et al.
Published: (2025)
Be CIM or Be Memory: A Dual-mode-aware DNN Compiler for CIM Accelerators
by: Zhao, Shixin, et al.
Published: (2025)
Compilation and Execution of an Embeddable YOLO-NAS on the VTA
by: Faure-Gignoux, Anthony, et al.
Published: (2026)
Revet: A Language and Compiler for Dataflow Threads
by: Rucker, Alexander, et al.
Published: (2023)
Efficient LLM inference solution on Intel GPU
by: Wu, Hui, et al.
Published: (2023)
RoboGPU: Accelerating GPU Collision Detection for Robotics
by: Liu, Lufei, et al.
Published: (2026)
Analyzing Modern NVIDIA GPU cores
by: Huerta, Rodrigo, et al.
Published: (2025)
RePart: Efficient Hypergraph Partitioning with Logic Replication Optimization for Multi-FPGA System
by: Fu, Zizhuo, et al.
Published: (2026)
Bi-SamplerZ: A Hardware-Efficient Gaussian Sampler Architecture for Quantum-Resistant Falcon Signatures
by: Zhao, Binke, et al.
Published: (2025)
SPPAM: Signature Pattern Prediction and Access-Map Prefetcher
by: Merrell, Maccoy, et al.
Published: (2026)
Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration
by: Shahawy, Mohamed, et al.
Published: (2025)
An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration
by: Mazouz, Alaa, et al.
Published: (2025)
Structural Mutation Based Differential Testing for FPGA Logic Synthesis Compilers
by: Xu, Zhihao, et al.
Published: (2025)
Design of a GPU with Heterogeneous Cores for Graphics
by: Tomás, Aurora, et al.
Published: (2026)
COOK Access Control on an embedded Volta GPU
by: Lesage, Benjamin, et al.
Published: (2024)
PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction
by: Zhang, Kaixuan, et al.
Published: (2026)
PipeRTL: Timing-Aware Pipeline Optimization at IR-Level for RTL Generation
by: Yin, Shuo, et al.
Published: (2026)
Adapting Atmospheric Chemistry Components for Efficient GPU Accelerators
by: Ruiz, Christian Guzman, et al.
Published: (2024)
Leveraging Application-Specific Knowledge for Energy-Efficient Deep Learning Accelerators on Resource-Constrained FPGAs
by: Qian, Chao
Published: (2025)
SEGA-DCIM: Design Space Exploration-Guided Automatic Digital CIM Compiler with Multiple Precision Support
by: Diao, Haikang, et al.
Published: (2025)
Multiport Support for Vortex OpenGPU Memory Hierarchy
by: Shin, Injae, et al.
Published: (2025)
CuLifter: Lifting GPU Binaries to Typed IR
by: Zhao, Jisheng, et al.
Published: (2026)
Generation of Compiler Backends from Formal Models of Hardware
by: Smith, Gus Henry
Published: (2024)
Capstone: Power-Capped Pipelining for Coarse-Grained Reconfigurable Array Compilers
by: Yarzada, Sabrina, et al.
Published: (2026)
Building a Reusable and Extensible Automatic Compiler Infrastructure for Reconfigurable Devices
by: Zang, Zhenya, et al.
Published: (2023)
SimulatorCoder: DNN Accelerator Simulator Code Generation and Optimization via Large Language Models
by: Xia, Yuhuan, et al.
Published: (2026)
GAP-LA: GPU-Accelerated Performance-Driven Layer Assignment
by: Zhao, Chunyuan, et al.
Published: (2025)
Thermal Analysis for NVIDIA GTX480 Fermi GPU Architecture
by: Nagendra, Savinay
Published: (2024)