| Main Authors: | Wu, Yanran, Hua, Inez, Ding, Yi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.11256 |
Similar Items
Not All Water Consumption Is Equal: A Water Stress Weighted Metric for Sustainable Computing
by: Wu, Yanran, et al.
Published: (2025)
HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
by: Zheng, Pingqing, et al.
Published: (2025)
GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)
OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models
by: Koo, Jahyun, et al.
Published: (2024)
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
by: Li, Yang, et al.
Published: (2024)
DeepRTL2: A Versatile Model for RTL-Related Tasks
by: Liu, Yi, et al.
Published: (2025)
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
by: Liu, Yi, et al.
Published: (2025)
Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving
by: Kim, Wonung, et al.
Published: (2025)
Llumnix: Dynamic Scheduling for Large Language Model Serving
by: Sun, Biao, et al.
Published: (2024)
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
by: Chen, Hongzheng, et al.
Published: (2023)
MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving
by: Lee, Jungi, et al.
Published: (2025)
PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models
by: Cho, Eunyeong, et al.
Published: (2026)
Speculative Decoding for Verilog: Speed and Quality, All in One
by: Xu, Changran, et al.
Published: (2025)
Characterizing the Behavior of Training Mamba-based State Space Models on GPUs
by: Baruah, Trinayan, et al.
Published: (2025)
Is Finer Better? The Limits of Microscaling Formats in Large Language Models
by: Fasoli, Andrea, et al.
Published: (2026)
D2S-FLOW: Automated Parameter Extraction from Datasheets for SPICE Model Generation Using Large Language Models
by: Chen, Hong Cai, et al.
Published: (2025)
Serving Large Language Models on Huawei CloudMatrix384
by: Zuo, Pengfei, et al.
Published: (2025)
Scaling Laws for Floating Point Quantization Training
by: Sun, Xingwu, et al.
Published: (2025)
Leveraging High-Level Synthesis and Large Language Models to Generate, Simulate, and Deploy a Uniform Random Number Generator Hardware Design
by: Meech, James T.
Published: (2023)
Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
by: Jiang, Wenqi, et al.
Published: (2023)
Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization
by: Tian, Jiayi, et al.
Published: (2025)
Understanding and Mitigating Errors of LLM-Generated RTL Code
by: Zhang, Jiazheng, et al.
Published: (2025)
From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR
by: Wang, Erwei, et al.
Published: (2025)
Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference
by: Kumaresan, Ramchand
Published: (2026)
GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation
by: Kochar, Dimple Vijay, et al.
Published: (2026)
Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees
by: Heakl, Ahmed, et al.
Published: (2025)
When Servers Meet Species: A Fab-to-Grave Lens on Computing's Biodiversity Impact
by: Shi, Tianyao, et al.
Published: (2025)
Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates
by: Liu, Yang, et al.
Published: (2025)
Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language
by: Wang, Runzhi, et al.
Published: (2025)
Memory Access Characterization of Large Language Models in CPU Environment and its Potential Impacts
by: Banasik, Spencer
Published: (2025)
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
by: Li, Jinhao, et al.
Published: (2024)
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors
by: Zhang, Chengming, et al.
Published: (2024)
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
by: Wu, Taiqiang, et al.
Published: (2025)
LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits
by: Zhou, Zikai, et al.
Published: (2025)
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
by: Hendria, Willy Fitra
Published: (2026)
A Survey on Hardware Accelerators for Large Language Models
by: Kachris, Christoforos
Published: (2024)
Observation, Not Prediction: Conversation-Level Disaggregated Scheduling for Agentic Serving
by: Ding, Jianru, et al.
Published: (2026)
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
by: Xia, Haojun, et al.
Published: (2024)
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
by: Aralimatti, Rakshit, et al.
Published: (2025)
Allo: A Programming Model for Composable Accelerator Design
by: Chen, Hongzheng, et al.
Published: (2024)