| Main Authors: | Feng, Dahu, Feng, Erhu, Du, Dong, Xu, Pinjie, Xia, Yubin, Chen, Haibo, Zhao, Rong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.11446 |
Similar Items
Characterizing Mobile SoC for Accelerating Heterogeneous LLM Inference
by: Chen, Le, et al.
Published: (2025)
Enabling Time-Aware Priority Traffic Management over Distributed FPGA Nodes
by: Scionti, Alberto, et al.
Published: (2025)
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
by: Zhang, Chen, et al.
Published: (2026)
Handling of Memory Page Faults during Virtual-Address RDMA
by: Psistakis, Antonis
Published: (2025)
ELK: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques
by: Liu, Yiqi, et al.
Published: (2025)
IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment
by: Psistakis, Antonis
Published: (2025)
Deadlock-free routing for Full-mesh networks without using Virtual Channels
by: Cano, Alejandro, et al.
Published: (2025)
TeraPool: A Physical Design Aware, 1024 RISC-V Cores Shared-L1-Memory Scaled-up Cluster Design with High Bandwidth Main Memory Link
by: Zhang, Yichao, et al.
Published: (2026)
NetSmith: An Optimization Framework for Machine-Discovered Network Topologies
by: Green, Conor, et al.
Published: (2024)
cMPI: Using CXL Memory Sharing for MPI One-Sided and Two-Sided Inter-Node Communications
by: Wang, Xi, et al.
Published: (2025)
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception
by: Odema, Mohanad, et al.
Published: (2024)
A Modern Primer on Processing in Memory
by: Mutlu, Onur, et al.
Published: (2020)
ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System
by: Barkhordar, Marzieh, et al.
Published: (2026)
A New Family of Thread to Core Allocation Policies for an SMT ARM Processor
by: Navarro, Marta, et al.
Published: (2025)
CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization
by: Feng, Weigang, et al.
Published: (2025)
Optimizing Task Scheduling in Fog Computing with Deadline Awareness
by: Sirjani, Mohammad Sadegh, et al.
Published: (2025)
Accelerating Triangle Counting with Real Processing-in-Memory Systems
by: Asquini, Lorenzo, et al.
Published: (2025)
Balanced Data Placement for GEMV Acceleration with Processing-In-Memory
by: Ibrahim, Mohamed Assem, et al.
Published: (2024)
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
by: Mutlu, Onur, et al.
Published: (2024)
MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing
by: Oliveira, Geraldo F., et al.
Published: (2024)
SpeedMalloc: Improving Multi-threaded Applications via a Lightweight Core for Memory Allocation
by: Li, Ruihao, et al.
Published: (2025)
Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training
by: Adnan, Muhammad, et al.
Published: (2024)
New Tools, Programming Models, and System Support for Processing-in-Memory Architectures
by: Oliveira, Geraldo F.
Published: (2025)
RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory
by: Ghiasi, Nika Mansouri, et al.
Published: (2022)
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
by: Zhou, Zhuoshan, et al.
Published: (2026)
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)
PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM
by: Kubo, Tatsuya, et al.
Published: (2025)
Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A
by: Jarmusch, Aaron, et al.
Published: (2026)
PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System
by: Liu, Lian, et al.
Published: (2026)
Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)
PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices
by: Noh, Si Ung, et al.
Published: (2024)
Automated Deep Neural Network Inference Partitioning for Distributed Embedded Systems
by: Kreß, Fabian, et al.
Published: (2024)
RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs
by: Chen, Yanru, et al.
Published: (2025)
Sequence-Aware Split Heuristic to Mitigate SM Underutilization in FlashAttention-3 Low-Head-Count Decoding
by: Font, Martí Llopart, et al.
Published: (2026)
Experimental Assessment of Containers Running on Top of Virtual Machines
by: Aqasizade, Hossein, et al.
Published: (2024)
Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in Solid State Drives
by: Nadig, Rakesh, et al.
Published: (2026)
Proteus: Enabling High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic
by: Oliveira, Geraldo F., et al.
Published: (2025)
TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios
by: Zhang, Yichao, et al.
Published: (2024)
FLEX: Leveraging FPGA-CPU Synergy for Mixed-Cell-Height Legalization Acceleration
by: Liu, Xingyu, et al.
Published: (2025)
Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service
by: Zheng, Xianzhe, et al.
Published: (2026)