:: Library Catalog

Bibliographic Details
Main Author:	Salishev, Sergey
Format:	Preprint
Published:	2025
Subjects:	Hardware Architecture; Distributed, Parallel, and Cluster Computing; Data Structures and Algorithms; Numerical Analysis; B.2.4
Online Access:	https://arxiv.org/abs/2505.06728

Similar Items

On Optimizing Locality of Graph Transposition on Modern Architectures
by: Esfahani, Mohsen Koohi, et al.
Published: (2025)

OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths
by: Gold, Leo, et al.
Published: (2024)

Synthesis of signal processing algorithms with constraints on minimal parallelism and memory space
by: Salishev, Sergey
Published: (2025)

GPU-Augmented OLAP Execution Engine: GPU Offloading
by: Chang, Ilsun
Published: (2025)

Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices
by: Ganbarov, Ali, et al.
Published: (2024)

GPU-Parallelizable Randomized Sketch-and-Precondition for Linear Regression using Sparse Sign Sketches
by: Chen, Tyler, et al.
Published: (2025)

On Some Peculiarities of Dynamic Switch between Component Implementations in an Autonomic Computing System
by: Mackarov, Igor
Published: (2006)

A Reexamination of the COnfLUX 2.5D LU Factorization Algorithm
by: Tang, Yuan
Published: (2024)

A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations
by: Tang, Yuan
Published: (2024)

Parallel GPU-Accelerated Randomized Construction of Approximate Cholesky Preconditioners
by: Liang, Tianyu, et al.
Published: (2025)

Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV
by: Chen, Qian, et al.
Published: (2024)

Construction of a Byzantine Linearizable SWMR Atomic Register from SWSR Atomic Registers
by: Kshemkalyani, Ajay D., et al.
Published: (2024)

Parallel Simulation for Log-concave Sampling and Score-based Diffusion Models
by: Zhou, Huanjian, et al.
Published: (2024)

TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification
by: Sharma, Vijay Pratap, et al.
Published: (2026)

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
by: Pati, Suchita, et al.
Published: (2024)

HYLU: Hybrid Parallel Sparse LU Factorization
by: Chen, Xiaoming
Published: (2025)

Practical Byzantine Reliable Broadcast on Partially Connected Networks (Extended version)
by: Bonomi, Silvia, et al.
Published: (2021)

MementoHash: A Stateful, Minimal Memory, Best Performing Consistent Hash Algorithm
by: Coluzzi, Massimo, et al.
Published: (2023)

On the Resilience of Fast Failover Routing Against Dynamic Link Failures
by: Dai, Wenkai, et al.
Published: (2024)

Evaluation of computational and energy performance in matrix multiplication algorithms on CPU and GPU using MKL, cuBLAS and SYCL
by: Torres, L. A., et al.
Published: (2024)

On the Randomized Locality of Matching Problems in Regular Graphs
by: Khoury, Seri, et al.
Published: (2025)

PASS: An Asynchronous Probabilistic Processor for Next Generation Intelligence
by: Patel, Saavan, et al.
Published: (2024)

Sky$^ε$-Tree: Embracing the Batch Updates of B$^ε$-trees through Access Port Parallelism on Skyrmion Racetrack Memory
by: Tsai, Yu-Shiang, et al.
Published: (2024)

Constitutional Consensus for Democratic Governance
by: Keidar, Idit, et al.
Published: (2025)

Accelerating Distributed Deep Learning using Lossless Homomorphic Compression
by: Li, Haoyu, et al.
Published: (2024)

TT-Edge: A Hardware-Software Co-Design for Energy-Efficient Tensor-Train Decomposition on Edge AI
by: Kwak, Hyunseok, et al.
Published: (2025)

The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture
by: Kreuzer, Anke, et al.
Published: (2019)

An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
by: Qiu, Tong Dong, et al.
Published: (2023)

SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding
by: Xu, Weihong, et al.
Published: (2025)

COMET: A Framework for Modeling Compound Operation Dataflows with Explicit Collectives
by: Negi, Shubham, et al.
Published: (2025)

MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
by: Kubo, Tatsuya, et al.
Published: (2025)

PAM: Processing Across Memory Hierarchy for Efficient KV-centric LLM Serving System
by: Liu, Lian, et al.
Published: (2026)

RAPID-Graph: Recursive All-Pairs Shortest Paths Using Processing-in-Memory for Dynamic Programming on Graphs
by: Chen, Yanru, et al.
Published: (2025)

Chopper: A Multi-Level GPU Characterization Tool & Derived Insights Into LLM Training Inefficiency
by: Kurzynski, Marco, et al.
Published: (2025)

Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems
by: Zhang, Chen, et al.
Published: (2026)

Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers
by: Papaphilippou, Philippos, et al.
Published: (2023)

DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications
by: Orenes-Vera, Marcelo, et al.
Published: (2023)

LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores
by: García-García, Adrián, et al.
Published: (2024)

FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs
by: Li, Bohan, et al.
Published: (2026)

Sequence-Aware Split Heuristic to Mitigate SM Underutilization in FlashAttention-3 Low-Head-Count Decoding
by: Font, Martí Llopart, et al.
Published: (2026)