:: Library Catalog

Bibliographic Details
Main Authors:	Luo, Jingjia; Zhang, Mingxing; Chen, Kang; Liao, Xia; Shan, Yingdi; Jiang, Jinlei; Wu, Yongwei
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2504.20461

Similar Items

Efficient Heterogeneous Large Language Model Decoding with Model-Attention Disaggregation
by: Chen, Shaoyuan, et al.
Published: (2024)

PathWeaver: A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
by: Kim, Sukjin, et al.
Published: (2025)

Scalable Graph Indexing using GPUs for Approximate Nearest Neighbor Search
by: Li, Zhonggen, et al.
Published: (2025)

TrEnv-X: Transparently Share Serverless Execution Environments Across Different Functions and Nodes
by: Huang, Jialiang, et al.
Published: (2025)

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU
by: V., Karthik, et al.
Published: (2024)

GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction
by: Li, Xiang, et al.
Published: (2025)

PECANN: Parallel Efficient Clustering with Graph-Based Approximate Nearest Neighbor Search
by: Yu, Shangdi, et al.
Published: (2023)

Exact Nearest-Neighbor Search on Energy-Efficient FPGA Devices
by: Dazzi, Patrizio, et al.
Published: (2025)

Advancing RT Core-Accelerated Fixed-Radius Nearest Neighbor Search
by: Meneses, Enzo, et al.
Published: (2026)

CleANN: Efficient Full Dynamism in Graph-based Approximate Nearest Neighbor Search
by: Zhang, Ziyu, et al.
Published: (2025)

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
by: Qin, Ruoyu, et al.
Published: (2025)

BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG
by: Malkhi, Dahlia, et al.
Published: (2023)

Shift Parallelism: Low-Latency, High-Throughput LLM Inference for Dynamic Workloads
by: Hidayetoglu, Mert, et al.
Published: (2025)

Cortex: Achieving Low-Latency, Cost-Efficient Remote Data Access For LLM via Semantic-Aware Knowledge Caching
by: Ruan, Chaoyi, et al.
Published: (2025)

Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing
by: Mandarapu, Durga, et al.
Published: (2023)

On the Effectiveness of Graph Reordering for Accelerating Approximate Nearest Neighbor Search on GPU
by: Oguri, Yutaro, et al.
Published: (2025)

NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
by: Zou, Cheng, et al.
Published: (2026)

PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
by: Rubel, Tobias, et al.
Published: (2026)

Falcon: Advancing Asynchronous BFT Consensus for Lower Latency and Enhanced Throughput
by: Dai, Xiaohai, et al.
Published: (2025)

Torpor: GPU-Enabled Serverless Computing for Low-Latency, Resource-Efficient Inference
by: Yu, Minchen, et al.
Published: (2023)

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs
by: Ootomo, Hiroyuki, et al.
Published: (2023)

OmniInfer: System-Wide Acceleration Techniques for Optimizing LLM Serving Throughput and Latency
by: Wang, Jun, et al.
Published: (2025)

Surviving Partial Rank Failures in Wide Expert-Parallel MoE Inference
by: Sun, Xun, et al.
Published: (2026)

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter
by: Qin, Ruoyu, et al.
Published: (2026)

Low-Latency Federated Fine-Tuning for Large Language Models Over Wireless Networks
by: Pang, Zhiwen, et al.
Published: (2026)

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
by: Agrawal, Amey, et al.
Published: (2024)

Fast Iterative Graph Computing with Updated Neighbor States
by: Zhou, Yijie, et al.
Published: (2024)

TENT: A Declarative Slice Spraying Engine for Performant and Resilient Data Movement in Disaggregated LLM Serving
by: Ren, Feng, et al.
Published: (2026)

Dumbo-NG: Fast Asynchronous BFT Consensus with Throughput-Oblivious Latency
by: Gao, Yingzi, et al.
Published: (2022)

SOLANET: Distributed Neighbor Graph Construction on GPU-Accelerated Systems
by: Iwabuchi, Keita, et al.
Published: (2026)

Revisiting Speculative Leaderless Protocols for Low-Latency BFT Replication
by: Qian, Daniel, et al.
Published: (2026)

Low Latency, High Bandwidth Streaming of Experimental Data with EJFAT
by: Baldin, Ilya, et al.
Published: (2025)

Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput
by: Song, Jingwei, et al.
Published: (2025)

Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
by: Dai, Yinwei, et al.
Published: (2023)

OMEGA: A Low-Latency GNN Serving System for Large Graphs
by: Kim, Geon-Woo, et al.
Published: (2025)

NAVIS: Concurrent Search and Update with Low Position-Seeking Overhead in On-SSD Graph-Based Vector Search
by: Song, Jaeyong, et al.
Published: (2026)

Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing
by: Liu, Mengjie, et al.
Published: (2024)

Chasing the Speed of Light: Low-Latency Planetary-Scale Adaptive Byzantine Consensus
by: Berger, Christian, et al.
Published: (2023)

Large-Scale Graph Building in Dynamic Environments: Low Latency and High Quality
by: de Almeida, Filipe Miguel Gonçalves, et al.
Published: (2025)

Scene-Aware Latency Estimation for Microservices via Multi-Scale Graph Fusion
by: Sun, Zhichao, et al.
Published: (2026)