Bibliographic Details
Main Authors: Xu, Yuemeng, Chen, Haoran, Guo, Jiarui, Cui, Mingwei, Yin, Qiuheng, Dong, Cheng, Kang, Daxiang, Wu, Xian, Sun, Chenmin, He, Peng, Gao, Yang, Lai, Lirong, Wang, Kai, Wu, Hongyu, Yang, Tong, Xu, Xiyun
Format: Preprint
Published: 2025
Subjects: Networking and Internet Architecture
Online Access: https://arxiv.org/abs/2510.11043
author Xu, Yuemeng
Chen, Haoran
Guo, Jiarui
Cui, Mingwei
Yin, Qiuheng
Dong, Cheng
Kang, Daxiang
Wu, Xian
Sun, Chenmin
He, Peng
Gao, Yang
Lai, Lirong
Wang, Kai
Wu, Hongyu
Yang, Tong
Xu, Xiyun
contents Operating at petabit scale, ByteDance's cloud gateways are deployed at critical aggregation points to orchestrate a wide array of business traffic. However, this massive scale imposes significant resource pressure on our previous-generation cloud gateways, rendering them unsustainable in the face of ever-growing cloud-network traffic. As the DPU market rapidly expands, we see a promising path to meet our escalating business traffic demands by integrating DPUs with our established Tofino-based gateways. DPUs augment these gateways with substantially larger table capacities and richer programmability without compromising their existing low-latency, high-throughput forwarding. Despite these compelling advantages, the practical integration of DPUs into cloud gateways remains unexplored, primarily due to underlying challenges. In this paper, we present Zephyrus, a production-scale gateway built upon a unified P4 pipeline spanning high-performance Tofino and feature-rich DPUs, which successfully overcomes these challenges. We further introduce a hierarchical co-offloading architecture (HLCO) to orchestrate traffic flow within this heterogeneous gateway, achieving >99% hardware offloading while retaining software fallback paths for complex operations. Zephyrus outperforms LuoShen (NSDI '24) with 33% higher throughput, and our evaluation further indicates 21% lower power consumption and 14% lower hardware cost. Against the FPGA-based system Albatross (SIGCOMM '25), it doubles the throughput at a substantially lower Total Cost of Ownership (TCO), showcasing its superior performance-per-dollar. Beyond these performance gains, we also share key lessons from several years of developing and operating Zephyrus at production scale. We believe these insights provide valuable references for researchers and practitioners designing performant cloud gateways.
format Preprint
id arxiv_https___arxiv_org_abs_2510_11043
institution arXiv
publishDate 2025
record_format arxiv
title Zephyrus: Scaling Gateways Beyond the Petabit-Era with DPU-Augmented Hierarchical Co-Offloading
topic Networking and Internet Architecture
url https://arxiv.org/abs/2510.11043