Staff View: Efficient algorithms for collecting the statistics of large-scale IP address data :: Library Catalog

Bibliographic Details
Main Authors:	Liu, Hui; Cao, Yi; Cai, Zehan; Mao, Hua; Chen, Jie
Format:	Preprint
Published:	2021
Subjects:	Computational Complexity
Online Access:	https://arxiv.org/abs/2108.04000

_version_	1866912239302213632
author	Liu, Hui; Cao, Yi; Cai, Zehan; Mao, Hua; Chen, Jie
author_facet	Liu, Hui; Cao, Yi; Cai, Zehan; Mao, Hua; Chen, Jie
contents	Compiling the statistics of large-scale IP address data is an essential task in network traffic measurement. The statistical results are used to evaluate the potential impact of user behaviors on network traffic. This requires algorithms that are capable of storing and retrieving a high volume of IP addresses within time and memory constraints. In this paper, we present two efficient algorithms for collecting the statistics of large-scale IP addresses that balance time efficiency and memory consumption. The proposed solutions take into account the sparse nature of the statistics of IP addresses while building the hash function and maintain a dynamic balance among layered memory blocks. There are two layers in the first proposed method, each of which contains a limited number of memory blocks. Each memory block contains 256 elements of size $256 \times 8$ bytes for a 64-bit system. In contrast to built-in hash mapping functions, the proposed solution completely avoids expensive hash collisions while retaining the linear time complexity of hash-based solutions. Moreover, the mechanism dynamically determines the hash index length according to the range of IP addresses, and can balance the time and memory constraints. In addition, we propose an efficient parallel scheme to speed up the collection of statistics. The experimental results on several synthetic datasets show that the proposed method substantially outperforms the baselines with respect to time and memory space efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2108_04000
institution	arXiv
publishDate	2021
record_format	arxiv
title	Efficient algorithms for collecting the statistics of large-scale IP address data
topic	Computational Complexity
url	https://arxiv.org/abs/2108.04000
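
The contents field describes layered memory blocks of 256 elements that are indexed directly, so lookups never collide. The sketch below illustrates that general idea only and is not the authors' algorithm: it assumes IPv4 input, uses one layer per octet rather than the two layers of the paper's first method, and the names `OctetCounter`, `new_block`, and `BLOCK_SIZE` are hypothetical. In a C-like layout on a 64-bit system, a block of 256 eight-byte slots would occupy 256 × 8 = 2048 bytes; Python lists carry extra overhead.

```python
import ipaddress

BLOCK_SIZE = 256  # one slot per possible octet value (assumed layout)


def new_block():
    """Allocate an empty block; None marks an unused slot."""
    return [None] * BLOCK_SIZE


class OctetCounter:
    """Count IPv4 addresses with lazily allocated 256-slot blocks.

    Each octet indexes directly into a block, so two different
    addresses can never land in the same counter slot.
    """

    def __init__(self):
        self.root = new_block()
        self.blocks = 1  # number of blocks allocated so far

    def add(self, address):
        octets = ipaddress.IPv4Address(address).packed
        block = self.root
        # The first three octets select (and create on demand) inner blocks.
        for octet in octets[:3]:
            child = block[octet]
            if child is None:
                child = new_block()
                block[octet] = child
                self.blocks += 1
            block = child
        # The last octet indexes the counter inside the leaf block.
        last = octets[3]
        block[last] = (block[last] or 0) + 1

    def count(self, address):
        octets = ipaddress.IPv4Address(address).packed
        block = self.root
        for octet in octets[:3]:
            block = block[octet]
            if block is None:
                return 0  # this prefix was never seen
        return block[octets[3]] or 0


counter = OctetCounter()
for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "192.168.1.1"]:
    counter.add(ip)
```

Because blocks are created only for prefixes that actually occur, sparse address sets allocate few blocks, while each update still takes a fixed number of index operations.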
