Bibliographic Details
Main Authors: Mohapatra, Sasmita, Yang, Weiming, Yang, Zhengtang, Wang, Chenxiao, Ma, Jinxin, Pavlis, Gary L., Wang, Yinzhi
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2504.09075
author Mohapatra, Sasmita
Yang, Weiming
Yang, Zhengtang
Wang, Chenxiao
Ma, Jinxin
Pavlis, Gary L.
Wang, Yinzhi
contents This article introduces a general processing framework for making effective use of waveform data stored on modern cloud platforms. The focus is on hybrid processing schemes in which a local system drives the processing. We show that downloading files and doing all processing locally is problematic even when the local system is a high-performance compute cluster. Benchmark tests with parallel processing show that this approach always creates a bottleneck as the volume of data being handled increases and more processes pull data. We find that a hybrid model, in which processing on the cloud reduces the volume of data transferred from the cloud servers to the local system, can dramatically improve processing time. Tests implemented with the Massively Parallel Analysis System for Seismology (MsPASS) using Amazon Web Services' Lambda service yield throughput comparable to processing day files on a local HPC file system. Given the ongoing migration of seismology data to cloud storage, our results show that doing some or all processing on the cloud will be essential for any processing involving large volumes of data.
format Preprint
id arxiv_https___arxiv_org_abs_2504_09075
institution arXiv
publishDate 2025
record_format arxiv
title Parallel Seismic Data Processing Performance with Cloud-based Storage
topic Geophysics
Distributed, Parallel, and Cluster Computing
url https://arxiv.org/abs/2504.09075