| Main Authors: | Mohapatra, Sasmita; Yang, Weiming; Yang, Zhengtang; Wang, Chenxiao; Ma, Jinxin; Pavlis, Gary L.; Wang, Yinzhi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Geophysics; Distributed, Parallel, and Cluster Computing |
| Online Access: | https://arxiv.org/abs/2504.09075 |
| _version_ | 1866909764859985920 |
|---|---|
| author | Mohapatra, Sasmita; Yang, Weiming; Yang, Zhengtang; Wang, Chenxiao; Ma, Jinxin; Pavlis, Gary L.; Wang, Yinzhi |
| author_facet | Mohapatra, Sasmita; Yang, Weiming; Yang, Zhengtang; Wang, Chenxiao; Ma, Jinxin; Pavlis, Gary L.; Wang, Yinzhi |
| contents | This article introduces a general processing framework to effectively utilize waveform data stored on modern cloud platforms. The focus is on hybrid processing schemes in which a local system drives processing. We show that downloading files and doing all processing locally is problematic even when the local system is a high-performance compute cluster. Benchmark tests with parallel processing show that this approach always creates a bottleneck as the volume of data being handled increases with more processes pulling data. We find that a hybrid model, in which processing on the cloud reduces the volume of data transferred from the cloud servers to the local system, can dramatically improve processing time. Tests implemented with the Massively Parallel Analysis System for Seismology (MsPASS) utilizing Amazon Web Services' Lambda service yield throughput comparable to processing day files on a local HPC file system. Given the ongoing migration of seismology data to cloud storage, our results show that doing some or all processing on the cloud will be essential for any processing involving large volumes of data. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2504_09075 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Parallel Seismic Data Processing Performance with Cloud-based Storage |
| topic | Geophysics; Distributed, Parallel, and Cluster Computing |
| url | https://arxiv.org/abs/2504.09075 |