| Main Authors: | Stasinos, Stylianos; Mensio, Martino; Lazovik, Elena; Trantas, Athanasios |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Quantitative Methods; Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2505.11568 |
| _version_ | 1866908608197820416 |
|---|---|
| author | Stasinos, Stylianos; Mensio, Martino; Lazovik, Elena; Trantas, Athanasios |
| author_facet | Stasinos, Stylianos; Mensio, Martino; Lazovik, Elena; Trantas, Athanasios |
| contents | Biodiversity research requires complete and detailed information to study ecosystem dynamics at different scales. Data-driven methods such as Machine Learning are gaining traction in ecology, and in biodiversity research specifically, offering alternative modelling pathways. For these methods to deliver accurate results, they need large, curated, multimodal datasets with granular spatial and temporal resolution. In this work, we introduce BioCube, a multimodal, fine-grained global dataset for ecology and biodiversity research. BioCube incorporates species observations through images, audio recordings and descriptions, environmental DNA, vegetation indices, agricultural, forest and land indicators, and high-resolution climate variables. All observations are geospatially aligned under the WGS84 geodetic system and span 2000 to 2020. The dataset is available at https://huggingface.co/datasets/BioDT/BioCube, and the acquisition and processing code base at https://github.com/BioDT/bfm-data. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2505_11568 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | BioCube: A Multimodal Dataset for Biodiversity Research |
| topic | Quantitative Methods; Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2505.11568 |